NoSQL or SQL? When I use one, the other, or both

In my opinion, the big win NoSQL gave us was waking us up to the fact that we were using relational databases as the only option for every problem. To paraphrase a famous saying:

“Give a developer a relational database and the world becomes a set of tables.”

The interesting thing is that so much hype was created around these databases that, in many situations, they stopped being the solution and became the problem, because they were misapplied. I already talked about the origin of this hype a few posts ago: the lack of the concept of value. Perhaps there is also a childish feeling at play, the thrill of dealing with something new, which becomes very clear when we come across the “standard logo” of the “movement”: the acronym SQL crossed out, one of the silliest metaphors I’ve ever seen.

In this post my goal is to lay out the criteria I use when adopting a database, regardless of whether it is relational or not.

Quick definition: what is NoSQL?

Simple: any database that does not adopt the relational paradigm. Taken to the extreme, an API that works with text files could be considered a NoSQL database. The term has two readings, one intelligent and the other not so much: “not only SQL” and “no SQL”, respectively.

Roughly, I see three broad categories: document-oriented, graph-based and key/value. These are the categories I have some experience with, and the ones this post deals with.

Brief description of each database category

(Since the relational model is an old acquaintance of the overwhelming majority of programmers, I will cover only the NoSQL categories here.)

Document-oriented

It may be the most popular category of NoSQL databases. It is characterized by the fact that, instead of tables with a fixed number of well-defined fields, we have collections. Each collection can store one or more documents (instead of records in a table). Roughly, a document is just an aggregate of attributes with no rigid rule defining which attributes it must have or what type each one should be (documents are schemaless). Documents are usually saved in a format similar to JSON, as in the example below:

// A document used to represent a person
{ name: "John", surname: "john", nickname: "cena", city: "Los Angeles" }
// Another document representing a person in the same collection
{ name: "Maria Cena", surname: "John Cena", pro…

As the example shows, I can have records that are very different from one another within the same collection. This approach sounds strange at first, but the relational model is used inefficiently all the time in the form of sparse tables. A sparse table is one with a multitude of columns, of which only a few are always filled in while the others are filled only rarely. To illustrate this, take a look at the following image, which is a register of equipment and materials used on a construction site.

In this table we can see a poor use of the relational model. Notice that some columns are very rarely filled in, such as Lifting height and Diameter. By including rarely used columns in our table, we are throwing away storage space and reducing the performance of the system as a whole. Not to mention that we treat completely different objects as if they were the same, which is not always a good idea. Below, the same data is represented in the document model:

{ code: 1, type: "Crusher", power: "300 hp", weight: "2000 kg", tag: "BR-121", engines: 3 }
{ code: 2, type: "Cement" }
{ code: 3, type: "Boulder breaker", height: "4 meters", weight: "300 kg", tag: "RM-11" }
{ code: 4, type: "Forklift", power: "10 hp", weight: "400 kg" }
{ code: 5, type: "Conveyor belt", lifting_height: "2 m" }
{ code: 6, type: "Pipe", diameter: "30 cm" }
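
To make the storage argument concrete, here is a minimal Python sketch that models the equipment register above as a sparse table (every row carries every column, with None for an empty cell) and then turns each row into a document by dropping the empty attributes. The column list and the helper names row and to_document are assumptions made for illustration; the sketch only counts empty cells and says nothing about how any particular database physically stores NULLs.

```python
# Hypothetical column set for the sparse "equipment" table from the example.
COLUMNS = ["code", "type", "power", "weight", "tag", "engines",
           "height", "lifting_height", "diameter"]


def row(**values):
    """Build a relational-style row: every column present, unused ones set to None."""
    return {col: values.get(col) for col in COLUMNS}


sparse_table = [
    row(code=1, type="Crusher", power="300 hp", weight="2000 kg", tag="BR-121", engines=3),
    row(code=2, type="Cement"),
    row(code=3, type="Boulder breaker", height="4 meters", weight="300 kg", tag="RM-11"),
    row(code=4, type="Forklift", power="10 hp", weight="400 kg"),
    row(code=5, type="Conveyor belt", lifting_height="2 m"),
    row(code=6, type="Pipe", diameter="30 cm"),
]


def to_document(r):
    """Drop empty cells so each record keeps only the attributes it actually has."""
    return {k: v for k, v in r.items() if v is not None}


documents = [to_document(r) for r in sparse_table]

# How much of the fixed-schema table is empty.
empty_cells = sum(v is None for r in sparse_table for v in r.values())
total_cells = len(sparse_table) * len(COLUMNS)

if __name__ == "__main__":
    print(f"{empty_cells} of {total_cells} cells are empty")  # 31 of 54 cells are empty
    for doc in documents:
        print(doc)
```

In this small example more than half of the cells in the fixed schema are empty, while each document carries only its own attributes.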

If your system has sparse tables, it may be a good time to evaluate a document-oriented database. In fact, I’ve written about this model in more detail on this blog; you can check it at this link.

Databases that adopt this model: MongoDB, CouchDB.

Key-value

It is the simplest model and perhaps the one that brings the biggest performance gains when properly applied. In this model, every query to the database is made by a key alone, which may or may not have a value (of any type) associated with it. It is widely used to implement system caches or to access information that changes in real time. Of all the models, it has the fastest lookup: it is basically a search based on a hash of the key.
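
As a rough illustration of a key lookup and of the caching use case, the sketch below uses a plain Python dict (a hash table) as an in-memory store with an optional time-to-live, plus a cache-aside read helper. The class KeyValueCache, the function get_with_cache and the slow_lookup loader are hypothetical names; this is not the API of Memcached, Redis or any other product, only the idea behind them.

```python
import time

_MISSING = object()  # sentinel so that a cached None is not mistaken for a miss


class KeyValueCache:
    """Minimal in-memory key/value store with an optional time-to-live per entry.

    Backed by a Python dict, which is a hash table: a lookup hashes the key and is
    O(1) on average, the same idea a key/value database relies on.
    """

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expiry timestamp or None)
        self._clock = clock

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return default
        return value

    def delete(self, key):
        self._data.pop(key, None)


def get_with_cache(cache, key, load, ttl=60):
    """Cache-aside read: serve from the cache, otherwise load from the slow source and cache it."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = load(key)
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    now = [0.0]
    cache = KeyValueCache(clock=lambda: now[0])  # fake clock for the demo
    calls = []

    def slow_lookup(key):  # hypothetical stand-in for a relational query
        calls.append(key)
        return {"id": key, "name": "John"}

    get_with_cache(cache, "user:1", slow_lookup, ttl=30)
    get_with_cache(cache, "user:1", slow_lookup, ttl=30)  # served from the cache
    now[0] = 31.0
    get_with_cache(cache, "user:1", slow_lookup, ttl=30)  # expired, loaded again
    print(len(calls))  # 2
```

Injecting the clock keeps expiry easy to test; a real store would also need eviction under memory pressure and, as noted below, offers weak persistence guarantees, both of which this sketch ignores.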

In 2010 I had a very successful experience with a database that uses this approach, Memcached; you can read about it at this link. This kind of database is also widely used to replace messaging systems such as JMS, since it offers a simpler programming model. However, of all the models it is also the least reliable from the point of view of persistence, as these databases are usually designed to keep information in memory, with little focus on guaranteeing persistence (there are exceptions, but take this as a rule).

Databases that adopt this model: Berkeley DB, Memcached, Redis.

Graph-based

Perhaps the least popular category, this model focuses on the relationships between objects. The metaphor in this case is the graph. We no longer have collections, only vertices representing the objects of our system and edges indicating the relationships between them. Typically each vertex holds an aggregate of attributes, very similar to a document in the document model.

Searches can be made both by the attributes of the vertices and by the relationships between them. In this type of database we can run “friend of a friend” queries. The most obvious application is your contact list on a social network. Another interesting example is representing a cause/effect structure, such as a factory in which each production step generates the input for one or more other steps that need to be monitored.
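
The “friend of a friend” query can be sketched over an in-memory adjacency list: collect the neighbours of each direct friend, then discard the person and their direct friends. The sample people, the helper names and the representation are assumptions for illustration; a graph database would run this traversal on its own storage and with its own query language.

```python
def build_graph(edges):
    """Undirected adjacency list: each vertex maps to the set of vertices it is linked to."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def friends_of_friends(graph, person):
    """Vertices exactly two "knows" hops away, excluding the person and their direct friends."""
    friends = graph.get(person, set())
    candidates = set()
    for friend in friends:
        candidates |= graph.get(friend, set())
    return candidates - friends - {person}


if __name__ == "__main__":
    # Hypothetical "knows" relationships.
    knows = [
        ("john", "maria"), ("john", "carlos"),
        ("maria", "ana"), ("maria", "pedro"),
        ("carlos", "ana"), ("ana", "lucas"),
    ]
    graph = build_graph(knows)
    print(sorted(friends_of_friends(graph, "john")))  # ['ana', 'pedro']
```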

It is interesting to note that here we are dealing with a kind of database that persists a navigational model between its elements. If they want, developers can walk the graph to implement depth-first search or shortest-path algorithms.
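
Walking the graph this way boils down to classic traversals. This minimal sketch runs a breadth-first search to find a shortest path (fewest edges) and an iterative depth-first traversal over a hypothetical directed graph of production steps, echoing the factory example. All step names and function names are illustrative assumptions.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # vertex -> vertex we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour in previous:
                continue  # already discovered via a path at least as short
            previous[neighbour] = current
            if neighbour == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def depth_first(graph, start):
    """Iterative depth-first traversal; returns vertices in the order they are visited."""
    visited, order, stack = set(), [], [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push neighbours in reverse so siblings are visited in alphabetical order.
        stack.extend(sorted(graph.get(vertex, ()), reverse=True))
    return order


# Hypothetical factory: each step produces the input of the steps it points to.
production_steps = {
    "mining": ["crushing"],
    "crushing": ["screening", "washing"],
    "screening": ["stockpile"],
    "washing": ["stockpile"],
    "stockpile": ["shipping"],
}

if __name__ == "__main__":
    print(shortest_path(production_steps, "mining", "shipping"))
    print(depth_first(production_steps, "mining"))
```

Because the edges are directed, asking for a path from "shipping" back to "mining" returns None, which is exactly the cause/effect reading of the factory example.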

Tip: writing relationship queries in these databases can be a very tedious process. A tool that will make your life easier is Gremlin, which, roughly speaking, lets you write complex queries in a simple way. Take a look at this link.

Databases that adopt this model: Neo4J, VertexDB.

Scalability gains

This is a controversial point for me. I see many people adopting NoSQL databases only for scalability. This gain is usually attributed to the fact that these databases do not fully implement ACID and therefore have simpler operations. That is not always true: Neo4J, for example, is a database that fully implements ACID. Another argument in favor of these databases is that they consume less memory and that configuring a cluster is simpler. Okay, that may be so, but in my opinion these arguments do not touch the core of the matter.

There is no point in having a super lightweight database if the data structure it uses for storage does not fit the modeling of my system. If I have a simple system like a blog, for example, what would I gain by implementing it with a key/value, document or graph database? Probably nothing, and any performance gained in the database would be lost in the parsing done by my application.

When I opt for a NoSQL database, scalability is hardly ever the reason. What really matters is that the persistence model is close to my system’s modeling; as a result, my queries are easier to write, and the mapping between what is in the database and my classes is easier and faster to implement and to execute. Yep: this is where NoSQL adds value.

As for real scalability gains, in my experience the case where I see them happen most easily is the key/value model implementing some form of caching. Second comes the use of a document database when dealing with sparse tables. Unfortunately I have not yet had the opportunity to work on a huge system where I could see this in action (but itexto is interested in this kind of work, so if you want to implement something like this, get in touch :D).

Difficulties with NoSQL

Not everything is rosy in the NoSQL world. There are some problems that can still make me avoid these databases. The first is that vendor lock-in is a certainty. As there is no standard (and it would not make sense for there to be one), migrating from one database to another, even within the same category, can prove to be a very laborious task.

Another problem is the lack of high-level tools such as those available for the relational model. In most cases I find myself using only the command line to do whatever is needed. For graph databases things are beginning to improve with projects like Gephi, for example, but there is still a long way to go before we reach a satisfactory level of quality.

The learning curve can also be a problem. As each database has its own query and manipulation language, developers will need to add more and more languages to their utility belt.

Selection criteria: when I use one or the other (with a bonus)

I developed a spreadsheet that currently has 15 criteria to help me choose a model, and you can access it any time you want: just click this link (this is the bonus). The main criterion is how well my modeling fits each of the models I know (including the relational one). If the relational model proves the fittest, it is without doubt the one I choose, leaving the others aside. In most cases, at least a key/value store is used to optimize one part or another of my systems.
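
The spreadsheet itself is only linked, so its 15 criteria are not reproduced here. As an assumption, one common way to turn criteria like these into a decision is a weighted decision matrix: score each candidate model per criterion, multiply by the criterion’s weight and rank by the total. The criteria, weights and scores below are purely illustrative (loosely based on points raised in this post) and are not the ones in the spreadsheet.

```python
# Illustrative weights: how much each criterion matters (higher = more important).
CRITERIA_WEIGHTS = {
    "modeling_fit": 5,      # how closely the persistence model matches the domain model
    "query_simplicity": 3,  # how easy the expected queries are to write
    "tooling_maturity": 2,  # availability of high-level tools
    "team_familiarity": 2,  # learning curve for the team
}

# Illustrative 0-5 scores of each candidate model for one hypothetical system.
SCORES = {
    "relational": {"modeling_fit": 4, "query_simplicity": 4, "tooling_maturity": 5, "team_familiarity": 5},
    "document": {"modeling_fit": 3, "query_simplicity": 3, "tooling_maturity": 3, "team_familiarity": 2},
    "key_value": {"modeling_fit": 2, "query_simplicity": 2, "tooling_maturity": 3, "team_familiarity": 3},
    "graph": {"modeling_fit": 2, "query_simplicity": 2, "tooling_maturity": 2, "team_familiarity": 1},
}


def weighted_score(scores, weights):
    """Sum of weight * score over all criteria; a missing score counts as 0."""
    return sum(weight * scores.get(criterion, 0) for criterion, weight in weights.items())


def rank_models(all_scores, weights):
    """Candidate model names, best weighted total first."""
    return sorted(all_scores, key=lambda model: weighted_score(all_scores[model], weights), reverse=True)


if __name__ == "__main__":
    for model in rank_models(SCORES, CRITERIA_WEIGHTS):
        print(model, weighted_score(SCORES[model], CRITERIA_WEIGHTS))
```

Giving modeling fit the largest weight mirrors the main criterion described above; which model comes out on top depends entirely on the made-up scores, so the ranking here is not a general result.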

It is interesting that the relational model is present in 100% of cases. The reason is very simple: it is the model that serves the largest number of cases, and to this day it has always served me well, except in the situations cited in the brief description of the database categories above. Here comes a very important detail for me:

In practice, in my experience, NoSQL databases complement the relational model rather than replace it.

Dear hypester, I’m sorry to throw cold water on your excitement, but so far this has been the truth for me. Like it or not, my relational tables serve the overwhelming majority of my needs as an architect/developer well.

In conclusion

NoSQL databases are an excellent complement to the relational model, keeping us from applying it to situations it was not designed for. The choice of a non-relational model should be driven by how well your modeling fits one category or another, more than by mere performance gains.



