Welcome to the “Obscure” World of Databases

How would we picture the image of a non doctor, entering an operations room and performing a surgery? Probably pretty badly. After removing the scope of the consequences, to me this is very similar to the image of a non database person, designing a database.

For several reasons, people who have not been formally trained as computer scientists find themselves performing IT tasks, some of them very sophisticated. I don’t see anything wrong with people that were not formally trained embrassing an IT career (actually I am one of them), as long as… they do embrace it. That is: study or research the things they don’t know, and do care about standards of quality. If they don’t do it, then they are just bringing into “shame” the name of everyone who is working on the field, by bringing the level really down.

All these thoughts – that are actually recurrent in my life – were triggered by analysing what I thought it was a relational database. By going through it and refactoring it, I got a pretty good collection of the things that you should not do, when designing a database. These are some ideas that I would like to clarify; if you think that they are obvious, you would be surprised with what I saw.

Choosing a relational database management system (RDBMS), does not automatically mean that you have a relational database. A relational database is a database that is organized in terms of the relational model, as formulated by Edgar F. Codd. According to Wikipedia[1], the purpose of this model is to “provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries”. It is a requirement of the model that the users state exactly what the database contains, and describe properly how it is organized; only in this way, it is possible to query the database and obtain exact answers to our queries.

If the database is a repository of information, some of it unkown or unnecessary, and not really organized, that means that we are not modelling the data, but only storing information. The outcome of this, is that we cannot query the database and produce any usefull answers about our data (there are no miracles!). We can store the information, but not in a much better way than if we were storing it in a filesystem. Users of this repository may implement themselves ways of dealing with this data by creating relations on-the-fly, either by using the SQL engine, or by pullling the data out and do it somewhere else. However they are not gaining anything from the data model, as they have to figure out themselves how the data is organized, and there is no guarantee that any two people would not do it in a different way.

My first bullet point is that there is absolutely no point in using a relational database engine and *not* implement a relational model; it is probably a worst solution than using a filesystem, because it may pass the wrong idea to people: that there is a relational model.

If people are not implementing the relational model because they do not know the data, or because the data is not organized in a relational manner, this is an explanation I can accept. In fact I think for many cases (and some of them which are being approached relationally) the rigid structures of a top-down design are not the providing a good solution, because knowing everything (or at least a lot) about the data is a weak assumption. For this cases there is NoSQL[2], and specially the document-driven databases provide a very flexible model, close to what people implement with filesystems. In other words: this is ok, but then don’t use a relational database engine.

On the other hand, if people are not implementing the relational model because they don’t know it, then they should learn about it. They should learn about it, at least to be able to decide that they don’t use it and go for a NoSQL database. Finally if they do not want to learn about databases, or relational models, because they are “just” biologists or economists, that is also fine but then please don’t let them design a database, specially if it is one where you want to store valuable data. And I am afraid of the institutions who let this sort of thing happen.

Image

 

[1] http://en.wikipedia.org/wiki/Relational_database

[2] http://en.wikipedia.org/wiki/NoSQL

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s