Post-Conference Notes

The conference was really inspiring and brought lots of good ideas. Unfortunately I could not absorb everything – which I guess is normal in this kind of thing – but it cleared up a lot of the ideas from the previous post, and it left me “wondering” about some concepts, which could be the “starting point” for more research (and probably a few more posts!).

The tension between consistency and time was strongly emphasized during the conference, and it translates into two different models: transactional vs. eventual consistency. Transactional consistency is the only model implemented in “traditional” relational databases, while eventual consistency is implemented in (some) NoSQL databases. Basically, NoSQL does not force you to be “eventually consistent”; it gives you the freedom to choose to be. And “why would you do such a thing?”

There is a theorem called CAP, which basically states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

    – Consistency (all nodes see the same data at the same time)
    – Availability (a guarantee that every request receives a response about whether it was successful or failed)
    – Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

So if you go for a NoSQL solution, you can basically decide which two of these are important for you.
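To make that choice a bit more concrete, here is a minimal Python sketch of a two-replica store that behaves differently once a partition happens. Everything in it (`TwoNodeStore`, the `"CP"`/`"AP"` mode names, `demo`) is a made-up toy for illustration, not how any real database is implemented.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (hypothetical names)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot talk to each other

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: refuse the write rather than diverge.
                return False
            # Prefer availability: accept locally, so replicas may now disagree.
            replica.data[key] = value
            return True
        # No partition: the write reaches both replicas.
        replica.data[key] = value
        other.data[key] = value
        return True

    def consistent(self):
        return self.a.data == self.b.data


def demo(mode):
    store = TwoNodeStore(mode)
    store.write(store.a, "x", 1)   # healthy network, both replicas updated
    store.partitioned = True       # the network splits
    accepted = store.write(store.a, "x", 2)
    return accepted, store.consistent()


print(demo("CP"))  # (False, True): write rejected, replicas still agree
print(demo("AP"))  # (True, False): write accepted, replicas diverge
```

In the toy, the partition is a given; the only decision left is whether to give up availability (reject the write) or consistency (let the replicas disagree for a while).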

As we discussed in the previous post, eventual consistency means that “given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent” (if you are interested in some metrics on “how long” this period of time may be, you may want to have a look at Probabilistically Bounded Staleness).
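The definition above can be played with in a few lines of Python. This is only a sketch under my own assumptions: replicas are plain dictionaries, conflicts are resolved with a simple “last write wins” rule based on a counter used as timestamp, and replicas exchange state in random pairs. The names `merge`, `gossip_until_consistent` and `demo` are hypothetical, and real systems use more careful mechanisms.

```python
import random


def merge(a, b):
    """Last-write-wins merge: per key, keep the (timestamp, value) with the larger timestamp."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


def gossip_until_consistent(replicas, rng, max_rounds=1000):
    """Let random pairs exchange state until every replica agrees; return the rounds used."""
    rounds = 0
    while any(r != replicas[0] for r in replicas):
        if rounds >= max_rounds:
            raise RuntimeError("did not converge")
        i, j = rng.sample(range(len(replicas)), 2)
        replicas[i] = replicas[j] = merge(replicas[i], replicas[j])
        rounds += 1
    return rounds


def demo(seed=0):
    rng = random.Random(seed)
    replicas = [{} for _ in range(5)]
    # Three writes land on different replicas, then no more changes are sent.
    replicas[0]["x"] = (1, "old")
    replicas[3]["x"] = (2, "new")
    replicas[1]["y"] = (3, "hello")
    rounds = gossip_until_consistent(replicas, rng)
    return replicas, rounds


replicas, rounds = demo()
print(replicas[0], "after", rounds, "rounds")
```

Once the writes stop, every replica ends up holding the newest value for each key. The number of rounds is a very crude stand-in for the “how long” question, which is what the staleness metrics mentioned above try to answer properly.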

To put it really simply, the NoSQL paradigm is:

“It is better to ask for forgiveness than for permission.”

And the “traditional” model is, instead:

“Better safe than sorry.”

Keep in mind, nevertheless, that this “safety” is “hiding” longer response times or intolerance to faults.

Apart from this important/major/core concept, there were some things that “caught my eye” in this conference. Namely:

– The Cassandra database, which is Apache's open-source, eventually consistent key-value store.

– MongoDB, which seems to have a lot of success stories and even has some geospatial indexing.

– The Apache Hadoop open-source software framework, which supports data-intensive distributed applications (Facebook is one of its users); and Storm, which is not really an “improved” Hadoop but a separate system that aims to do for real-time processing what Hadoop does for batch processing, and that is now used and owned by Twitter (although I could not follow its complicated technical details very well!).

– The company Basho and its highly distributed open-source database, Riak.

– Dynamo, a proprietary, highly available key-value structured storage system that was developed by Amazon (and whose design Cassandra draws on).

– The Singapore live project, which is an “almost” real-time data visualization project.

– A very interesting project, at the Barcelona Digital Technological Centre, looking at the mobility network in the city.

OMG! And I discovered that Twitter is storing the geolocations of people (including me) by default, and providing them to third-party applications through their API. You can read more here.

So the immediate consequences of this conference in my life (apart from the fact that I drank lots of coffee! :-)) were:

– I installed MongoDB on my laptop;

– I removed the location stamp from my tweets, by changing my Twitter settings;

– I got to know BDigital 😉