This is a series of posts meant to clarify the concept of Big Data.
In this first part, we will start by discussing what Big Data is. In future posts, we’ll talk about new processing models that try to address the problem, and the new technologies that are emerging to put these concepts into practice.
My posts are based on the idea of collaboration. Anyone who wishes to contribute to the discussion should feel free to do so, bringing more knowledge and experience to everyone.
Let’s start the series by talking about what Big Data is, after all.
The explosion of data
Never before has the world produced so much data. According to an infographic produced by IBM, 100 terabytes of data are produced every day on Facebook alone, 294 billion emails are sent daily, and 230 million tweets are posted every day! (Source)
This huge amount of data gives rise to what is known in the world of Big Data as the 5 Vs:
Volume: Huge amounts of data being produced;
Velocity: Data being produced at a very high speed;
Variety: Data being produced in different structures that may nonetheless be intrinsically related. The content of a user's emails is closely related to that user's tweets (the data is produced by the same user and may refer to the same subject), yet the two have completely different structures;
Veracity: In a world where large amounts of data are produced at high speed and in different formats, it is harder to obtain “clean” data, free of incompleteness or duplication. The cake recipe from your grandmother that you sent by email is the same one you published on Facebook, just in a different format;
Value: All this data has high value for a business, as it carries information about the behavior, beliefs and preferences of its customers.
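To make the Veracity example a bit more concrete, here is a minimal Python sketch of how the same recipe arriving in two formats could be recognized as one piece of content. The record ids, the sample texts and the helper names (fingerprint, find_duplicates) are all made up for illustration, and stripping markup and punctuation before hashing is only one simple way to do this, not how any particular product works.

```python
import hashlib
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Hash the text after removing markup, letter case and punctuation."""
    text = re.sub(r"<[^>]+>", " ", text)                # drop HTML-like tags
    text = unicodedata.normalize("NFKC", text).lower()  # unify characters and case
    words = re.findall(r"\w+", text)                    # keep only the words
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def find_duplicates(records):
    """Group the ids of records whose content has the same fingerprint."""
    groups = {}
    for record_id, content in records:
        groups.setdefault(fingerprint(content), []).append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records: the same recipe as a plain email and as an HTML post.
records = [
    ("email-1", "Grandma's cake: 2 eggs, 1 cup of sugar, 2 cups of flour."),
    ("facebook-7", "<p>Grandma's cake:<br>2 eggs, 1 cup of sugar, 2 cups of flour</p>"),
    ("tweet-3", "Traffic is terrible today."),
]
duplicates = find_duplicates(records)
print(duplicates)
```

Exact matching after normalization only catches copies that differ in formatting. Texts that were reworded, shortened or translated would need fuzzier comparison techniques.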
To handle data with these characteristics, new processing models were developed, based on a technique called distributed processing. In the next post, we’ll talk more about them.
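As a small preview of the idea, here is a toy Python sketch of the split, process and merge pattern behind distributed processing. It is only an illustration under strong assumptions: threads on a single machine stand in for the separate machines of a real cluster, the messages are made up, and the function names (count_words, distributed_word_count) are hypothetical rather than taken from any framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Work done independently by one worker on its own slice of the data."""
    return Counter(word.lower() for line in chunk for word in line.split())


def distributed_word_count(lines, workers=3):
    """Split the data, process the chunks in parallel, then merge the results."""
    # Each worker gets every workers-th line, so the chunks do not overlap.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Merge step: combine the partial counts into a single total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


messages = ["big data is big", "data moves fast", "fast data is valuable"]
totals = distributed_word_count(messages)
print(totals["data"], totals["big"], totals["fast"])
```

The point is that no single worker ever needs to see the whole data set: each one handles its own slice, and only the much smaller partial results have to be brought together at the end.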
For those interested in learning more about the “Vs”, this presentation is a good reference: