Big data refers to data that, because of its size, speed, or format (that is, its volume, velocity, or variety), cannot easily be stored, manipulated, or analyzed with traditional methods such as spreadsheets, relational databases, or common statistical software.

To put it another way, big data is data that doesn't fit well into a familiar analytic paradigm. It won't fit into the rows and columns of an Excel spreadsheet, it can't be analyzed with conventional multiple regression, and it probably won't fit on your computer's hard drive anyway.

One way of describing big data is by looking at the three V's: volume, velocity, and variety.


In the simplest terms, volume means that big data is data that is just too big to work with on your computer. But what's big for one system at one time is commonplace for another system at another time, because the amount of data we produce keeps growing. For example, my iPhone takes photos at two or three megabytes per photo and video at about 18 megabytes per minute. Very quickly, you have a lot of data. There is just a lot more of it.
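To make the volume point concrete, here is a rough back-of-the-envelope sketch in Python. It multiplies the per-item sizes mentioned above (up to 3 MB per photo, about 18 MB per minute of video) by counts that are purely hypothetical (5,000 photos and 300 minutes of video), not figures from the text.

```python
# Back-of-the-envelope storage estimate for one phone.
# Per-item sizes come from the text (2-3 MB per photo, ~18 MB per minute of video).
# The photo and video counts used below are hypothetical, chosen only for illustration.
MB_PER_PHOTO = 3          # upper end of the 2-3 MB range
MB_PER_VIDEO_MINUTE = 18


def storage_mb(photos: int, video_minutes: int) -> int:
    """Return the approximate storage needed, in megabytes."""
    return photos * MB_PER_PHOTO + video_minutes * MB_PER_VIDEO_MINUTE


total = storage_mb(photos=5_000, video_minutes=300)
# Uses decimal units here: 1 GB = 1,000 MB.
print(f"{total:,} MB, about {total / 1_000:.1f} GB")  # 20,400 MB, about 20.4 GB
```

Even with these modest hypothetical counts, a single phone holds tens of gigabytes, and multiplying that across millions of users shows how quickly volume grows.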


Velocity refers to data arriving very quickly. In some scientific research, it can take months to gather data from 100 cases. By contrast, there are about 6,000 tweets per second, which adds up to roughly 500 million tweets per day and about 200 billion tweets per year. That data is updated extremely quickly.
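The daily and yearly tweet figures follow from the per-second rate. The short Python sketch below checks that arithmetic, assuming a constant average rate of 6,000 tweets per second (real traffic varies throughout the day).

```python
# Rough check of the tweet-rate figures quoted in the text.
# Assumption: a constant average rate of 6,000 tweets per second.
TWEETS_PER_SECOND = 6_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAYS_PER_YEAR = 365

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * DAYS_PER_YEAR

print(f"Tweets per day:  {tweets_per_day:,}")   # 518,400,000 (about 500 million)
print(f"Tweets per year: {tweets_per_year:,}")  # 189,216,000,000 (about 200 billion)
```

The exact products are slightly above 500 million and slightly below 200 billion, which is consistent with the rounded numbers above.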


The third aspect of big data, variety, means that the data is not just rows and columns in a neatly formatted data set, such as a spreadsheet. Instead, you can have many different formats: unstructured text like books, blog posts, news articles, and tweets, as well as photos, video, and audio. In short, variety covers any data format that doesn't fit well in a spreadsheet.


