The talk about big data can be deafening. But what is it really? I was looking at a post by Wikibon co-founder David Floyer, who wrote a comprehensive overview of the topic earlier this summer, and thought it might be worth looking at definitions from various experts and executives to provide some perspective.
Big data has the following characteristics:
- Very large distributed aggregations of loosely structured data – often incomplete and inaccessible:
  - Petabytes/exabytes of data
  - Millions/billions of people
  - Billions/trillions of records
  - Loosely structured and often distributed data
  - Flat schemas with few complex interrelationships
  - Often involving time-stamped events
  - Often made up of incomplete data
  - Often including connections between data elements that must be probabilistically inferred
- Applications that involve big data can be:
  - Transactional (e.g., Facebook, PhotoBox), or
  - Analytic (e.g., ClickFox, Merced Applications).
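To make a few of those characteristics concrete, here is a rough Python sketch of flat, time-stamped records with missing fields, plus a crude score for guessing whether two records refer to the same person. Nothing here comes from Floyer's post: the `Event` fields, the `link_score` function, the weights, and the sample data are all invented for illustration, and the score is a simple stand-in for real probabilistic inference.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    """One flat, time-stamped record; every field but the timestamp and source may be missing."""
    timestamp: datetime
    source: str
    email: Optional[str] = None
    device_id: Optional[str] = None
    city: Optional[str] = None


# Hypothetical weights, made up for this sketch; a real system would learn them from data.
WEIGHTS = {"email": 0.6, "device_id": 0.3, "city": 0.1}


def link_score(a: Event, b: Event) -> float:
    """Crude score (0 to 1) that two events belong to the same person."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        left, right = getattr(a, field), getattr(b, field)
        # A missing value on either side contributes nothing: incomplete data is the norm.
        if left is not None and left == right:
            score += weight
    return score


# Invented sample records from three different sources.
checkin = Event(datetime(2011, 8, 1, 9, 30), "checkin", email="ann@example.com", city="Boston")
purchase = Event(datetime(2011, 8, 1, 9, 45), "purchase", email="ann@example.com", device_id="d-42")
browse = Event(datetime(2011, 8, 1, 10, 0), "browse", device_id="d-42", city="Boston")

print(link_score(checkin, purchase))  # the two records share only an email address
print(link_score(purchase, browse))   # the two records share only a device ID
```

No single record is complete, and no explicit key ties the three together; the connection has to be inferred from whatever fields happen to overlap.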
Tim O’Reilly says he is reminded of the PC revolution and how it commoditized hardware. Open source commoditized software. And now it’s the presence of large databases over the Internet that is causing the most significant disruptions. Those large interconnected databases are what give us the ability to check in on services such as Foursquare or use an online map when driving somewhere.
Consumers now expect this kind of information. The phone will only intensify this demand, which will force us to rethink such issues as privacy and identity.
In this excerpt from theCube, O’Reilly gives an example about Apple that illustrates the power of big data and how it makes the App Store Apple’s true killer app.
EMC CEO Joe Tucci
EMC CEO Joe Tucci talks about big data in the context of different industries such as geoseismic data collected by oil companies or the scale of information that is aggregated in health care companies.
Brian Hopkins, Forrester Analyst
Forrester’s Brian Hopkins describes big data as “techniques and technologies that make handling data at extreme scale economical.”
He uses the “four Vs” to give his simple definition some body, which is illustrated in the chart here on the right:
The point of this graphic is that if you just have high volume or velocity, then big data may not be appropriate. As characteristics accumulate, however, big data becomes attractive by way of cost. The two main drivers are volume and velocity, while variety and variability shift the curve. In other words, handling data at extreme scale becomes more economical, which means more people do it, which in turn leads to more solutions.
Big data is a term used so loosely that it’s imperative to get multiple perspectives on what it means. I align most with O’Reilly and Floyer, who get to the heart of how it applies in our world. We are entering an era in which sensors collect data in our physical world and deliver it to networks that aggregate and analyze the information. Big data defines us and will increasingly dictate how we live in a fully interconnected world.