In a recent Oracle survey of 300 executives, nearly a third of respondents (29%) graded their current Big Data management and analytic capabilities as D or F. Nearly all respondents (93%) agreed that their organization was leaving revenue on the table by not doing a better job of leveraging Big Data.
The industries least prepared to take advantage of Big Data and analytics, the survey found, are the public sector, healthcare and utilities. These also happen to be three industries that desperately need to reinvent themselves if they hope to reverse dramatically rising costs and staggering debt.
That’s a sorry state of affairs. But it doesn’t, and shouldn’t, have to be that way.
Not only have frameworks and technologies to process, store and analyze Big Data been developed, but many are also available at minimal cost. In fact, cost-effective use of Big Data is built into the value proposition of most Big Data approaches, most notably Hadoop. At its heart, Hadoop is an open source, distributed framework for storing and processing Big Data across clusters of inexpensive commodity machines.
While the colorfully named sub-projects that make up Hadoop (from Hive and Mahout to Sqoop and Flume) still have some growing up to do, most are mature enough for organizations across industries to at least begin experimenting with them. Better yet, pre-packaged Hadoop distributions, whose components have been tested and validated to work together, are available cheaply and, in some cases, for free.
Among the most mature of the completely free Hadoop distributions is the Hortonworks Data Platform. If you're not familiar with Hortonworks by now, you should be. The company was spun out of Yahoo, one of the early pioneers of Hadoop, just over a year ago, and its staff includes a number of top committers to the Apache Hadoop project.
What sets Hortonworks apart from competitors like Cloudera and Greenplum is its strategic decision to charge nothing for its software. Other vendors offer free community editions of their Hadoop distributions, but these lack many of the management and monitoring capabilities critical to deploying Hadoop successfully. Hortonworks, by contrast, has only one version of its platform. The Hortonworks Data Platform (HDP) is 100% Apache-compatible and includes the most stable elements of Hadoop 1.0, as well as Ambari, an open source component for managing and monitoring Hadoop clusters.
Of course, most organizations will need to pay for Hadoop-related professional and technical services when they move to full-scale production. But with a comprehensive Hadoop distribution available gratis, there's no longer any excuse for organizations not to begin experimenting with Big Data.
CIOs should set the tone and encourage their brightest DBAs, systems administrators and even application developers to set aside a percentage of their time to start working with Hadoop and brainstorming ways to leverage Big Data for competitive advantage or improved operational efficiency. Bring the business side into the discussion early, too, to identify its most pressing problems and how Big Data might help solve them.
There are other options on the market besides Hortonworks, including the aforementioned Cloudera and Greenplum, as well as distributions from DataStax and MapR. For a complete analysis of your Hadoop distribution options, check out my report from last month on the topic over at Wikibon.org.
Putting aside the irony that this survey was conducted by Oracle, the company I believe is most threatened by the Big Data revolution, the fact that so many executives feel unprepared to take advantage of Big Data is unfortunate and unnecessary. There are options for getting started with Big Data in a low-risk, high-reward manner. For the more timid among you who have yet to do so, now is the time to take that first step.