Hadapt, a 30-employee startup that grew out of Yale University’s computer lab with $9.5M from Bessemer Venture Partners and Norwest Venture Partners, has figured out a way to use SQL on Hadoop databases. This neat technical trick means that business users, who understand the business questions they want to ask of the data, can access Big Data databases directly through the tools they already use, such as Tableau and MicroStrategy.
This raises the question: can, and should, Hadoop replace RDBMS data warehouses for the business research and decision support side of the data center? Wikibon Chief Analyst David Vellante caught up with Justin Borgman, Hadapt CEO and co-founder, at Atlas Venture in June and asked that question.
Many people are approaching the problem of combining structured and unstructured data by building connectors between their RDBMS data warehouses and new Hadoop or other Big Data databases, Borgman acknowledges. “We think that’s inefficient, first because you’re now paying for two different systems, but also there’s a latency involved in moving multiple terabytes around. When dealing with Big Data, you want to avoid moving the data as much as possible, so by bringing all that into one platform you have all the data available in one place. Also, often when you are moving that data through the connector, you are summarizing or aggregating the data in some way. But if you bring that processing to the data itself, then you have the entire raw data there as well.”
Of course, it is not that easy. Introducing Hadoop into the data center requires developing new data management skills on entirely new, and today often very immature, technologies. The prevailing approach to Hadoop data management is to load and process a huge amount of data to research a one-off question such as “Why do we experience huge churn in our subscriber base, and how can we make the customer experience better?”, which is very important to cellular carriers. Once the question is answered and, for instance, the customer-facing processes are redesigned to eliminate the frustrations customers experience, that data is dumped and new data is loaded to research another question, or the same question for a different carrier. This obviously will not work for core corporate data such as financials, which companies often have to preserve for multiple years for tax, compliance, and internal financial analysis. Companies would therefore need to keep the RDBMS data warehouse, or an archive in some form, to preserve that data long term.
Wikibon Big Data Analyst Jeff Kelly, discussing this question with Vellante later that day, said, “As far as what Hadapt’s doing, I think it’s a real interesting approach… Ultimately that’s a simpler and more elegant solution than cobbling together traditional relational databases into the Hadoop environment… It’s a bit of a bet – I’d say a long shot… but that’s how you win big.”
How Hadapt arrived in Cambridge, Massachusetts, is a story in itself. “Unfortunately, we couldn’t stay in New Haven [Connecticut, the location of Yale University] and build this kind of company… It came down to the kind of talent we’re looking for, which is specifically systems development and database developers. Those really only exist in two places – Silicon Valley and Boston.” The company chose the Boston area because several similar companies are also located there, creating a pool of data engineers that Hadapt hopes to tap into.