The goal is to complement Hadoop’s batch-oriented, large-scale Big Data storage and processing capabilities with the lightning-fast ad hoc analytic querying capabilities of the Kognitio database. The database is a 100% native in-memory analytics engine capable of storing tens of terabytes of data in RAM, significantly speeding up query times. It can be deployed on-premises or in the cloud, public or private, and delivered as a service.
The partnership includes a custom connector that allows the Kognitio database to leverage Hadoop’s parallel processing capabilities to quickly identify and load subsets of Hadoop-based data into Kognitio. Roger Gaskell, Kognitio’s Chief Technology Officer, told me the company considered using open source Big Data loading and integration tools for the job but determined they couldn’t deliver the type of performance he and his team were looking for.
Instead, Kognitio tapped Hortonworks’ engineering expertise to help build the custom connector. The resulting technology is not exclusive to Hortonworks’ Hadoop distribution, called Hortonworks Data Platform and still in beta; according to Gaskell, it works with any Apache-compatible Hadoop distro.
The concept of using Hadoop to store, process and run deep historical analysis against Big Data and then move subsets of the resulting data into a faster analytic database for near real-time ad hoc queries is not new. Greenplum, Vertica, Netezza and Aster Data all position their databases as complementary to Hadoop in this way. What distinguishes Kognitio’s database from those vendors’ products is its purely in-memory storage architecture.
Because the data lives in memory, analytic and visualization applications can quickly hit the data needed to answer a particular query, rather than wading through tables and cubes stored on disk. The laws of physics being immutable, data can be read off spinning disk only so fast, resulting in high latency for large queries.
The drawback, until recently anyway, was expense. For years, storing even small amounts of data in-memory was prohibitively expensive for most organizations. But in the last couple of years the technology has improved (today the Kognitio platform can store a terabyte of data in-memory on just four servers) and the price of memory has dropped significantly, to the point that a purely in-memory database is economically feasible for some enterprises. That’s good news for Kognitio, whose database was built on an all in-memory architecture from its inception more than 10 years ago.
The company is also benefiting from the buzz surrounding in-memory databases brought about by SAP’s HANA, Gaskell said.
But in addition to competition from the MPP database crowd, Kognitio may one day face competition from within the Hadoop ecosystem itself. Vendors like Hadapt and others are working to improve the open source Big Data framework’s ad hoc query capabilities, specifically by improving upon Hive, a Hadoop-based data warehouse. There is still significant work to be done before Hive or any other Hadoop-based database can match the performance of an all in-memory analytic engine, however.
It’s early days for this particular partnership, of course, but Gaskell expects significant demand for Kognitio deployed in conjunction with Hadoop, as the impetus for the arrangement was customer feedback. I’m particularly interested to see how many customers opt to pair Kognitio’s database deployed in the cloud with cloud-based Hadoop. Kognitio uses Amazon Web Services to host its public cloud customers’ deployments, and AWS offers its own Hadoop service, Elastic MapReduce. That seems like a natural and potentially compelling fit for Amazon customers.