Google announced today that BigQuery, its Big Data Platform-as-a-Service, is now publicly available, a year and a half after debuting via a closed preview. The cloud-based platform allows users to access, analyze and build analytic applications on top of large data sets stored in Google’s data centers.
Google said users can upload, process and analyze up to 100 GB of data per month for free. That hardly seems to qualify as Big Data, but I digress. Above 100 GB and up to 2 TB, Google will charge $0.12 per GB per month for storage and $0.035 per GB processed as part of a query, with a limit of 1,000 queries per day or 20 TB processed per day, whichever comes first. Over and above 2 TB, call Google for pricing.
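To make those rates concrete, here is a rough back-of-the-envelope sketch in Python. It is not Google's billing logic. The function name and constants are my own, it assumes 1 TB = 1,024 GB, and it ignores the 100 GB free allowance because the announcement does not spell out how that allowance is applied, so treat the result as an upper bound.

```python
# Rates quoted in the announcement, in US dollars.
STORAGE_RATE_USD_PER_GB_MONTH = 0.12
QUERY_RATE_USD_PER_GB = 0.035

# Assumption: 1 TB = 1,024 GB. The announcement does not say which
# definition of a terabyte Google uses.
GB_PER_TB = 1024
PUBLISHED_CEILING_GB = 2 * GB_PER_TB


def estimate_monthly_cost(stored_gb: float, processed_gb: float) -> float:
    """Upper-bound estimate of a monthly bill at the published rates.

    Hypothetical helper: it ignores the 100 GB free allowance and the
    daily query limits, and only covers storage up to 2 TB.
    """
    if stored_gb < 0 or processed_gb < 0:
        raise ValueError("volumes must be non-negative")
    if stored_gb > PUBLISHED_CEILING_GB:
        raise ValueError("above 2 TB there is no published rate; call Google")
    storage_cost = stored_gb * STORAGE_RATE_USD_PER_GB_MONTH
    query_cost = processed_gb * QUERY_RATE_USD_PER_GB
    return round(storage_cost + query_cost, 2)


if __name__ == "__main__":
    # 500 GB stored, 1,000 GB scanned by queries over the month.
    print(estimate_monthly_cost(500, 1000))
```

Under these assumptions, 500 GB stored plus 1,000 GB processed in a month works out to $60 for storage and $35 for queries, or about $95 in total.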
The underlying technology used to store, process and otherwise crunch all that Big Data is the offspring of the technology Google originally developed to allow it to index the Web, according to a Google spokesperson. That, of course, includes MapReduce, the parallel processing programming model Google developed and made famous in its 2004 paper, and which today is the core processing framework in Hadoop.
BigQuery is accessible via a “simple UI or REST interface,” according to a blog post by BigQuery Product Manager Ju-Kay Kwek. “It lets you take advantage of Google’s massive compute power, store as much data as needed and pay only for what you use. Your data is protected with multiple layers of security, replicated across multiple data centers and can be easily exported,” he writes.
As with Amazon Web Services’ Elastic MapReduce (EMR) service, the real value of BigQuery is the ability to quickly scale up Big Data projects as a service without the upfront CapEx required for an on-premises deployment. On the flip side, users can scale down equally quickly and pay only for the storage and compute resources they use.
Another key is that Big Data PaaS offerings like EMR and BigQuery remove the need for companies to hire and/or train Big Data staff, a difficult prospect in today’s market due to a lack of skilled Big Data practitioners. Instead, Amazon and Google are responsible for deploying, managing and scaling installations.
A big difference between Google’s Big Data PaaS and Hadoop, however, is that BigQuery is a proprietary platform, whereas Hadoop is open source. Whether that impacts adoption by enterprises that don’t want to get locked into a relationship with Google remains to be seen.
However, if I were an enterprise or developer evaluating BigQuery, I’d be sure to ask Google if the Big Data applications I build on top of BigQuery can be easily migrated to Hadoop should I choose to go that route in the future. The last thing you want to do is spend months building complex, distributed analytic applications on top of BigQuery only to discover they can’t be reused on any other Big Data platform without extensive rewriting.
If Google answers that question adequately, I can see BigQuery and similar services as a good potential starting point for enterprises that want to tap into the power of Big Data analytics but are not prepared to commit to a full-scale, production-level deployment just yet. In other words, the cloud (with its related services) is a great place to get your feet wet with Big Data. Beyond that, Amazon, Google and others need to prove the value proposition remains as attractive for production-level Big Data deployments.