Cloudera is riding the wave of big data. I first learned about the company while working at Vertica, one of Cloudera’s partners. Customers that managed large amounts of structured relational data also needed to process large amounts of semistructured data, such as the data found in web logs and application logs. The emerging channel of social media provided another source of data that lacks the structure needed for analysis in a relational database. Other organizations needed to perform calculations and analyses that were difficult to express in SQL. Seeing this market, Cloudera recognized earlier than others the opportunity to build on the Apache Hadoop project; it has offered the Cloudera Distribution for Hadoop (CDH) since early 2009.
I first wrote about Cloudera last year after attending Hadoop World and seeing firsthand the significant interest in Hadoop. Much has happened at Cloudera since then, and also in the broader big-data market. Cloudera recently made CDH version 3 generally available. (My colleague Mark Smith wrote about CDH3 when it was first announced.) Cloudera says it intends to release new distributions annually, so we should expect the next release in early to mid-2012, although the recent entry of competitors into the Hadoop distribution market might prompt Cloudera to accelerate its release schedule.
In addition to the open source CDH releases, Cloudera offers an enterprise product that combines CDH with support and a set of management applications for authorization, provisioning, monitoring and resource management. The company has been working on version 3.5 of Cloudera Enterprise and plans to release the enterprise product about twice a year, roughly twice as often as CDH. Version 3.5 includes real-time activity monitoring, an expanded file browser that shows how files are used and who owns them, and extended authorization management and administration.
Perhaps as significant as the software developments, Cloudera has solidified its place in the market with key customer wins, additional funding, an expanded executive team and new partnerships. Last October, Cloudera announced $25 million in funding. Its partnership with Informatica, announced last fall, has borne fruit in Informatica 9.1, which I covered in a previous post. I’ve also covered Jaspersoft 4, whose features include support for Hadoop. In my opinion, these partners are pursuing Cloudera rather than the other way around.
Of course, success often provokes competition. Cloudera’s first-mover advantage in the Hadoop market has attracted competitors offering alternatives, both direct, such as EMC with its own distribution of Hadoop, and indirect, such as LexisNexis with an open source version of its high-performance cluster computing system.
We recently completed research on the market requirements for big data, including the benefits of adopting one of these alternatives and the obstacles to doing so. This research, the first of its kind, is the largest and most comprehensive study of issues related to the big-data market. We’ll share some of our preliminary findings next week in a webinar hosted by two of the research sponsors. Time will tell which of these alternatives will succeed. As I’ve said in previous posts, I like competition, and you should, too, because it spurs vendors to offer better products at lower prices.