For months the speculation was rampant, and now the rumors have proven to be true. Yahoo has officially announced that it will become a player in the emerging Hadoop market. Hadoop provides distributed computing capabilities that enable organizations to process very large amounts of data quickly. Backed by Yahoo and Benchmark Capital, a new entity called Hortonworks has formed around a team from Yahoo that consists of more than 20 key architects of and contributors to the Apache Hadoop project. The company will start with some 25 employees and “will be hiring aggressively from our collective networks,” according to Rob Bearden, Hortonworks president and COO.
The strategy behind Hortonworks is relatively simple: focus on adoption and maturation of the open source Apache Hadoop project. The name plays off that of the toy elephant that symbolizes Hadoop and is a reference to the elephant in the Dr. Seuss book Horton Hears a Who. Hortonworks CEO Eric Baldeschwieler, formerly VP of software engineering for the Hadoop team at Yahoo, spoke recently at the IBM Big Data Symposium where IBM also indicated its support for the Apache distribution of Hadoop. Other vendors including Cloudera, EMC Greenplum and MapR have announced their own distributions of Apache Hadoop, rather than relying solely on the Apache distribution.
Our forthcoming research on Hadoop and information management shows that enterprises are interested in Hadoop. In a recent webinar we shared preliminary findings that more than 50% of the participating organizations are using Hadoop, planning to use it or evaluating it. The research also shows that nearly half the organizations using Hadoop are using more than one distribution, which suggests that existing distributions are immature and incomplete. So the fundamental premise of Hortonworks addresses a real market need. The main question will be whether Hortonworks can harden the Apache distribution quickly enough to attract market share sufficient to survive and thrive. Since it is committed to the open source model in which software is available for free, revenue will come only from training and support services. Initially, Yahoo will be Hortonworks’ primary customer, with Hortonworks providing Tier 3 support, but the new entity will be competing with others such as Cloudera that offer training and support services for Hadoop.
A secondary question is whether Hortonworks’ entry into the market will disrupt other players. On one hand, nothing has changed – Apache Hadoop is the basis for the offerings from Cloudera, EMC and MapR. Where these vendors have found the Apache distribution lacking, they have made improvements and then either contributed the changes back to the Apache project or offered the improvements as proprietary extensions of or replacements for the Apache distribution. On the surface this model can continue uninterrupted. As the Apache distribution gains more features, others can continue to add value elsewhere. On the other hand, if the Apache distribution were to gain enough features rapidly enough, it might take the market away from other vendors before they have built a sufficient customer base to fund their ongoing activities. I suspect the Hadoop market will continue to grow rapidly enough that several vendors can survive and that one vendor’s success will not, in the near term, cause the demise of another.
I’m somewhat surprised by Hortonworks’ choice of business model – choosing to go with a purely open source licensing scheme. I haven’t done empirical research, but I sense that most commercially successful open source companies also offer a premium (often called “enterprise” or “professional”) version of their product for which they charge a licensing fee. This combination of open source and premium product is referred to as an “open core” licensing model. I imagine Yahoo sought to maximize the value of its investment in Hortonworks, and obviously the owners could later change the business model, but given the prevalence of open core and the potential for higher margins associated with software license revenues, I expected that kind of business model.
Hortonworks has some things in its favor. Our research shows that the Apache distribution is the most prevalent: 63% of organizations that use Hadoop have it as one of the distributions they use. However, Cloudera was a close second with 55%, which gives some credence to the open core approach. Hortonworks begins with some top engineering talent, but those experts will need support from experienced software executives to help manage the business. The company also potentially inherits Yahoo’s presence in the Hadoop market. Yahoo has been hosting the Hadoop Summit for years, contributing to the Apache distribution and sharing its knowledge of Hadoop from its internal usage by more than 1,000 users. We expect a close working relationship between Yahoo and Hortonworks, however it is structured.
As we discussed in the webinar last week, Hadoop technology is not yet mature. That organizations are drawing from multiple distributions suggests that no one has a lock on the market yet. Regardless of how big this market turns out to be, these are its early days and there is plenty of time for Hortonworks to grab its share.