Informatica Parses Hadoop
HParser extends capabilities for working with big-data sources

by Ventana Research | 2012-01-13 | Article ID: QT12-03 | Article Type: QuickTake

Take

Beginning with Version 9.1, introduced in 2011, Informatica’s flagship product has been able to access data stored in the Hadoop Distributed File System (HDFS) as either a source or a target for information management processes. However, it could not manipulate or transform the data within the Hadoop environment. Informatica’s HParser is designed to close this gap. Using DT Studio, Informatica’s Eclipse-based integrated development environment (IDE), organizations can use a graphical user interface to create data transformation routines that parse log files and other types of data typically processed with Hadoop. Once developed, these routines are deployed to the Hadoop cluster and invoked as part of MapReduce jobs, which enables them to use the full distributed processing and parallel execution capabilities of Hadoop. Developing these routines in a graphical environment should make it easier and faster to create the code necessary to parse the data. Our benchmark research shows that staffing and training are the two biggest obstacles to leveraging Hadoop, so tools like HParser that minimize the specialized skills required can be valuable to organizations deploying Hadoop.
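
For context, the sketch below shows the kind of hand-written parsing logic that a graphical tool of this sort is meant to replace. It is not HParser code and does not use any Informatica API. It assumes Web server log lines in the Common Log Format and a mapper in the style of Hadoop Streaming, which reads lines from standard input and writes tab-separated key/value pairs. The names `LOG_PATTERN`, `parse_line` and `map_lines` are hypothetical and chosen only for illustration.

```python
import re
import sys

# Assumed input: Common Log Format lines, e.g.
# host ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no bytes were returned.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


def map_lines(lines):
    """Yield 'status<TAB>1' records; malformed lines are skipped."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            yield f"{fields['status']}\t1"


if __name__ == "__main__":
    # In a streaming-style job, each mapper receives its input split on stdin.
    for record in map_lines(sys.stdin):
        print(record)
```

Even this small example requires regular expressions, type conversion and handling of malformed records, which suggests why reducing the amount of hand-coded parsing could lower the skills barrier described above.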

Informatica is making two versions of HParser available. The community edition is free, but it is not open source. It can be used to process log files, Omniture Web analytics data, XML documents and data in the JavaScript Object Notation (JSON) interchange format. In addition to these, the enterprise edition supports a number of industry-standard data formats. For the most part, the enterprise offering is targeted at those in the Informatica user base who might be extending their efforts into Hadoop. The community edition may provide enough value for customers not currently working with Informatica to consider trying some of the company’s other products.

Business intelligence vendors and information management vendors alike have embraced Hadoop. We expect to see more investment from Informatica and others as organizations work to make Hadoop a disciplined part of their IT infrastructure processes. As our research shows, integration is one of the top four issues for organizations working with Hadoop. The more that existing products can be extended to incorporate Hadoop, or new products developed to make Hadoop easier to use, the more widespread its use will become. Die-hard MapReduce programmers may not feel that they need HParser. However, enterprise IT organizations already using Informatica should find it a welcome addition to their efforts to deal with Hadoop-based data sources.

 

Copyright © 2013 Ventana Research. All Rights Reserved.