White Paper

Prepared for: Hortonworks

Apache Hadoop 3 Improves Big Data Workloads: Platform Evolves to Meet Modern Requirements

Apache Hadoop: Background and History

In the spring of 2006, the Apache Software Foundation released Hadoop, a distributed computing framework for managing and analyzing very large amounts of data in a scalable and reliable way. The open-source software was designed to run on clusters ranging from a few servers to thousands of nodes, letting users pool computing power and process workloads far more cost-effectively than had previously been possible.

It’s been more than 10 years since the project began, and in that time an active open-source community has helped Apache Hadoop mature significantly. The community has contributed many enhancements, including improvements to high availability, governance, and analytical processing.

The core of Hadoop consists of four modules: Hadoop Common, the common utilities that support the other Hadoop modules; the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data; YARN, a framework for job scheduling and cluster resource management; and MapReduce, a YARN-based system for parallel processing of large data sets. Since the project’s inception, many other Hadoop-related Apache projects, such as Apache HBase, Apache Hive, and Apache Spark, have been developed, but this paper focuses on the core of Hadoop and key enhancements to its modules.
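
To illustrate how these modules fit together, the sketch below shows the canonical word-count job written against Hadoop’s MapReduce Java API: HDFS holds the input and output directories, YARN schedules the map and reduce tasks across the cluster, and Hadoop Common supplies the shared configuration and I/O classes. The class name WordCount and the command-line paths are illustrative only; the code assumes a standard Hadoop client environment with the hadoop-mapreduce-client libraries on the classpath.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on each input split and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: receives all counts emitted for a word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // The Configuration object (from Hadoop Common) picks up cluster settings
    // such as the HDFS namenode and YARN resource manager addresses.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation of map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output are HDFS directories passed on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, a job like this would typically be submitted with a command along the lines of hadoop jar wordcount.jar WordCount /input /output, where both paths resolve to HDFS directories and YARN allocates the containers that run the map and reduce tasks.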