Data lakes began to emerge about 10 years ago in response to demand for analytic data platforms that could economically store and process large volumes of raw data. These platforms ingest data in a variety of formats from multiple operational applications so that multiple business departments can query it for a variety of analytic workloads.
Organizations face a variety of data and analytics challenges as they grow and operate at greater scale. Deriving value from data held in different databases requires multiple tools and techniques. But adding more systems adds complexity, which can slow operations and raise maintenance costs. SQL databases have long been popular among organizations for storing and managing data because they enable workers to manage and analyze massive volumes of data quickly and reliably.
Organizations cite improving the quality of information as the leading benefit of data preparation activities. Data quality efforts focus on clean data, but organizations increasingly recognize the importance of bad data as well. More precisely, what matters is the original data as recorded by an organization’s various devices and systems. To perform data preparation fully, organizations must know what data exists, both good and bad.
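As a minimal sketch of that idea, the Python below tags each incoming record with any problems it has instead of discarding it, so the original data stays available alongside the clean data. The field names (`device_id`, `temperature`), the `ProfiledRecord` type, and the "required field is present" check are all hypothetical, chosen only for illustration; real preparation rules would depend on the organization's own data.

```python
from dataclasses import dataclass


@dataclass
class ProfiledRecord:
    raw: dict       # the record exactly as it was received
    problems: list  # empty when the record passes every check


def profile(records, required_fields):
    """Tag each record with its problems instead of discarding bad ones."""
    profiled = []
    for record in records:
        # Hypothetical rule: a required field must be present and non-empty.
        problems = [
            f"missing {name}"
            for name in required_fields
            if record.get(name) in (None, "")
        ]
        profiled.append(ProfiledRecord(raw=record, problems=problems))
    return profiled


def summarize(profiled):
    """Count good and bad records so both remain visible."""
    bad = sum(1 for item in profiled if item.problems)
    return {"total": len(profiled), "good": len(profiled) - bad, "bad": bad}


# Made-up sample readings, for illustration only.
readings = [
    {"device_id": "a1", "temperature": 21.5},
    {"device_id": "", "temperature": 19.0},
    {"device_id": "b7", "temperature": None},
]
profiled = profile(readings, ["device_id", "temperature"])
summary = summarize(profiled)
```

Because every record keeps its `raw` form, a later step can still inspect or repair the rejected readings rather than losing them at ingestion.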