Decoupling the AI/ML Pipeline for Scale

Sponsored by: Teradata


Ensure Models Reach Production to Deliver Value

Challenges with Realizing the Promise of AI/ML

Artificial intelligence and machine learning (AI/ML) are extremely valuable to organizations. Our research shows that AI/ML help organizations gain a competitive advantage and improve customer experiences, and have a direct impact on the bottom line, increasing sales and lowering costs. AI/ML are used across industries and throughout many departments in organizations. Consequently, the use of AI/ML has continued to expand.

But deploying AI/ML effectively can be hard for organizations. The top challenge in applying ML reported by participants in our research is accessing and preparing data. Many also report a lack of skilled resources, as well as facing challenges deploying models to production. Without deploying models, organizations cannot realize the value of their investment in model development.

The Three Steps to Get to Production

All the glory in data science is associated with model development, but data is rarely in the exact form needed by different algorithms. The process of preparing the data in the right form and finding the most relevant fields is referred to as “feature engineering.” This is a key step because the input features used for model development are the biggest differentiator between a good model and a bad one. They also constitute an important element of the organization’s intellectual property. Feature engineering results in data pipelines that encapsulate the data manipulations needed in preparation for training a model. These pipelines ensure that each time the model is run, the data is prepared in the correct format.
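To make the idea of a pipeline that encapsulates data manipulations concrete, here is a minimal sketch in plain Python. The step names and fields (income, dependents) are hypothetical and chosen only for illustration; real pipelines would typically be built with an organization's own data tooling.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[Record], Record]


def make_pipeline(steps: List[Step]) -> Step:
    """Compose steps so the same manipulations run in the same order every time."""
    def run(record: Record) -> Record:
        out = dict(record)  # copy so the raw input is left untouched
        for step in steps:
            out = step(out)
        return out
    return run


# Hypothetical preparation steps, for illustration only.
def fill_missing_income(record: Record) -> Record:
    out = dict(record)
    out.setdefault("income", 0.0)  # default when the field is absent
    return out


def add_income_per_dependent(record: Record) -> Record:
    out = dict(record)
    # Guard against division by zero when there are no dependents.
    out["income_per_dependent"] = out["income"] / max(out["dependents"], 1)
    return out


pipeline = make_pipeline([fill_missing_income, add_income_per_dependent])
prepared = pipeline({"dependents": 2, "income": 50000.0})
print(prepared["income_per_dependent"])
```

Because the same pipeline object is applied at training time and at scoring time, the data reaches the model in the same format in both cases.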

Model development requires both good data and knowledgeable resources. If you do not have good data, your models will not get good results. Data is the main differentiator, even more than the algorithm, technique or parameters used. While tools have helped reduce the knowledge required, there is still much that must be understood in the modeling process. Data scientists must understand the various modeling techniques and their applicability to different situations as well as the various parameters of different algorithms in order to tune the models they produce.

Model deployment is a critical step often overlooked in the planning process, but one that is necessary before organizations can realize any benefits from AI/ML efforts. Development and deployment are typically handled by separate resources, with deployment often requiring coordination between data scientists, data engineers and IT teams.

From Feature Engineering to Feature Store

Feature engineering is a process of testing and iteration involving data pipelines and exploration. Data scientists need to brainstorm and test different features, derived from fields or combinations of fields, to find those that have the greatest predictive value. Examples include normalizing a continuous set of values between zero and one, or binning continuous values into several buckets. Once features are identified, further exploration can improve their predictive value, for instance by refining the number of buckets or the ranges of values used in the binning process.
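The two transformations mentioned above can be sketched in a few lines of Python. This is an illustrative example using only the standard library; the function names, the sample ages and the bucket edges are assumptions, not part of any particular tool.

```python
import bisect
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical; avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Return a bucket index per value.

    `edges` must be sorted ascending. A value equal to an edge falls into
    the bucket that starts at that edge.
    """
    return [bisect.bisect_right(edges, v) for v in values]


ages = [20, 30, 45, 70]                        # hypothetical sample data
normalized = min_max_normalize(ages)
coarse = bin_values(ages, edges=[30, 50])      # three buckets
fine = bin_values(ages, edges=[25, 40, 60])    # refined to four buckets
print(normalized, coarse, fine)
```

Changing only the `edges` argument is the kind of refinement described above: the feature definition stays the same while the number and range of buckets are tuned.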

Unfortunately, data pipelines for feature engineering are siloed and rarely reusable, and are often incorporated into the model-building process rather than developed separately. As data scientists design and test features, all the work related to a model is generally coded in notebooks. This is helpful for repeatability. However, while notebooks can be shared to allow multiple people to work together, they are not designed for reusability of individual elements. Instead, code is copied from one notebook to another, which leads to versioning and maintenance issues.

By 2025, 9 in 10 analytics processes will be enhanced by AI/ML to streamline operations and increase the value that can be derived from data. With the growing use of AI/ML, organizations need a way to share features among all the various use cases and scenarios. A feature store is a repository that enables organizations to share and reuse features, making data science teams more productive by cataloging and providing access to the collection of features already developed. This repository becomes the basis of the production process for predictive models.

Feature stores also help with maintenance and governance because only a single version of each feature exists. Organizations need to be accountable for their decisions, whether made by human or machine. Given any specific decision, it must be possible to find out what model was in production when the decision was made, what parameters it was using, and what data it was trained and scored with. A feature store provides this accountability.
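As a rough illustration of these two properties (a single definition per feature, and a record of what went into each decision), consider the following in-memory sketch. The `FeatureStore` and `DecisionRecord` classes, the feature names and the model name are all hypothetical; a production feature store would add persistence, access control and much more.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class FeatureStore:
    """Toy in-memory catalog that allows exactly one definition per feature name."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], float]] = {}

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def catalog(self) -> List[str]:
        return sorted(self._features)

    def compute(self, names: List[str], record: dict) -> Dict[str, float]:
        return {name: self._features[name](record) for name in names}


@dataclass(frozen=True)
class DecisionRecord:
    """What was used to make one decision, kept for later audit."""
    model_name: str
    model_version: str
    parameters: dict
    feature_values: dict
    score: float


store = FeatureStore()
store.register("income_per_dependent",
               lambda r: r["income"] / max(r["dependents"], 1))
store.register("is_senior", lambda r: 1.0 if r["age"] >= 65 else 0.0)

features = store.compute(
    ["income_per_dependent", "is_senior"],
    {"income": 60000.0, "dependents": 3, "age": 70},
)
# Hypothetical model name, version, parameters and score.
audit = DecisionRecord("churn_model", "1.0", {"threshold": 0.5}, features, 0.8)
print(store.catalog(), audit)
```

Any model that needs `is_senior` asks the store for it rather than copying the logic, so there is one place to maintain the definition and one place to look when a decision is questioned.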

Support Flexibility in Modeling Tools

Data scientists have long preferred the tools with which they like to work. Since no tool has emerged to dominate AI/ML modeling, the market is fragmented with a variety of tools in use. Many of these tools support common languages such as R and Python. Each tool has attracted its own proponents.

Organizations need data scientists to be as productive as possible. AI/ML require different skill sets than other analytics tools, and many organizations indicate they lack the expertise needed. Nearly one-half (45%) of participants in our research indicate they have little or no expertise in AI/ML. Because of the scarcity of resources, organizations should try to accommodate data scientists by allowing them to work with the tools they prefer, assuming those tools comply with data governance and security standards.

Fortunately, regardless of what tool organizations use, there are many options for deploying models. Standards for exporting models include Predictive Model Markup Language (PMML), Portable Format for Analytics (PFA) and Open Neural Network Exchange (ONNX). Many databases have been extended to be able to execute models. Certain models can be converted to SQL for execution within databases, while other tools support exporting models for execution in Java.
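To show what converting a model to SQL can look like, the sketch below turns the coefficients of a logistic regression into a scoring query. It is an assumption-laden illustration: the function name, the column names, the table name and the coefficients are hypothetical, identifiers are assumed to be trusted, and the target database is assumed to provide an `EXP` function.

```python
from typing import Dict


def logistic_regression_to_sql(intercept: float,
                               coefficients: Dict[str, float],
                               table: str) -> str:
    """Render a logistic regression as a SQL query that scores every row.

    Column and table names are inserted as-is, so they must come from a
    trusted source (this sketch does no quoting or validation).
    """
    terms = [repr(float(intercept))]
    terms += [f"{float(weight)!r} * {column}"
              for column, weight in coefficients.items()]
    linear = " + ".join(terms)
    # Logistic function: 1 / (1 + e^(-z)), where z is the linear combination.
    return f"SELECT 1.0 / (1.0 + EXP(-({linear}))) AS score FROM {table}"


# Hypothetical coefficients from an already trained model.
query = logistic_regression_to_sql(
    intercept=-1.5,
    coefficients={"tenure_months": 0.02, "support_calls": -0.3},
    table="customers",
)
print(query)
```

Running the generated query inside the database scores the data where it already lives, which avoids moving rows out to a separate scoring service.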

Putting It All Together

Organizations must deploy models into operational systems to realize the value those models provide. Too often, models never move from experimentation and development into production. It is critical that the teams developing models coordinate with the teams that deploy models into production applications. These applications span the organization from customer-facing functions such as customer service, sales and marketing, to back-office functions of finance, operations, and research and development. Deployment involves linking these models to the operational applications to score new data as it is generated.

Developing and deploying models also requires repeatability and governance. Nearly three-quarters of organizations plan to increase their usage of AI/ML. The number of models will continue to increase, with some organizations deploying hundreds if not thousands of models. Organizations must be prepared to manage these deployments with efficient, repeatable processes. Utilizing a feature store will aid with efficiency by minimizing the number of separate pipelines that must be managed. Repeatability will also provide better governance of the models.

AI/ML require continuous monitoring, refinement and redeployment. Once a model is deployed, it must be monitored to determine if its accuracy has drifted. New data and new market conditions dictate that models will need to be revised frequently. Organizations should use champion-challenger processes to identify new models for deployment. This process of monitoring, retraining and redeploying models is referred to as MLOps.
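A simplified sketch of drift monitoring and a champion-challenger comparison follows. The tolerance value, the two toy models and the sample data are hypothetical; real MLOps processes would use larger evaluation sets, more robust metrics and statistical tests.

```python
from typing import Callable, List, Sequence

Model = Callable[[dict], bool]


def accuracy(predictions: Sequence[bool], actuals: Sequence[bool]) -> float:
    """Fraction of predictions that match the observed outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def has_drifted(baseline: float, recent: float, tolerance: float = 0.05) -> bool:
    """Flag drift when recent accuracy falls more than `tolerance` below baseline."""
    return (baseline - recent) > tolerance


def select_model(champion: Model, challenger: Model,
                 records: List[dict], actuals: Sequence[bool]) -> str:
    """Keep the champion unless the challenger is strictly more accurate."""
    champion_acc = accuracy([champion(r) for r in records], actuals)
    challenger_acc = accuracy([challenger(r) for r in records], actuals)
    return "challenger" if challenger_acc > champion_acc else "champion"


# Hypothetical models: each predicts churn from the number of support calls.
def champion(record: dict) -> bool:
    return record["support_calls"] > 5


def challenger(record: dict) -> bool:
    return record["support_calls"] > 2


records = [{"support_calls": calls} for calls in [1, 3, 4, 6]]
actuals = [False, True, True, True]

recent_accuracy = accuracy([champion(r) for r in records], actuals)
drifted = has_drifted(baseline=0.9, recent=recent_accuracy)
winner = select_model(champion, challenger, records, actuals)
print(recent_accuracy, drifted, winner)
```

In this toy data the champion's accuracy has fallen well below its assumed baseline of 0.9, and the challenger scores the recent records better, so it would be the candidate for redeployment.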

Next Steps

AI/ML are extremely valuable and can have a significant impact on an organization’s bottom line. To best utilize what AI/ML have to offer, organizations should:

  • Provide data scientists with flexibility in the tools available for them to use.
  • Deploy models to production systems with repeatable, well-governed processes.
  • Utilize a feature store to share features among multiple models.
  • Monitor and revise models to ensure continued accuracy.
  • Create accountability and an audit trail of how models were created and applied.

These steps will maximize the value of AI/ML investments across the organization.

About Ventana Research

Ventana Research is the most authoritative and respected benchmark business technology research and advisory services firm. We provide insight and expert guidance on mainstream and disruptive technologies through a unique set of research-based offerings including benchmark research and technology evaluation assessments, education workshops and our research and advisory services, Ventana On-Demand. Our unparalleled understanding of the role of technology in optimizing business processes and performance and our best practices guidance are rooted in our rigorous research-based benchmarking of people, processes, information and technology across business and IT functions in every industry. This benchmark research plus our market coverage and in-depth knowledge of hundreds of technology providers means we can deliver education and expertise to our clients to increase the value they derive from technology investments while reducing time, cost and risk.

Ventana Research provides the most comprehensive analyst and research coverage in the industry; business and IT professionals worldwide are members of our community and benefit from Ventana Research’s insights, as do highly regarded media and association partners around the globe. Our views and analyses are distributed daily through blogs and social media channels including Twitter, Facebook and LinkedIn.

To learn how Ventana Research advances the maturity of organizations’ use of information and technology through benchmark research, education and advisory services, visit