MLOps Explained

MLOps (Machine Learning Operations) is one of the fastest-emerging job roles of recent years. According to a LinkedIn report, demand for machine learning and artificial intelligence roles has grown by 74% annually over the last four years.

Why is there a surge in demand for MLOps, and why do we need it?

Before hardware and data technologies advanced, AI was handled by a small group of experts who mostly worked with limited data, including academic datasets, that had been collected or prepared for a specific piece of research. The workflow was therefore smooth and easy to manage. But as enterprises started to adopt AI and integrate it into the core of their products and services, they needed a streamlined end-to-end process, sustainable at scale and trackable, to bring machine learning models from research to production.

This streamlined end-to-end process of deploying machine learning models to production, together with continuous training, data automation, platform automation and model/data versioning, all built on the principles of DevOps (the 6 C’s), is what we call MLOps (Machine Learning Operations).

But isn’t machine learning only about designing algorithms and building models?

No, it’s not. The figure below, from Hidden Technical Debt in Machine Learning Systems, helps explain why.

mlops technical debt — Source: Hidden Technical Debt in Machine Learning Systems (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)

The figure shows that the black box labeled “ML Code” is only the smallest part of a machine learning system. The components around it play an important role as well, and all of them need attention in order to iteratively deliver a successful machine learning system.

Is MLOps different from DevOps? What is MLOps?

MLOps is a behavior in which talents like Data Engineers, Data Scientists, Operations teams and ML Engineers collaborate and communicate efficiently to build and manage the ML application lifecycle and to ship better products with each iteration.

MLOps is based on the principles of DevOps, extended to include model versioning, data versioning, continuous training and continuous monitoring (for example, of data drift), along with data and platform automation. To succeed with MLOps, a DevOps foundation is necessary.

MLOps is also about productionizing algorithms, that is, helping them move from research to production. This matters because most AI projects stay stuck at the research stage.

Need for MLOps

A typical ML project starts with data: acquisition, cleaning and pre-processing, followed by feature extraction. This is an iterative process, so there should be re-usable pipelines in place, with checkpoints
The next step is building the model, which requires experimentation. Experimentation in turn requires tracking parameters, hyper-parameters and metrics, which guides further tuning
While we work on this experimentation, code gets written, and that code needs to be version controlled
Once the model is built, it needs to be deployed somewhere (a Dev/Stage environment) where model validation can take place
If the validation is successful, the model is deployed to production, where it needs to be monitored for performance, data drift, model drift and so on
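
To make the experiment-tracking step concrete, here is a minimal sketch that uses only the Python standard library. It appends each run's parameters and metrics to a JSON Lines file and then picks the best run. The helper names (log_run, load_runs, best_run), the file name and the accuracy values are all assumptions made for illustration; a real project would more likely rely on a dedicated tracking tool.

```python
import json
import tempfile
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment run (parameters and metrics) as a JSON line."""
    record = {"params": params, "metrics": metrics}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_runs(log_path):
    """Read every logged run back into a list of dicts."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda run: run["metrics"][metric])


# Hypothetical experiment: the metric values are made-up placeholders
# standing in for whatever a real training job would report.
log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_path, {"learning_rate": 0.1, "max_depth": 3}, {"accuracy": 0.81})
log_run(log_path, {"learning_rate": 0.01, "max_depth": 5}, {"accuracy": 0.86})
log_run(log_path, {"learning_rate": 0.001, "max_depth": 5}, {"accuracy": 0.84})

runs = load_runs(log_path)
winner = best_run(runs, "accuracy")
print(winner["params"])
```

Because every run is appended rather than overwritten, the log doubles as a history of what has already been tried, and it can be committed alongside the code that produced it.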

MLOps break-down

MLOps is built on DevOps as its foundation. So once that foundation is in place, what comes next?

The core requirement for solving any ML problem is data, and manipulating that data manually is neither feasible nor recommended.

When working on machine learning problems, one of the core parts is dealing with data: data acquisition, data processing and ETL. Data can be acquired from one or more sources, which then need to be aggregated if the ML use case requires it. During data acquisition, talents like the domain expert, data engineer, DataOps engineer and data scientist work together to identify the data sources and assess the quality of the data. Based on that, data processing/ETL jobs can be defined to clean the data, followed by feature engineering. This is the territory of DataOps and data engineering. At a high level, we are looking at data automation, where ETL jobs, dataset versioning, tracking, cataloging, pre-processing and similar jobs are taken care of.
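
One way to picture dataset versioning and cataloging is to derive a version ID from the data itself. The sketch below is an illustration under assumptions, not a prescribed tool: dataset_version is a hypothetical helper, and the toy records and catalog layout are invented for the example.

```python
import hashlib
import json


def dataset_version(rows):
    """Derive a short version ID from the dataset's content.

    Rows are serialized deterministically (sorted keys), so identical
    data always maps to the same ID, and changed data will in practice
    map to a different one.
    """
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:12]


# Hypothetical toy dataset, for illustration only.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]
cleaned = [row for row in raw if row["age"] is not None]

# A tiny in-memory catalog keyed by version ID.
catalog = {
    dataset_version(raw): {"stage": "raw", "rows": len(raw)},
    dataset_version(cleaned): {"stage": "cleaned", "rows": len(cleaned)},
}
print(catalog)
```

Recording the version ID next to each trained model makes it possible to trace a model back to the exact data it was trained on.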

All talents should be able to work and collaborate efficiently to operationalize machine learning solutions: running experiments, creating ML workflows, training models and so on. Such requirements call for a platform.

There are many cloud-enabled platforms available, such as Amazon SageMaker Studio, Azure ML Studio and Google AI Platform, which can be leveraged as part of platform automation. With platform automation, it is easier to collaborate, build models, create ML workflows, and more.

After the platform is set up, DevOps principles can be leveraged to implement model versioning, continuous validation, continuous monitoring (for example, of data drift, model drift, model performance and tracked metrics; there is no shortage of things that can go wrong), a feedback mechanism and continuous training.
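
As an illustration of what a data drift check can look like, the sketch below computes the Population Stability Index (PSI) between a reference sample (for example, training data) and a current sample (for example, recent production inputs). PSI is only one of several possible drift measures and is an assumption here, not something the steps above prescribe. The binning scheme, the floor value and the 0.2 alert threshold (a commonly quoted rule of thumb) are likewise assumptions.

```python
import math


def psi(reference, current, bins=4):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference sample's range; current
    values outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: the "production" sample is shifted upward.
training = [i / 10 for i in range(100)]
production = [v + 5 for v in training]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level
drifted = psi(training, production) > DRIFT_THRESHOLD
print("drift detected" if drifted else "no drift")
```

In a continuous monitoring setup, a check like this would run on a schedule, and a detected drift could feed the feedback mechanism or trigger continuous training.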

MLOps is not a fixed, standard process that you have to follow in full; it is a process you adopt based on your specific needs. For example, if a given use case only requires experimentation and automated validation/staged deployment, then that is all you have to implement.

How to start with a typical ML project?

For a given business use-case…

Break the big components down into smaller ones
Initially, start manually with data acquisition, cleaning and ETL. Then write small pipelines and containerize them
Next comes feature engineering: package the code, write a pipeline and containerize it
Next is experimentation: once the algorithm is selected and streamlined, write code that can handle the parameters dynamically, package it and containerize it
Once the small pipelines are in place, integrate them. To trigger each pipeline in sequence, create a workflow around them
Code versioning and model versioning can be done along the way

This is how you can start small with manual effort and automate step by step. In the end, you will have an end-to-end solution.
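
The idea of integrating small pipelines into a sequentially triggered workflow can be sketched in a few lines. In the example below, each function stands in for one packaged (and, in practice, containerized) step; the step names, the toy records and the run_workflow helper are all hypothetical, and a real setup would typically hand this job to a workflow orchestrator.

```python
def acquire():
    """Stand-in for data acquisition; returns hypothetical raw records."""
    return [
        {"age": 34, "income": 52000},
        {"age": None, "income": 61000},
        {"age": 51, "income": 48000},
    ]


def clean(records):
    """Drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def engineer_features(records):
    """Add a simple derived feature to each record."""
    return [{**r, "is_senior": r["age"] >= 50} for r in records]


def run_workflow(source, steps):
    """Run the source, then each step in order, passing the output along."""
    data = source()
    history = [(source.__name__, len(data))]
    for step in steps:
        data = step(data)
        history.append((step.__name__, len(data)))
    return data, history


features, history = run_workflow(acquire, [clean, engineer_features])
for name, row_count in history:
    print(f"{name}: {row_count} rows")
```

Keeping every step a plain function with one input and one output is what makes it straightforward to package the steps separately first and wire them together later.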

MLOps is about systematic management of the ML application.

Who can do MLOps?

There is no clear answer to this. A DevOps Engineer, an ML Engineer or a Data Scientist can do MLOps.

Multiple talents should work closely together and follow a systematic approach to make MLOps seamless. However, a full-stack Data Scientist can also do MLOps alone, which minimizes the inter-dependency among different talents.

mlops roles — Ref: https://github.com/aws-samples/mlops-amazon-sagemaker-devops-with-ml

In the figure above, which depicts the Machine Learning Application Development Life Cycle, we can see that multiple talents are tightly coupled to make MLOps possible. As I mentioned earlier, MLOps is a behavior; when the talents working across the different components adopt it, MLOps becomes efficient and smooth.

Aspire to be a full-stack data scientist.