MLOps (Machine Learning Operations) is a core function of machine learning engineering: it establishes consistent, reliable practices for automating the training and deployment of models, coupled with robust and comprehensive monitoring and maintenance. MLOps standardizes and unifies the ML system development lifecycle.

MLOps attempts to address common challenges in operating machine learning systems, such as:

  • lack of reproducibility and provenance
  • inefficient collaboration
  • manual tracking
  • slow time to market for the final product
  • models stalled before ever reaching deployment

Automation in MLOps, as in DevOps, must provide “Continuous Integration/Continuous Delivery” (CI/CD), with the addition of “Continuous Training”. It may also include data versioning.


  • Machine Learning Development Lifecycle (MLDLC) or MLOps Life-Cycle
  • Data problems:
    • Data Collection: Data Ingestion and Data Validation
    • Labeling Consistency: clean, consistently labeled data is essential for optimizing predictions.
    • Data Validation:
      • Concept Drift: Changes in the relationship (aka mapping) between the input and output variables over time.
      • Schema Skew: Training and serving data do not conform to the same schema (e.g., due to different data types present)
      • Distribution Skew: Distribution of serving and training data are significantly different (e.g., due to seasonality changes over time). This skew comprises granular issues of dataset shift and covariate shift.
      • Feature Skew: Training feature values are different from the serving feature values (e.g., due to transformation applied only on the training set)
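The skew categories above can be approximated with simple statistics. A minimal sketch, where the helper names and the mean-shift heuristic are illustrative rather than a standard API:

```python
import statistics

def schema_skew(train_types, serve_types):
    """Return fields whose declared types differ between the training
    and serving schemas (schema skew)."""
    return {f for f in train_types if serve_types.get(f) != train_types[f]}

def distribution_skew(train_vals, serve_vals, threshold=0.5):
    """Crude distribution-skew check: flag a feature when the serving mean
    drifts more than `threshold` training standard deviations from the
    training mean."""
    mu = statistics.mean(train_vals)
    sigma = statistics.stdev(train_vals)
    return abs(statistics.mean(serve_vals) - mu) > threshold * sigma
```

For example, `schema_skew({"age": int}, {"age": str})` flags `age` because the serving data carries a different type than training saw.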
  • Model Serving refers to making the trained models available for end-users to utilize.
  • Obtaining model predictions in production:
    • Batch inference: deployed ML model makes predictions based on historical input data.
    • Real-time inference: predictions are generated in real-time using the input data available at the time of inference.
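The two serving modes above differ mainly in when predictions are computed; a minimal sketch with a toy model callable (all names are hypothetical):

```python
def batch_inference(model, records):
    # Score a historical dataset in one pass; results are typically
    # written to a table and looked up later.
    return {r["id"]: model(r["features"]) for r in records}

def realtime_inference(model, request):
    # Score a single request at the moment it arrives.
    return model(request["features"])

# Toy "model": predicts the sum of the feature vector.
toy_model = lambda features: sum(features)
```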
  • Orchestrators: They perform Orchestration Task, meaning they coordinate how machine learning tasks run and where they get the resources to run their jobs. Orchestrators are concerned with lower-level abstractions like machines, instances, clusters, service-level grouping, replication, and so on.
  • Optimizing model for production:
    • ML model optimization:
      • Inference Time: time required to perform a prediction.
      • Model quality: most commonly accuracy; see Evaluation Metrics for more.
    • Latency: the delay between a user’s action and the application’s response to the action.
    • Throughput (Concurrency): the number of successful requests served in a unit of time.
    • Operational constraints (Cost): refers to the infrastructure costs associated with inference.
      • GPU cost
      • Model size
      • Server load (CPU, caching, bandwidth)
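Latency and throughput as defined above can be measured with a small harness; this sketch assumes the model is a plain Python callable (real serving stacks report these metrics natively):

```python
import time

def measure(model, requests):
    """Measure per-request latency and overall throughput for a model
    callable (a hypothetical benchmarking harness)."""
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        model(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],  # median latency
        "throughput_rps": len(requests) / elapsed,                # requests per second
    }
```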
  • Model Deployment:
    • Data center: hosted on a server (available via an API) or used in a back-end server
    • On-device: installed on devices such as mobile phones or a user’s computer. Operational constraints (GPU, model size, …) can limit this option.
  • Containerization: ML applications are typically associated with many dependencies and configurable items. Containers make it easy to package and run the entire ML application in a lightweight manner without worrying about operating system or development environment requirements.
    • Container platforms: used for building, deploying, and managing containerized applications.
    • Container orchestration platforms: manage and sync multiple containers across multiple machines.
  • Tracking model experiments: you should be able to easily evaluate your model and run experiments, e.g., by building and testing alternative models or frameworks on your data.
    • Aspects: code, hyperparameters, execution environment, library versions, and model performance metrics
    • A good habit is to include consistent and meaningful tags for each experiment so that results are organized and easy to interpret.
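As a sketch of the aspects listed above, here is a minimal stdlib-only experiment logger; real projects typically use a dedicated tracker such as MLflow, and all names here are illustrative:

```python
import json
import pathlib
import time

def log_experiment(run_dir, params, metrics, tags):
    """Persist one experiment run as a JSON record: a stdlib stand-in
    for a dedicated experiment tracker."""
    run_dir = pathlib.Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "params": params,    # hyperparameters
        "metrics": metrics,  # model performance metrics
        "tags": tags,        # consistent, meaningful tags
    }
    path = run_dir / f"run_{int(record['timestamp'] * 1e6)}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```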
  • Model Registry (Model Versioning):
    • A model registry is a repository used to store and version trained machine learning (ML) models. Model registries greatly simplify the task of tracking models as they move through the ML lifecycle, from training to production deployments and ultimately retirement.
    • As each model may have different code, data, and configuration, it is important to perform model versioning.
    • With model versioning, we can readily retrieve older models and understand model lineage (i.e., the set of relationships among artifacts that resulted in the model).
    • Model registries are essential in supporting model discovery, model understanding, and model reuse, including in large-scale environments with hundreds of models.
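A model registry's core behavior, versioning plus lineage retrieval, can be sketched in a few lines (an in-memory toy, not a production registry):

```python
class ModelRegistry:
    """Minimal in-memory model registry: stores versioned models with
    the lineage (code, data, config) that produced them."""

    def __init__(self):
        self._models = {}  # model name -> list of version entries

    def register(self, name, model, lineage):
        """Add a new version of `name` and return its version number."""
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "model": model,
                         "lineage": lineage})
        return versions[-1]["version"]

    def get(self, name, version=None):
        """Retrieve a specific version (or the latest) with its lineage."""
        versions = self._models[name]
        entry = versions[version - 1] if version else versions[-1]
        return entry["model"], entry["lineage"]
```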
  • Scalability: both scaling model for public use, and improving it with new data.
    • Vertical scaling: using more powerful hardware on a single machine. Due to technological and economic constraints, a single machine may not be sufficient for the given workload.
    • Horizontal scaling: Using Distributed Systems to achieve scale by adding more nodes/devices (servers) to handle data storage and load.
      • elasticity: ease of adjusting the number of nodes based on load, throughput, and latency requirements
      • model replication: load balancing and support for parallel request handling
      • CON: increased complexity as a trade-off of using Distributed Systems.
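Model replication with load balancing, mentioned above, can be sketched as a round-robin pool of replicas (a toy picture of horizontal scaling; real systems spread replicas across machines):

```python
import itertools

class ReplicaPool:
    """Round-robin load balancing across model replicas, so parallel
    requests are spread evenly over the available copies."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def predict(self, features):
        replica = next(self._cycle)  # pick the next replica in rotation
        return replica(features)
```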
  • MLOps Maturity: The maturity of an MLOps system is determined by the level of automation of the data, modeling, deployment, and maintenance.
    • MLOps Level 0 is the basic level where ML lifecycle processes are manual.
    • MLOps Level 1 introduces pipeline automation with the goal of automated continuous model training. It automates:
      • data and model validation
      • pipeline triggers
      • metadata management
    • MLOps Level 2 is still not commonly used. It involves robust automated continuous integration/continuous delivery (CI/CD) so that teams can rapidly explore new ideas around feature engineering, model architecture, and hyperparameters.
  • Continuous Integration/Continuous Delivery (CI/CD)
    • Continuous Integration (CI): the building, packaging, and testing of new code when it is committed to the source code repository.
    • Continuous Delivery (CD): the process of deploying the system of new code and newly trained models into the target environment while ensuring compatibility and prediction service performance.
  • Progressive Delivery: a development process regarded as an improvement over CI/CD. It focuses on gradually rolling out new features to limit potential deployment risks while increasing the speed of deployment: changes are delivered first to small, low-risk audiences and then expanded to larger, riskier audiences.
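One common mechanism behind progressive delivery is deterministic percentage-based routing: each user is hashed into a fixed bucket, and the new model serves a gradually growing fraction of buckets. A sketch (the scheme and names are illustrative):

```python
import hashlib

def route(user_id, rollout_fraction):
    """Route a user to the new model while it is rolled out to a growing
    fraction of traffic. Hashing keeps each user's assignment stable as
    `rollout_fraction` increases."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "new_model" if bucket < rollout_fraction * 100 else "old_model"
```

Raising `rollout_fraction` from 0.05 toward 1.0 expands the audience without reshuffling users who already see the new model.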
  • Monitoring: ML development is a cyclical iterative process, and monitoring is vital for improving and sustaining the ML system. ML Monitoring is similar to software monitoring with additional components of data and the model.
    • Continuous monitoring is used to identify:
      • Data skews
      • Model staleness
      • Negative feedback loops
    • Components:
      • Functional monitoring keeps an eye on model predictive performance and changes in serving data. These include model performance metrics and the distributions and characteristics of each feature in the data.
      • System monitoring refers to monitoring the production system’s performance and the serving system’s reliability. It includes operational metrics like throughput, latency, resource utilization, etc.
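One common functional-monitoring statistic for data skew is the Population Stability Index (PSI), which compares a feature's training and serving distributions; values above roughly 0.2 are often treated as significant drift (a rule of thumb, not a standard). A stdlib sketch:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training (`expected`) and a
    serving (`actual`) sample of one feature. 0 means identical binned
    distributions; larger values mean more skew."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def binned_freqs(values):
        counts = [0] * bins
        for v in values:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Laplace smoothing avoids log(0) for empty bins.
        return [(c + 1) / (len(values) + bins) for c in counts]

    e, a = binned_freqs(expected), binned_freqs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```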

Considerations in MLOps:

  • cost management
  • sustainability
  • robustness
  • Reproducibility
  • Reliability and maintenance Assurance
  • business logic
  • Secure access to model and tools
  • Latency requirements for the model
  • Compatibility with different platforms and tools
  • Ease of development
    • transition from training environments
    • Automation of training jobs
    • Feature selection, feature engineering, and feature stores.
    • Easy access and explainability of derived features
  • Observability: Log during training and serving
    • Visibility of model performance
    • Model concept changes over time
    • Data validation & freshness checks
    • In production observation and monitoring of model
  • Data Sourcing: why and how is the data going to be leveraged?
    1. feasibility of making the process repeatable at scale from the data sources
    2. quality of the sourced data
    3. ethics (privacy) and legal implications of the data source
  • Data Preparation: ensuring the data is properly annotated, rated, judged, and labeled to create optimal input for the model.
    • ontology or data model that describes the contents of your data and how they’re related to each other
    • label unstructured data such as text and images and extract its content, which can then be turned into a knowledge graph
  • Model Testing, Training, and Deployment: create the model on the available infrastructure and train it, then test it to ensure the model’s accuracy. A “human-in-the-loop” approach is often used in this process.
    • Test the model with your labeled data, then test it with a different set of unlabeled data to see whether the predictions are accurate
    • identify any issues or gaps in the data so the model can be trained and retrained as needed
  • Model Evaluation

Learning Material: