Machine Learning Pipeline

A Machine Learning Pipeline is the technical infrastructure used to manage and automate machine learning processes and workflows. It includes a Data Pipeline that handles Data Preparation tasks, together with the steps for creating models.

Machine Learning Pipeline Architecture is the collection of components, stages, and workflows in a Machine Learning Pipeline.

Machine Learning Pipeline Design deals with the tools, paradigms, techniques, and programming languages used to implement the pipeline and its components.


Machine Learning Pipeline Stages

  1. Data Pipeline for Data Ingestion
    1. Data Collection: Data Mining
    2. Data Pre-Processing: Data Preparation
    3. Data Storage
  2. Enrich & Transform data for the Machine Learning task
  3. Feature Extraction & Feature Selection
  4. Version Control
  5. Split data for training and evaluation
  6. Model Training
  7. Evaluation and validation
  8. Deployment of the model, or presentation and visualization of results
  9. Continuous Monitoring and Maintenance
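
The stages listed above can be sketched as a chain of small functions, each consuming the output of the previous one. This is a minimal illustration using only the Python standard library, not a reference implementation: the function names, the synthetic data, and the one-variable least-squares model are all assumptions made for the example. Data storage, version control, deployment, and monitoring are left out.

```python
import random
from statistics import mean

def ingest():
    """Data collection: synthetic records here; a real pipeline would read from storage."""
    rng = random.Random(0)
    rows = []
    for _ in range(200):
        x = rng.uniform(0, 10)
        rows.append({"x": x, "y": 3.0 * x + 2.0 + rng.gauss(0, 0.5)})
    rows.append({"x": None, "y": 1.0})  # a deliberately incomplete record
    return rows

def preprocess(rows):
    """Data pre-processing: drop records with missing values."""
    return [r for r in rows if r["x"] is not None and r["y"] is not None]

def extract_features(rows):
    """Feature extraction: turn each record into a (feature, target) pair."""
    return [(r["x"], r["y"]) for r in rows]

def split(pairs, test_fraction=0.25, seed=0):
    """Shuffle, then split into training and evaluation sets."""
    pairs = pairs[:]
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - test_fraction))
    return pairs[:cut], pairs[cut:]

def train(pairs):
    """Model training: fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return {"slope": slope, "intercept": my - slope * mx}

def evaluate(model, pairs):
    """Evaluation: mean absolute error on the held-out set."""
    return mean(abs(model["slope"] * x + model["intercept"] - y) for x, y in pairs)

def run_pipeline():
    """Run every stage in order and return the fitted model and its error."""
    rows = preprocess(ingest())
    train_set, test_set = split(extract_features(rows))
    model = train(train_set)
    return model, evaluate(model, test_set)

model, mae = run_pipeline()
print(model, mae)
```

Because every stage is a plain function with fixed seeds, rerunning the pipeline gives the same model, which is the reproducibility property a pipeline is meant to provide.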


Benefits of a Machine Learning Pipeline

  • Efficiency
  • Scalability
  • Templating and reproducibility
  • Standardization

Machine Learning Pipeline Architectures

  • Single leader architecture
  • Directed acyclic graphs
  • Foreach pattern
  • Embeddings
  • Data parallelism
  • Federated learning
  • Synchronous training
  • Parameter server architecture
  • Ring-AllReduce architecture
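
As an illustration of the last item, the sketch below simulates a ring all-reduce inside a single process. It assumes each of the n workers holds a gradient that is already split into exactly n equal-sized chunks. The name `ring_allreduce` and the list-of-lists data layout are hypothetical, chosen for this example. Real systems exchange the chunks over a network; here each "send" is a list copy.

```python
def ring_allreduce(worker_chunks):
    """Sum chunked vectors across workers arranged in a ring.

    worker_chunks[i][k] is chunk k held by worker i; each worker must hold
    exactly n chunks, where n is the number of workers.
    Returns a new structure in which every worker holds the full sum.
    """
    n = len(worker_chunks)
    data = [[list(chunk) for chunk in worker] for worker in worker_chunks]

    # Phase 1, scatter-reduce: in each of n-1 steps, worker i sends one chunk
    # to its right-hand neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        messages = [(i, (i - step) % n, list(data[i][(i - step) % n]))
                    for i in range(n)]
        for sender, k, payload in messages:
            receiver = (sender + 1) % n
            data[receiver][k] = [a + b for a, b in zip(data[receiver][k], payload)]

    # After phase 1, worker i holds the fully reduced chunk (i + 1) % n.
    # Phase 2, all-gather: the reduced chunks travel around the ring and
    # overwrite the partial sums held by the other workers.
    for step in range(n - 1):
        messages = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n]))
                    for i in range(n)]
        for sender, k, payload in messages:
            data[(sender + 1) % n][k] = payload

    return data

# Three workers, each with a six-value gradient split into three chunks.
grads = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]],
    [[100.0, 200.0], [300.0, 400.0], [500.0, 600.0]],
]
result = ring_allreduce(grads)
print(result[0])
```

Each worker sends and receives only one chunk per step, so the traffic per worker stays roughly constant as workers are added. In synchronous data-parallel training, every worker would then divide the summed gradient by n and apply the same update.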