Machine Learning Workflow

The Machine Learning Workflow covers the design and development of machine learning projects. The Machine Learning Development Lifecycle (MLDLC), by contrast, covers the delivery of such projects to production and is part of MLOps.

Notes:

  • You don’t need ML until you can prove that you need ML. Some problems may seem complex but have simple solutions that don’t require ML.
  • Always set a baseline, then try to beat it. Some common baselines:
    • Average human performance
    • Simple linear model
    • Results of an existing model that works well on similar data, with no tuning.
    • Random prediction for binary classification, and the most frequent class for non-binary (multiclass) classification.
  • Design your evaluation methodology beforehand, including:
    • Evaluation criteria
    • Stopping criteria: maximum processing time or iteration count
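
The two simplest baselines listed above (random prediction and the most frequent class) can be sketched in a few lines. This is a minimal illustration in plain Python, not a prescribed implementation: the helper names (`accuracy`, `majority_class_baseline`, `random_baseline`) and the toy labels are made up for the example.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_baseline(y_train, n):
    """Predict the most frequent training label for each of n samples."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


def random_baseline(labels, n, seed=0):
    """Predict a uniformly random label for each of n samples."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n)]


# Hypothetical toy labels, invented for illustration only.
y_train = ["cat", "dog", "cat", "bird", "cat", "dog"]
y_test = ["cat", "cat", "dog", "bird"]

majority_pred = majority_class_baseline(y_train, len(y_test))
random_pred = random_baseline(sorted(set(y_train)), len(y_test))

print("majority-class accuracy:", accuracy(y_test, majority_pred))
print("random-guess accuracy:", accuracy(y_test, random_pred))
```

Any model that cannot beat these numbers on the same evaluation data is probably not worth its added complexity.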

Workflow

  1. Defining and formulating a problem: Define the problem and identify the best method for generating a solution, e.g. classification, regression, or clustering.
  2. Data Mining: Select a dataset or collect data, e.g. through web scraping, surveys, etc.
  3. Exploratory Data Analysis (EDA)
  4. Data Preparation
  5. Establishing a baseline: A baseline is the simplest model that can solve your problem with minimal requirements. It serves as the reference point against which the actual model is compared.
  6. Selecting and training a model: Choose a suitable learning algorithm and train it on the prepared data.
    • Considerations in selecting a model:
      • The scope of the problem: Some specific problems work best with specific models.
      • The size of the dataset: Some models don’t work well with very small or very large datasets.
      • The level of interpretability: Some models are hard to interpret and explain, e.g. artificial neural networks (ANN).
      • Training time: Training time varies from model to model.
  7. Model Evaluation & Hyper-Parameter Tuning: Perform error analysis and improve the model. Changes to the model (i.e. model selection or hyperparameters) or to the data (i.e. data preparation) can improve it, and any improvement should be measured with evaluation metrics.
  8. Model Evaluation: Evaluate the final model on test data.
  9. Deploying a model: Using the model in a practical application. This is studied under the discipline of Machine Learning Engineering for Production (MLOps).
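
Steps 5 to 8 of the workflow above can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the only way to do it: the data is synthetic, the model is a hand-written k-nearest-neighbours classifier chosen only because it fits in a few lines, and the names `knn_predict` and `MAX_ITERATIONS` and the candidate values of `k` are invented for the example.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Steps 2-4 (stand-in): hypothetical synthetic data, one feature, two classes.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(150)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(150)]
rng.shuffle(data)
train, val, test = data[:180], data[180:240], data[240:]

# Step 5: baseline = always predict the most frequent training class.
majority = Counter(label for _, label in train).most_common(1)[0][0]
val_labels = [label for _, label in val]
baseline_acc = accuracy(val_labels, [majority] * len(val))

# Steps 6-7: fit and tune on validation data; the stopping criterion is
# an iteration budget, so candidates beyond the budget are never tried.
MAX_ITERATIONS = 5
best_k, best_acc = None, -1.0
for iteration, k in enumerate([1, 3, 5, 9, 15, 25, 51]):
    if iteration >= MAX_ITERATIONS:
        break
    preds = [knn_predict(train, x, k) for x, _ in val]
    acc = accuracy(val_labels, preds)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Step 8: evaluate the chosen model once on the untouched test data.
test_labels = [label for _, label in test]
test_preds = [knn_predict(train, x, best_k) for x, _ in test]
test_acc = accuracy(test_labels, test_preds)

print("baseline validation accuracy:", baseline_acc)
print("best k:", best_k, "validation accuracy:", best_acc)
print("test accuracy:", test_acc)
```

The test split is used only once, after tuning has finished, so the reported test accuracy is not biased by the hyperparameter search.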

Figure: workflow flowchart. 1. Defining the Problem; 2. Collecting data; 3. Exploratory Data Analysis; 4. Data Preparation; 5.1. Choose Model; 5.2. Train the Model to establish baseline; 6. Train Model to improve previous results; 7.1. Fine Tune the model; 7.2. Evaluate Model; decision "Is Model Good Enough?" (Yes / No); 8. Deploy Model; 9. Monitor & Update Model.