Data Science Lifecycle

The Data Science Lifecycle is an iterative approach to managing data (cleaning, processing, contribution, …) and data analysis.

  1. Business Understanding & Prioritization
  2. Data acquisition, mining & ingestion
    • Data discovery & collection
      • Datasets, databases, and APIs, e.g. CSV files, data frames, Parquet files
      • Web scraping
      • Survey data
    • Data integration
    • Data fusion
    • Transformation & enrichment
  3. Data exploration: Exploratory Data Analysis (EDA)
  4. Data cleaning
    • Data scrubbing
    • Handling missing values
    • Unbiased estimators
    • De-noising
  5. Data Analysis
    • Analyze data
    • Feature selection
    • Feature engineering
    • Model selection
  6. Data modeling
    1. Model creation & parameter tuning
    2. Model Evaluation & Bias Check
    3. Model Deployment
  7. Report and Delivery: Communicate actionable insights to key stakeholders.
    • Presentation of findings
    • Data visualization
    • Credibility counts: check if your research is valid
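
As a rough illustration of how steps 2, 4, and 6 fit together, here is a minimal sketch using only the Python standard library. The inline data, the column names (hours, score), and every function name are hypothetical and exist only for this example. Mean imputation and a least-squares line are stand-ins for whatever cleaning and modeling techniques a real project would choose.

```python
import csv
import io
from statistics import mean

# Hypothetical inline data standing in for an ingested CSV file (step 2).
# The second data row has a missing score.
RAW_CSV = """hours,score
1,52
2,
3,61
4,68
5,74
"""


def load_rows(text):
    """Step 2: ingest CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    """Step 4: replace empty cells in `column` with the mean of observed values."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    fill = mean(observed)
    return [
        {**r, column: float(r[column]) if r[column] != "" else fill}
        for r in rows
    ]


def fit_line(xs, ys):
    """Step 6.1: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(xs, ys, slope, intercept):
    """Step 6.2: average absolute gap between predictions and observations."""
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


rows = impute_mean(load_rows(RAW_CSV), "score")
hours = [float(r["hours"]) for r in rows]
scores = [r["score"] for r in rows]

slope, intercept = fit_line(hours, scores)
mae = mean_absolute_error(hours, scores, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} MAE={mae:.3f}")
```

Note that the error here is computed on the same rows used for fitting, which keeps the sketch short but is not a real evaluation. A bias check as in step 6.2 would normally use held-out data.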

~ Business Level Action:

  • Act by using insights
  • Measure impact of action
  • Goal orientation and realignment
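
One simple way to "measure impact of action" is to compare a metric before and after the action. The sketch below assumes a hypothetical metric (weekly conversion counts) and a hypothetical helper, relative_lift. It shows only the arithmetic and does not account for confounding factors or statistical significance.

```python
from statistics import mean


def relative_lift(baseline, treated):
    """Relative change of the treated mean versus the baseline mean."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative lift is undefined")
    return (mean(treated) - base) / base


# Hypothetical weekly conversion counts before and after acting on an insight.
before = [100, 110, 90, 100]
after = [115, 120, 105, 120]

lift = relative_lift(before, after)
print(f"relative lift: {lift:.1%}")
```

If the lift falls short of the goal, that result feeds back into step 1, which is what makes the lifecycle iterative.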