Exploratory Data Analysis (EDA)

EDA is the process of examining a dataset visually and statistically (exploring and observing) to understand its main characteristics, uncover patterns, spot anomalies, and generate hypotheses or plan the data science experiment. It typically covers:



  • Data Preparation
    • Find missing values, values with the wrong data type, and duplicates.
    • Drop useless columns
    • Rename columns
    • Drop irrelevant data
    • Categorize your values
    • Discover skewed or imbalanced data, and drop or rebalance it where appropriate
    • Create new features
  • Feature Selection: Choosing the attributes that have the greatest impact on the problem you are solving.
    • Specify features
    • Discover important features (or attributes)
    • Identify relationships and correlations in features
  • Relationship exploration
  • Data summarization
  • Pattern discovery
  • Uni-variate analysis: Descriptive statistics on a single variable (e.g. categorical or numeric variables, with measures such as central tendency, dispersion, and shape).
  • Bi-variate analysis: Descriptive statistics between two variables.
  • Multi-variate analysis: Descriptive statistics between more than two variables. ℹ️ Scatter plots, correlation plots, and heat maps are often used in bi-variate and multi-variate analysis.
  • Locate any outliers in your dataset
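
Several of the steps above can be sketched in a few lines of pandas. This is a minimal illustration rather than a prescribed workflow: the toy dataset, its column names (age, income, city), and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 120, 32],
    "income": [30000, 42000, 61000, 58000, np.nan, 35000, 64000, 42000],
    "city": ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Lima", "Oslo", "Rome"],
})

# Data preparation: missing values per column, data types, duplicate rows.
missing_per_column = df.isna().sum()
column_dtypes = df.dtypes
n_duplicates = int(df.duplicated().sum())

# Drop exact duplicate rows before summarizing.
clean = df.drop_duplicates().reset_index(drop=True)

# Uni-variate analysis: central tendency, dispersion, and shape of one variable.
age_summary = clean["age"].describe()     # count, mean, std, quartiles, min/max
age_skew = clean["age"].skew()            # shape (asymmetry)
city_counts = clean["city"].value_counts()  # frequencies of a categorical variable

# Bi-variate analysis: pairwise correlation between numeric variables.
corr = clean[["age", "income"]].corr()

# Outliers: flag values outside 1.5 * IQR of the quartiles (one common rule,
# chosen here as an assumption; other rules exist).
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = clean[(clean["age"] < q1 - 1.5 * iqr) | (clean["age"] > q3 + 1.5 * iqr)]

print(missing_per_column, column_dtypes, n_duplicates, sep="\n")
print(age_summary, age_skew, city_counts, corr, outliers, sep="\n")
```

On a real dataset the same calls apply unchanged; the correlation matrix is what a heat map would typically visualize, and the flagged rows are candidates for inspection rather than automatic removal.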