It is used to explore, observe, and understand the dataset, through:


  • Quality Control
  • Hypothesis Generation
  • Understanding Data
  • Data Visualization


  • Data Preparation
    • Find missing values, values with the wrong data type, and duplicates
    • Drop useless columns
    • Rename columns
    • Drop irrelevant data
    • Categorize your values
    • Discover skewed or imbalanced data, and drop or correct it
    • Create new features
  • Feature Selection: choosing the attributes that have the greatest impact on the problem you are solving.
    • Specify features
    • Discover important features (or attributes)
    • Identify relationships and correlations in features
  • Relationship exploration
  • Data summarization
  • Pattern discovery
  • Univariate analysis
  • Multivariate analysis
  • Locate any outliers in your dataset
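
Several of the steps listed above can be sketched in a few lines of pandas. This is only a minimal illustration, not a prescribed workflow: the toy dataset, its column names (id, Age, income, notes), the age bins, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy dataset: every column name and value is invented for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 4, 5, 6],
    "Age": ["25", "32", "47", "51", "51", None, "38"],  # numbers stored as text
    "income": [30000, 42000, 58000, 61000, 61000, 45000, 400000],
    "notes": ["a", "b", "c", "d", "d", "e", "f"],       # assumed to be irrelevant
})

# Data preparation: count missing values and fully duplicated rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Drop duplicates, drop a column assumed useless, and rename a column.
clean = (
    df.drop_duplicates()
      .drop(columns=["notes"])
      .rename(columns={"Age": "age"})
      .copy()
)

# Fix the wrong data type: text -> numeric (unparseable values become NaN).
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Categorize values into bins (bin edges are an arbitrary assumption).
clean["age_group"] = pd.cut(
    clean["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"]
)

# Create a new feature from existing ones.
clean["income_per_year_of_age"] = clean["income"] / clean["age"]

# Data summarization and univariate analysis.
summary = clean[["age", "income"]].describe()
income_skew = clean["income"].skew()  # a positive value indicates a right-skewed column

# Relationship exploration: pairwise correlations between numeric features.
correlations = clean[["age", "income"]].corr()

# Locate outliers with the 1.5 * IQR rule (one common convention among several).
q1, q3 = clean["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = clean[
    (clean["income"] < q1 - 1.5 * iqr) | (clean["income"] > q3 + 1.5 * iqr)
]

print(missing_per_column)
print(summary)
print(correlations)
print(outliers)
```

On a real dataset, which columns count as useless and which rows count as outliers depends on the problem being solved, so the thresholds and column choices here should be treated as placeholders.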