Feature Extraction

Feature Extraction is the process of selecting existing features (Feature Selection) and creating new ones (Feature Creation) to improve model performance. An essential purpose of feature engineering in production ML is to reduce computing resources: by concentrating predictive information in fewer features, it promotes computing efficiency.
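
As a minimal sketch of concentrating predictive information in fewer features, the example below uses PCA from scikit-learn (the data and component count are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # 100 samples, 20 raw features

# Project onto 5 principal components: most of the variance is
# concentrated in far fewer features than the original 20.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)           # downstream models now train on 5 features
```

A smaller feature matrix means less memory and compute at both training and serving time.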

Note

The best features are those with strong but simple relationships between the dependent and independent variables.

Inconsistencies in feature engineering can introduce training-serving skew, leading to poor model performance at serving time. These inconsistencies arise when:

  • Training and serving code paths differ (e.g., training in Python but serving in Java), resulting in different transformations between the two
  • Diverse deployment scenarios (e.g., model deployed in different environments like mobile, web, and server)
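
One common way to avoid divergent transformations is to keep a single source of truth for each feature transformation and reuse it on both paths. A hypothetical sketch (the function name and statistics are illustrative):

```python
def scale_feature(x, mean, std):
    """Single shared transformation used by BOTH training and serving."""
    return (x - mean) / std

# Training path: compute statistics once on the training data.
train_values = [10.0, 12.0, 14.0]
mean = sum(train_values) / len(train_values)
std = (sum((v - mean) ** 2 for v in train_values) / len(train_values)) ** 0.5
train_features = [scale_feature(v, mean, std) for v in train_values]

# Serving path: reuse the SAME function with the SAME saved statistics,
# rather than re-implementing the scaling logic in serving code.
serving_feature = scale_feature(11.0, mean, std)
```

When the two paths must live in different languages, the same effect is achieved by exporting the fitted transformation (e.g., as part of the model artifact) instead of re-coding it by hand.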
Danger

Extracting too many features can result in the Curse of Dimensionality. Feature Selection methods can be used to remove unwanted features and mitigate this problem.
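
A small sketch of feature selection, assuming scikit-learn's univariate `SelectKBest`: one synthetic feature carries the signal, the rest are noise, and the selector keeps only the `k` highest-scoring columns:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X_noise = rng.normal(size=(200, 9))       # 9 uninformative features
y = rng.integers(0, 2, size=200)
signal = y + 0.1 * rng.normal(size=200)   # 1 feature correlated with y
X = np.column_stack([X_noise, signal])    # 10 features total

# Keep the 3 features with the highest ANOVA F-scores against y.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)
```

The informative column (index 9 here) survives the selection, while most of the noise columns are dropped, reducing dimensionality before training.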


Benefits:

  • Improved accuracy of the model
  • Improved generalization of the model

Notes:

  • Feature Extraction in Natural Language Processing (NLP) tasks involves extracting features from textual data, often in the form of words, terms, or n-grams, to represent the content of documents.