Evaluation Metrics

Evaluation metrics measure the performance of machine learning models.

In the following equations:

  • $x_i$ is the independent variable (predictor variable).
  • $y_i$ is the dependent variable (actual value, or observed value).
  • $\hat{y}_i$ is the prediction (estimated dependent variable).
  • $\bar{y}$ is the mean of $y$.
  • $n$ is the number of samples.

Regression Metrics

Error: The difference between the real value and the predicted value is called the error.

$$e_i = y_i - \hat{y}_i$$

Mean Squared Error (MSE): the mean of the squared errors over all $n$ samples.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

  • MSE is differentiable, so it works well as a loss function.
  • The expected MSE can be decomposed into variance and squared bias (plus irreducible noise). This helps us understand how much variance and bias each contribute to the overall error.
  • MSE is not robust to outliers, because squaring amplifies large errors.

Root Mean Squared Error (RMSE): the square root of MSE. It is one of the most widely used regression metrics and measures the differences between the predicted and actual values.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

  • The error has the same unit as the target variable, which makes it relatively easy to interpret.
  • RMSE is not robust to outliers.
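As a quick illustration of the two definitions above, here is a minimal sketch in plain Python. The function names and the sample values are made up for this example.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the unit of the target."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mse(y_true, y_pred))   # squared errors 1, 0, 4, 1 -> mean 1.5
print(rmse(y_true, y_pred))  # square root of 1.5
```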

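Root Mean Squared Log Error (RMSLE): similar to RMSE, but computed after taking the logarithm of the predicted and actual values, so it reflects relative rather than absolute differences.

$$RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2}$$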

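Mean Absolute Error (MAE): the mean of the absolute errors over all samples.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

  • MAE is less sensitive to outliers than MSE, because the errors are not squared.
  • MAE is not differentiable where the error is zero, which makes it less convenient than MSE as a loss function for gradient-based optimization.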
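The difference in outlier sensitivity can be seen with a small sketch. The data below is invented: the same predictions are scored against a clean target list and against one where the last target is an outlier.

```python
def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_pred = [11.0, 11.0, 12.0, 12.0]
y_clean = [10.0, 12.0, 11.0, 13.0]     # every error has magnitude 1
y_outlier = [10.0, 12.0, 11.0, 33.0]   # last target is an outlier (error 21)

print(mae(y_clean, y_pred), mse(y_clean, y_pred))      # 1.0 1.0
print(mae(y_outlier, y_pred), mse(y_outlier, y_pred))  # 6.0 111.0
```

A single outlier multiplies MAE by 6 here, but MSE by 111.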

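Mean Absolute Percentage Error (MAPE): the mean absolute error expressed as a percentage of the actual value.

$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$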

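Mean Squared Logarithmic Error (MSLE): the mean of the squared differences between the logarithms of the predicted and actual values.

$$MSLE = \frac{1}{n} \sum_{i=1}^{n} \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2$$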
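The percentage-based and logarithmic metrics can be sketched as follows. This is an illustrative implementation only: the function names are chosen for this example, and the logarithm is applied to `1 + value` (via `math.log1p`), a common convention that keeps zero targets valid.

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    n = len(y_true)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / n

def msle(y_true, y_pred):
    """Mean squared difference of log(1 + value); inputs must be > -1."""
    n = len(y_true)
    return sum((math.log1p(y) - math.log1p(p)) ** 2
               for y, p in zip(y_true, y_pred)) / n

def rmsle(y_true, y_pred):
    """Square root of MSLE."""
    return math.sqrt(msle(y_true, y_pred))

# Both predictions are off by 10% of the actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))

# The same prediction/target ratio at two scales gives a comparable log
# error, although the absolute errors differ by a factor of 100.
print(rmsle([10.0], [20.0]), rmsle([1000.0], [2000.0]))
```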

Model Assessment Metrics

Sum of Squared Errors ($SSE$, or $SSR$ as in Sum Squared Regression Errors): a measure of how far off the model's predictions are from the observed values.

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Total Sum of Squares ($SST$ or $TSS$): a measure of the total variation of the target variable around its mean.

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

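R-Squared ($R^2$): R-squared evaluates the scatter of the data points around the fitted regression line, expressed as the share of the target's variation that the model explains. It's also called the coefficient of determination and is used as a measure of goodness of fit.

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$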

  • It's not robust to outliers, because it is built from squared errors.
  • It never decreases as additional features are added, even if the new features don't contribute to prediction, so on its own it's a poor metric for comparing models with different numbers of features.
  • It's mainly used for regression models.

Adjusted R-Squared ($\bar{R}^2$): similar to $R^2$, but it penalizes models when additional features are added.

$$\bar{R}^2 = 1 - \frac{SSE / df_e}{SST / df_t}$$

  • $df_t = n - 1$ is the degrees of freedom of the estimate of the population variance of the dependent variable
  • $df_e = n - p - 1$ is the degrees of freedom of the estimate of the underlying population error variance, where $p$ is the number of features
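To make the penalty concrete, the sketch below computes both scores for the same made-up predictions while pretending the model used 1 or 3 features. The names `r_squared`, `adjusted_r_squared`, and `n_features` are illustrative; the adjustment uses $n - 1$ and $n - p - 1$ degrees of freedom.

```python
def r_squared(y_true, y_pred):
    """1 - SSE / SST."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse / sst

def adjusted_r_squared(y_true, y_pred, n_features):
    """1 - (SSE / (n - p - 1)) / (SST / (n - 1)), written in terms of R^2."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

print(r_squared(y_true, y_pred))              # about 0.99
print(adjusted_r_squared(y_true, y_pred, 1))  # slightly lower
print(adjusted_r_squared(y_true, y_pred, 3))  # lower still
```

With the fit held constant, the adjusted score drops as the assumed feature count grows, which is the penalty described above.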

Classification Metrics

Unlike regression, which handles a continuous dependent variable, a classification problem handles dependent variables that are discrete, mutually exclusive groups or classes, and it focuses on estimating the probability that an observation belongs to each class.

Approaches to solving classification problems:

  • Maximum Likelihood Estimation
  • Cross Entropy

The following metrics are used to determine model effectiveness. The variables in their formulas are taken from the confusion matrix: true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$).

Accuracy: the most commonly used metric in classification. It shows the model's ability to make correct predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

  • Accuracy is not a good performance metric when there is imbalance in the dataset.

Precision: the model's accuracy on the examples it predicts as positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (Sensitivity or True Positive Rate): the model's ability to predict the positive examples correctly.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Specificity (True Negative Rate): a measure of how well the model identifies true negatives.

$$\text{Specificity} = \frac{TN}{TN + FP}$$

F1 Score: the harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
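The sketch below derives these metrics from confusion-matrix counts for binary labels (1 = positive, 0 = negative) and shows why accuracy can mislead on imbalanced data. The helper names are made up for this example, and returning 0.0 when a denominator is zero is an assumption of the sketch, not a universal convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def ratio(num, den):
    # Assumption of this sketch: report 0.0 when the denominator is 0.
    return num / den if den else 0.0

def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Imbalanced, made-up data: 95 negatives, 5 positives, and a model that
# always predicts the negative class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
```

Accuracy comes out at 0.95 even though the model never finds a positive example; recall and F1 are both 0.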


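AUC-ROC: the area under the ROC curve, which plots the true positive rate against the false positive rate across classification thresholds.

  • ROC is a probability curve: each point on it corresponds to one threshold applied to the predicted probabilities.
  • AUC is the area under the ROC curve and represents the degree or measure of separability.
  • It measures the model's ability to distinguish between classes: 1.0 means perfect separation, and 0.5 is no better than random guessing.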
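One way to read AUC is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition (ties count as one half). It is quadratic in the number of samples, so it is meant for illustration rather than for large datasets; the function name and the scores are made up.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 3 of the 4 pairs are ordered correctly
```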

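Log Loss (Logarithmic Loss, or Cross-Entropy Loss): used as the cost function for logistic regression and as a loss function for binary classification problems. With $p_i$ the predicted probability that sample $i$ belongs to the positive class:

$$\text{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$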
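A minimal sketch of binary log loss follows. The clipping constant `eps` is an assumption added here to avoid taking `log(0)`; the function name and inputs are illustrative.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Binary cross-entropy; probs are predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# An uninformative model (always 0.5) scores ln(2), about 0.693.
print(log_loss([1, 0], [0.5, 0.5]))

# Confident and correct is rewarded; confident and wrong is punished hard.
print(log_loss([1, 0], [0.9, 0.1]))
print(log_loss([1, 0], [0.1, 0.9]))
```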

Clustering Metrics

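Silhouette Score: the mean Silhouette Coefficient over all samples. For a single sample, the coefficient is:

$$s = \frac{b - a}{\max(a, b)}$$

  • $a$: mean of the intra-cluster distance (from the sample to the other points in its own cluster)
  • $b$: mean of the nearest-cluster distance (from the sample to the points of the nearest cluster it does not belong to)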
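The following sketch computes the score with Euclidean distance for a tiny made-up dataset. It assumes at least two clusters, scores singleton clusters as 0 (an assumption of this sketch), and needs Python 3.8+ for `math.dist`.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient, using Euclidean distance."""
    cluster_ids = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to each cluster, excluding p itself.
        mean_dist = {}
        for c in cluster_ids:
            dists = [math.dist(p, q)
                     for j, (q, lab) in enumerate(zip(points, labels))
                     if lab == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        if own not in mean_dist:
            scores.append(0.0)  # assumption: singleton clusters score 0
            continue
        a = mean_dist[own]                                    # intra-cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # tight, well-separated: near 1
print(silhouette_score(points, [0, 1, 0, 1]))  # poor grouping: negative
```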

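Calinski-Harabasz Index: measures the distinctiveness between groups by comparing between-cluster dispersion with within-cluster dispersion. Higher values indicate better-defined clusters.

$$CH = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$$

  • $N$: number of data points
  • $k$: number of clusters
  • $SS_W$: within-cluster variation
  • $SS_B$: between-cluster variation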
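As a rough sketch, the index can be computed from cluster centroids and squared Euclidean distances. The helper names and the four sample points are made up; the sketch assumes at least two clusters and fewer clusters than points.

```python
def centroid(members):
    """Component-wise mean of a list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def calinski_harabasz(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    # Between-cluster variation: size-weighted distance of each cluster
    # centroid from the overall centroid.
    ss_b = sum(len(m) * sq_dist(centroid(m), overall) for m in clusters.values())
    # Within-cluster variation: distance of each point from its own centroid.
    ss_w = 0.0
    for m in clusters.values():
        c = centroid(m)
        ss_w += sum(sq_dist(p, c) for p in m)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # SS_B = 100, SS_W = 1 -> 200.0
```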

Davies-Bouldin Index: the average similarity of each cluster with its most similar cluster. Lower values indicate better-separated clusters.

  1. Calculate intra-cluster dispersion
    - $i$ : particular identified cluster
    - $T_i$ : number of vectors (observations) in cluster $i$
    - $X_j$ : $j$th vector (observation) in cluster $i$
    - $A_i$ : centroid of cluster $i$
    - $p$, $q$ : order of the norm and of the moment; $p = 2$ (Euclidean distance) is the usual choice

    $$S_i = \left( \frac{1}{T_i} \sum_{j=1}^{T_i} \lVert X_j - A_i \rVert_p^q \right)^{1/q}$$

  2. Calculate separation measure
    - $a_{ki}$: $k$-th component of n-dimensional centroid $A_i$
    - $a_{kj}$: $k$-th component of n-dimensional centroid $A_j$

    $$M_{ij} = \lVert A_i - A_j \rVert_p = \left( \sum_{k=1}^{n} \lvert a_{ki} - a_{kj} \rvert^p \right)^{1/p}$$

  3. Calculate similarity between clusters
    - $S_i$ : intra-cluster dispersion of cluster $i$
    - $S_j$ : intra-cluster dispersion of cluster $j$
    - $M_{ij}$ : distance between centroids of clusters $i$ and $j$

    $$R_{ij} = \frac{S_i + S_j}{M_{ij}}$$

  4. Find most similar cluster for each cluster
    - Having $i \neq j$,

    $$D_i = \max_{j \neq i} R_{ij}$$

  5. Calculate Davies-Bouldin Index
    - $N$: total number of clusters

    $$DB = \frac{1}{N} \sum_{i=1}^{N} D_i$$
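The five steps above can be sketched in a few lines of Python. This sketch fixes two choices that the general definition leaves open: dispersion is the mean Euclidean distance to the centroid, and separation is the Euclidean distance between centroids. The function name and data are made up, and `math.dist` requires Python 3.8+.

```python
import math

def davies_bouldin(points, labels):
    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid A_i of each cluster.
    centroids = {c: tuple(sum(axis) / len(m) for axis in zip(*m))
                 for c, m in clusters.items()}
    # Step 1: intra-cluster dispersion S_i (mean distance to the centroid).
    spread = {c: sum(math.dist(p, centroids[c]) for p in m) / len(m)
              for c, m in clusters.items()}
    total = 0.0
    for i in clusters:
        # Steps 2-4: similarity R_ij = (S_i + S_j) / M_ij, then keep the
        # largest value over all other clusters j.
        total += max((spread[i] + spread[j]) / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    # Step 5: average over all clusters.
    return total / len(clusters)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # S = 0.5 each, M = 10 -> about 0.1
```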

Probabilistic Measures

Probabilistic measures score a model on both its performance and its complexity. Model complexity here is a measure of the model's ability to capture the variance in the data.


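Akaike Information Criterion (AIC)

$$AIC = 2K - 2\ln(L)$$

Some texts normalize by the size of the training set: $AIC = -\frac{2}{N}\ln(L) + \frac{2K}{N}$. In both forms, a lower value indicates a better trade-off between fit and complexity.

  • K = number of estimated model parameters (the independent variables or predictors, plus an intercept if one is fitted)
  • L = maximum likelihood of the model
  • N = number of data points in the training set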
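To show how the criterion trades fit against complexity, the sketch below compares two hypothetical models using the unnormalized form $2K - 2\ln(L)$. The log-likelihood values and parameter counts are invented for illustration and do not come from any real fit.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2K - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical models: B fits slightly better but uses twice the parameters.
aic_a = aic(log_likelihood=-120.0, k=3)
aic_b = aic(log_likelihood=-118.5, k=6)

print(aic_a)  # 246.0
print(aic_b)  # 249.0
print("prefer A" if aic_a < aic_b else "prefer B")
```

Model B's small gain in likelihood does not offset its extra parameters, so the lower-AIC model A would be preferred in this made-up comparison.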

Minimum Description Length (MDL)

$$MDL = L(h) + L(D \mid h)$$

  • h = model
  • D = predictions made by the model
  • L(h) = number of bits required to represent the model
  • L(D | h) = number of bits required to represent the predictions from the model

Similarity Metrics

Similarity metrics are used to compare data points and evaluate their level of similarity (or closeness).


Euclidean Distance is the straight-line distance between two points $u$ and $v$ in an $N$-dimensional space.

$$d(u, v) = \sqrt{\sum_{i=1}^{N} (u_i - v_i)^2}$$

Or

$$d(u, v) = \lVert u - v \rVert_2$$

Manhattan Distance takes the absolute difference of the two points' coordinates in each dimension and then sums them up.

$$d(u, v) = \sum_{i=1}^{N} |u_i - v_i|$$

Or

$$d(u, v) = \lVert u - v \rVert_1$$

Cosine Similarity uses the angle $\theta$ between two vectors $u$ and $v$ to calculate their similarity.

$$\cos(\theta) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
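The three vector-based measures so far can be sketched in plain Python as follows. The function names and sample vectors are chosen for this example; cosine similarity is undefined for a zero-length vector, which this sketch does not handle.

```python
import math

def euclidean(u, v):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(euclidean((0.0, 0.0), (3.0, 4.0)))          # 5.0
print(manhattan((0.0, 0.0), (3.0, 4.0)))          # 7.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # 0.0, perpendicular vectors
print(cosine_similarity((1.0, 2.0), (2.0, 4.0)))  # close to 1, same direction
```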


Jaccard Similarity is the size of the intersection of two sets $A$ and $B$ divided by the size of their union.

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
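For sets, the computation is a one-liner with Python's built-in set operations. Returning 1.0 for two empty sets is a convention assumed by this sketch; the word sets are made up.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)

print(jaccard({"red", "green", "blue"}, {"green", "blue", "black"}))  # 0.5
print(jaccard({1, 2}, {3, 4}))                                        # 0.0
```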


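Pearson Correlation Coefficient measures the linear correlation between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation).

$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\sum_{i} (y_i - \bar{y})^2}}$$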

- $x_i$: x variable sample
- $\bar{x}$: mean of values in x variable
- $y_i$: $y$ variable sample
- $\bar{y}$: mean of values in $y$ variable
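Using the symbols listed above, the coefficient can be sketched as below. The function name and sample data are illustrative, and the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(x, [2.0, 4.0, 6.0, 8.0, 10.0]))   # close to 1, y rises with x
print(pearson(x, [10.0, 8.0, 6.0, 4.0, 2.0]))   # close to -1, y falls with x
print(pearson(x, [1.0, -1.0, 0.0, -1.0, 1.0]))  # 0.0, no linear relationship
```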
