Bias and Variance (Statistics)
  • Variance: Variance measures how dispersed a set of values is, computed as the average squared distance of the values from their mean.
    - If the variance is zero, all the elements in the dataset are identical.
    - If the variance is low, the values lie close to the mean and differ only slightly.
    - If the variance is high, the values are spread far from the mean and differ widely.
    - Deviation: The distance of a single value from the mean.
    - Standard Deviation: The square root of the variance; it measures dispersion in the same units as the data. A low standard deviation means the values tend to lie close to the mean, whereas a high standard deviation means the values are spread over a wider range.
    - Analysis of Variance (ANOVA): It tests whether the means of several groups or samples differ by comparing the variation between groups with the variation within groups, using a ratio called the F-ratio.
    ANOVA can also serve as a univariate feature selection method, which selects the best features on the basis of univariate statistical tests. Each feature is compared with the target variable to determine whether there is a statistically significant relationship between them.
    Components:
    - Variation within each group
    - Variation between groups
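
As a rough illustration of how the two components combine into the F-ratio, here is a minimal sketch of a one-way ANOVA computed by hand with the standard library. The function name `f_ratio` and the sample groups are made up for this example; they are not from any particular library or dataset.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group mean square / within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation between groups: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within each group: how far each value is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)    # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)      # degrees of freedom: n - k
    return ms_between / ms_within

# Hypothetical sample data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_ratio(groups)
print(f_value)
```

A large F-ratio suggests the group means differ by more than the within-group scatter alone would explain; turning it into a p-value requires the F distribution, which this sketch leaves out.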
    - Bias & Variance in estimated values:
    - Bias: Bias refers to estimates whose predicted values are consistently far from the actual values, even when they are not scattered (they may be tightly clustered around the wrong value). Read more at Bias(DS) and Bias and Variance (ML).
    variance-bias.png
    | | Low Variance | High Variance |
    | --------- | ----------------------- | ---------------------- |
    | Low Bias | Accurate | Scattered: Overfitting |
    | High Bias | Clustered: Underfitting | Inaccurate |
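
The four cells of the table can be sketched numerically. The example below assumes a known true value and a handful of made-up estimates per case; bias is taken as the mean estimate minus the true value, and variance as the population variance of the estimates. The helper name `bias_and_variance` and all numbers are hypothetical.

```python
from statistics import mean, pvariance

def bias_and_variance(estimates, true_value):
    """Return (bias, variance) of a set of estimates of one true value."""
    bias = mean(estimates) - true_value   # systematic offset from the truth
    variance = pvariance(estimates)       # scatter of the estimates around their own mean
    return bias, variance

true_value = 10.0

# Hypothetical estimates, one list per cell of the table.
cases = {
    "low bias, low variance": [9.9, 10.1, 10.0, 10.0],
    "low bias, high variance": [6.0, 14.0, 8.0, 12.0],
    "high bias, low variance": [14.9, 15.1, 15.0, 15.0],
    "high bias, high variance": [11.0, 19.0, 13.0, 17.0],
}

results = {name: bias_and_variance(vals, true_value) for name, vals in cases.items()}
for name, (bias, variance) in results.items():
    print(f"{name}: bias={bias:.3f}, variance={variance:.3f}")
```

In this toy data the high-bias cases sit about 5 units away from the true value regardless of how tightly they cluster, while the high-variance cases are scattered regardless of where their average lands.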