Probability Distribution

The distribution of a statistical data set (or a population) is a function or listing that describes all possible values (or intervals) of the data and how often each occurs. For a random experiment, each value is paired with its associated probability.

A probability distribution is a mathematical function that maps every possible outcome of a random experiment to its associated probability. Its form depends on whether the random variable X is discrete or continuous.
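
As a minimal sketch of the "outcome to probability" mapping described above, the snippet below writes out the distribution of a single roll of a fair six-sided die. The die example and the names `die_pmf`, `total`, and `p_even` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

# A discrete probability distribution for one roll of a fair six-sided die:
# each outcome of the random variable X maps to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible outcomes must sum to 1.
total = sum(die_pmf.values())

# The probability of an event (here: X is even) is the sum over its outcomes.
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

print(total)   # 1
print(p_even)  # 1/2
```

Exact fractions are used here only to keep the sums free of floating-point rounding. For a continuous random variable the same role is played by a density function that integrates to 1.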

Types of Distributions:

  • Continuous probability distributions: A continuous distribution describes the probabilities of the possible values of a Continuous Random Variable. A Continuous Random Variable is a random variable with a set of possible values (known as the range) that is infinite and uncountable.
    • Normal Distribution (Gaussian Distribution): A probability distribution with a symmetric, bell-shaped curve whose peak, located at the mean, divides the distribution in half. It is characterized by:
      • Mean (μ)
      • Standard Deviation (σ)
      • Standard Normal Distribution: A normal distribution that satisfies the following conditions:
        • The mean of the distribution is 0.
        • The standard deviation of the distribution is equal to 1.
    • Exponential Distribution: Models the time until an event occurs in continuous time, with λ indicating the rate of events.
    • Gamma Distribution
    • Chi-Square Distribution
    • Weibull Distribution
    • Laplace Distribution
    • Beta Distribution
    • T-Distributions
  • Discrete probability distributions
    • Bernoulli Distribution: It describes a single experiment that has ONLY two outcomes.
    • Binomial Distribution: Gives the probability of each possible number of successes in a fixed number of independent, repeated Bernoulli trials.
      • The binomial distribution models the total number of successes in n trials when each trial has only two possible outcomes: success and failure.
      • The name Binomial suggests two mutually exclusive outcomes of trials.
    • Multinomial Distribution
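    • Negative Binomial Distribution: In one common parameterization, it models the number of failures observed before the r-th success in repeated Bernoulli trials with success probability p. It is similar to the Poisson distribution but has two parameters instead of one: r and p. The Poisson distribution is a limiting case of the negative binomial distribution (as r grows large while the mean is held fixed).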
    • Geometric Distribution: Similar to the binomial distribution, except that the experiment continues until the first success is achieved. It models the number of trials needed to get there.
    • Poisson Distribution: Models the number of events occurring within a fixed interval of time or space, with λ indicating the rate of events. It also approximates the number of occurrences of an event across a very large number of observations when the probability of the event in each observation is very small.
    • Hypergeometric Distribution: Describes the number of successes in a sequence of n draws, made without replacement, from a finite population of size N.
    • Beta-binomial distribution
  • Uniform Distribution: All outcomes are equally likely. It can be either discrete or continuous.
  • Joint Probability Distribution
  • Conditional Probability Distribution
  • Data distribution types based on:
    • Number of peaks:
      • Unimodal distribution
      • Bimodal distribution
      • Multimodal distribution
    • Symmetry: A symmetric distribution has little or no skewness: its two sides are mirror images of each other. It can have a peak (as in the normal distribution), a trough (as in U-shaped distributions), or be flat (as in the uniform distribution).
    • Skewness: A measure of the asymmetry of a random variable’s distribution, i.e. how far it deviates from a symmetric shape such as the normal distribution.
      • Negative skew: The distribution is concentrated on the right; the left tail is longer.
      • Positive skew: The distribution is concentrated on the left; the right tail is longer.
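
The binomial and Poisson entries in the list above can be made concrete with their probability mass functions. The following is a rough sketch using only the standard library; the function names `binomial_pmf` and `poisson_pmf` and the values n = 1000, p = 0.003 are assumptions chosen for illustration. It compares a binomial distribution with many trials and a small success probability against a Poisson distribution with rate λ = n·p.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the rate of events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative values (assumptions, not from the notes): many trials, small p.
n, p = 1000, 0.003
lam = n * p  # matching Poisson rate

# Print the two PMFs side by side for small k; they should be close.
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

With n = 1 the binomial PMF reduces to the Bernoulli case, for example `binomial_pmf(1, 1, 0.5)` is the probability of success in a single fair trial.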

Notes:

  • Joint distributions: A distribution over a combination of several random variables.
  • Marginal distributions: If we have a joint distribution over some set of random variables, it is possible to obtain a distribution for a subset of them by “summing out” (or “integrating out” in the continuous case) the variables we don’t care about. I.e. for two variables X and Y, p(x) = Σ_y p(x, y) in the discrete case and p(x) = ∫ p(x, y) dy in the continuous case.
  • To identify the distribution of a data sample, one or more of the following techniques can be used:
    • Plot a histogram out of the sampled data.
    • Check Skewness and Kurtosis of the sampled data.
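    • Use the Kolmogorov-Smirnov and/or Shapiro-Wilk tests. Shapiro-Wilk tests specifically for normality; Kolmogorov-Smirnov compares the sample against a chosen reference distribution, which is often the normal distribution.
    • Inspect a Quantile-Quantile (Q-Q) plot of the sample against the candidate distribution.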
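
Two of the notes above lend themselves to a short sketch: "summing out" a variable to get a marginal distribution, and checking the skewness and kurtosis of a sample. The helper names `marginal` and `skewness_and_excess_kurtosis`, the weather/traffic joint table, and the sample values are all hypothetical, made up for illustration. The moment formulas used are the simple population (biased) versions, which is an assumption; statistics libraries often apply bias corrections.

```python
from collections import defaultdict

def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    result = defaultdict(float)
    for outcome, prob in joint.items():
        result[outcome[axis]] += prob
    return dict(result)

def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    # Excess kurtosis subtracts 3 so that a normal distribution scores 0.
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical joint distribution over (weather, traffic).
joint = {
    ("sun", "light"): 0.4, ("sun", "heavy"): 0.2,
    ("rain", "light"): 0.1, ("rain", "heavy"): 0.3,
}
weather = marginal(joint, 0)  # traffic is summed out

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
sym_skew, sym_kurt = skewness_and_excess_kurtosis(symmetric)
right_skew, _ = skewness_and_excess_kurtosis(right_skewed)

print(weather)
print(sym_skew, right_skew)
```

A skewness near 0 is consistent with a symmetric distribution, while a clearly positive value points to a longer right tail. For the formal tests mentioned above, SciPy provides `scipy.stats.kstest` and `scipy.stats.shapiro`, which are not used here so that the sketch stays dependency-free.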