Data Gathering, Events, and Experiments

  • Population: the entire group of individuals (or objects) we observe to collect information from.
  • Sample: information about a part, or subset, of the population.
  • Sample Space: the set of all mutually exclusive and collectively exhaustive outcomes of an experiment.
  • Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if they cannot occur at the same time or the occurrence of A excludes the occurrence of B.
  • Independent Events: Two events A and B are said to be independent if the occurrence of one is in no way influenced by the occurrence of the other.
  • Census: it is information about every individual in a population.
  • Random Samples: a sample in which every member of the population has an equal chance of appearing.
  • Random Variables:
    • Random variables: numeric outcomes of an experiment or of random events; each takes values from a set of possible outcomes.
    • Independent Random Variables: Two random variables X and Y are independent if their joint distribution factors into the product of their respective distributions, i.e. P(X = x, Y = y) = P(X = x) · P(Y = y) for all x and y.
    • Discrete Random Variables: Such variables take only a finite number of distinct values.
    • Continuous Random Variables: Such variables can take an infinite number of possible values.
  • Experimental Problems
    • Sampling Bias (Sampling Error): arises when samples are collected in such a way that some individuals are less (or more) likely to be included in the sample than others. Common forms:
      • Self-Selection
      • Non-Response
      • Under-coverage
      • Survivorship
    • Confounding Variable (Confounder): a variable that influences both the dependent variable and the independent variable, and can invalidate the conclusions of a study.
  • Sampling Techniques:
    • Probability Sampling
      • Simple Random Sampling
      • Stratified Sampling
      • Clustered Sampling
      • Systematic Sampling
      • Multi-stage Sampling
    • Non-probability Sampling
      • Convenience Sampling
      • Voluntary Sampling
      • Referral or Snowball Sampling
      • Quota Sampling
      • Judgmental or Purposive Sampling
  • Hypothesis Testing: a test carried out to determine whether an observed effect is statistically significant. Rejecting the null hypothesis provides evidence that the effect is significant; failing to reject it, however, does not prove that the effect is absent.
  • Monte Carlo Simulation: A computational technique that uses random sampling to model the behavior of complex systems or processes in problems with a large number of variables and uncertainties. Steps:
    1. Define the Problem
    2. Specify Probability Distributions
    3. Generate Random Samples
    4. Run the Model
    5. Repeat
    6. Analyze Results
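
    The steps above can be sketched with a classic Monte Carlo example: estimating π by sampling random points in the unit square and counting how many land inside the quarter circle. The function name, sample count, and seed below are illustrative choices, not part of the notes:

    ```python
    import random

    def estimate_pi(n_samples: int, seed: int = 0) -> float:
        """Monte Carlo estimate of pi.

        Problem (step 1): area of the quarter unit circle is pi/4.
        Distribution (step 2): x and y are Uniform(0, 1).
        """
        rng = random.Random(seed)
        inside = 0
        for _ in range(n_samples):          # step 5: repeat many trials
            x, y = rng.random(), rng.random()  # step 3: generate random samples
            if x * x + y * y <= 1.0:           # step 4: run the model
                inside += 1
        return 4 * inside / n_samples          # step 6: analyze results

    print(estimate_pi(100_000))
    ```

    Accuracy improves with the number of samples, roughly at a rate of 1/√n.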

Probability Distributions

  • Continuous Distributions

    • Normal Distribution (Gaussian Distribution): Symmetrical bell-shaped curve for continuous variables.
    • Uniform Distribution: All values within a specific range are equally probable.
    • Exponential Distribution: Represents the time between independent events in a Poisson process.
    • Lognormal Distribution: Right-skewed distribution for positive continuous variables (the logarithm of the variable is normally distributed).
    • Gamma Distribution: Used for modeling waiting times and other positive, right-skewed continuous quantities.
    • Beta Distribution: Models proportions or percentages between 0 and 1.
    • Cauchy Distribution: "Heavy-tailed" distribution with no defined mean or standard deviation.
  • Discrete Distributions:

    • Bernoulli Distribution: Only two possible outcomes (success/failure).
    • Binomial Distribution: Fixed number of trials with only two possible outcomes (success/failure).
    • Poisson Distribution: Models the number of events occurring in a fixed interval of time or space.
    • Geometric Distribution: Number of trials needed for the first success in a series of independent trials with only two possible outcomes.
    • Hypergeometric Distribution: Sampling without replacement from a finite population containing distinct categories.
    • Multinomial Distribution: Generalization of the binomial distribution for multiple possible outcomes.
    • Negative Binomial Distribution: Number of trials needed to achieve a specific number of successes in a series of independent trials with only two possible outcomes.
  • Other Probability Functions:

    • Joint Probability Distribution: Describes the probability of multiple events occurring simultaneously.
    • Conditional Probability Distribution: Probability of one event occurring given that another event has already happened.
    • Marginal Probability Distribution: Probability distribution of a single variable obtained by summing or integrating a joint distribution over the other variables.
    • Characteristic Function: Represents the entire probability distribution through a mathematical formula.
  • Probability Distribution Functions:
    • Discrete Probability Functions:
      • Probability Mass Function (PMF): used to calculate the probability of each discrete value in a given distribution.
      • Cumulative Mass/Density Function (CMF/CDF): the cumulative probability calculated over the PMF/PDF, i.e. the sum (or integral) of all probabilities associated with values less than or equal to a given value.
    • Continuous Probability Functions:
      • Probability Density Function (PDF): defines a probability distribution for a continuous random variable, as opposed to a discrete random variable (as in the PMF). The PDF's value at any given sample (point) in the sample space (the set of possible values taken by the random variable) can be interpreted as the relative likelihood that the value of the random variable would equal that sample.
    • Continuous & Discrete Probability Functions:
      • Cumulative Distribution Function (CDF): gives, for each value x, the probability that the random variable takes a value less than or equal to x.


    Cumulative functions sum the total likelihood up to a certain point.
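
    As a sketch of how a PMF and its CDF relate, here is a minimal binomial example in plain Python (the helper names are illustrative; the CDF is literally the running sum of the PMF described above):

    ```python
    from math import comb

    def binomial_pmf(k: int, n: int, p: float) -> float:
        """PMF: probability of exactly k successes in n Bernoulli(p) trials."""
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    def binomial_cdf(k: int, n: int, p: float) -> float:
        """CDF: probability of at most k successes (sum of the PMF up to k)."""
        return sum(binomial_pmf(i, n, p) for i in range(k + 1))

    # For 10 fair coin flips:
    print(binomial_pmf(5, 10, 0.5))  # P(X = 5)  -> 0.24609375
    print(binomial_cdf(5, 10, 0.5))  # P(X <= 5) -> 0.623046875
    ```

    For a continuous variable the sum becomes an integral of the PDF, but the idea is the same: the CDF accumulates probability up to a point.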