```
tags:
- Data-Science
```

- hash function: any function that maps data of arbitrary size to data of fixed size. One common use is the hash table, a data structure widely used in software for rapid data lookup. Hash functions can also speed up table or database operations, for example by detecting duplicated records in a large file.
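A minimal Python sketch of the hash-table idea: `hash()` maps a key of any size to a fixed bucket index, so a lookup only scans one small bucket. The bucket count of 8 and the example keys are arbitrary choices for illustration.

```python
# Toy hash table: hash() maps each key to one of 8 buckets.
buckets = [[] for _ in range(8)]

def put(key, value):
    # The bucket index is the hash reduced modulo the table size.
    buckets[hash(key) % 8].append((key, value))

def get(key):
    # Only the one bucket the key hashes to is scanned.
    for k, v in buckets[hash(key) % 8]:
        if k == key:
            return v
    return None

put("alice", 30)
put("bob", 25)
```

Real hash tables (such as Python's `dict`) additionally resize and handle collisions more carefully.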
- O(n): big O notation classifies algorithms by how their running time or space requirements grow as the input size grows; O(n) denotes growth that is at most linear in the input size n. In analytic number theory, big O notation is also used to express a bound on the difference between an arithmetical function and a better-understood approximation.
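A toy illustration of O(n): linear search may touch every one of the n elements, so its step count grows linearly with the input size. The `linear_search` helper and the 1000-element list are illustrative, not from the source.

```python
# Linear search visits up to n elements, so its worst case is O(n).
def linear_search(items, target):
    steps = 0
    for x in items:
        steps += 1
        if x == target:
            return steps  # number of elements examined
    return steps

data = list(range(1000))
```

Searching for the last element takes 1000 steps, the first only 1; a hash-based `set` lookup, by contrast, is O(1) on average.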
- Model Selection Techniques:
    - Probabilistic Measures: score a candidate model by its performance and its complexity.
    - Resampling Methods: split the data into sub-train and sub-test sets and score by the mean over repeated runs.
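The probabilistic-measure idea can be sketched with AIC, which trades off fit against parameter count; for least-squares models, AIC = n·ln(RSS/n) + 2k up to an additive constant. The residual sums of squares below are made-up numbers, not from a real fit.

```python
import math

# AIC for a least-squares fit: lower is better.
# n = sample size, rss = residual sum of squares, k = parameter count.
def aic(rss, n, k):
    return n * math.log(rss / n) + 2 * k

n = 100
# Hypothetical scores for two fitted models:
aic_simple = aic(rss=50.0, n=n, k=2)    # fewer parameters, slightly worse fit
aic_complex = aic(rss=48.0, n=n, k=10)  # more parameters, slightly better fit
```

Here the simpler model wins: its small loss in fit does not justify eight extra parameters.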

Techniques:

- Resampling:
    - Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the available data (jackknifing) or by drawing randomly with replacement from a set of data points (bootstrapping)
    - Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests)
    - Validating models by using random subsets (bootstrapping, cross-validation)

- Shrinkage:
    - The general observation that, in regression analysis, a fitted relationship performs less well on a new data set than on the data set used for fitting; in particular, the value of the coefficient of determination ‘shrinks’.
    - General types of estimators, or the effects of some types of estimation, whereby a naive or raw estimate is improved by combining it with other information (see shrinkage estimator).
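A toy shrinkage estimator in the second sense, assuming a hand-picked weight `w`: each group's raw mean is pulled toward the overall mean, which helps most for tiny, noisy groups like `"c"`. The groups and values are invented for illustration.

```python
# Hypothetical grouped data; group "c" has only one observation.
groups = {"a": [4.0, 5.0, 6.0], "b": [9.0, 11.0], "c": [1.0]}

# Overall mean across all observations (the "other information").
overall = (sum(x for g in groups.values() for x in g)
           / sum(len(g) for g in groups.values()))

def shrunk_mean(values, w=0.5):
    # w = 0 keeps the raw group mean; w = 1 fully pools to the overall mean.
    raw = sum(values) / len(values)
    return (1 - w) * raw + w * overall
```

With these numbers the overall mean is 6.0, so group `"c"`'s raw mean of 1.0 is shrunk to 3.5 at `w=0.5`.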

- Dimension Reduction (Dimensionality Reduction): the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
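A from-scratch sketch of one dimensionality-reduction technique, PCA: 2-D points are reduced to a single principal component using the closed-form eigendecomposition of a 2×2 covariance matrix. The points are made up for illustration.

```python
import math

# Toy 2-D data to reduce to 1 dimension.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
          (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]

n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
centered = [(x - mx, y - my) for x, y in points]

# Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centered) / (n - 1)
c = sum(y * y for _, y in centered) / (n - 1)
b = sum(x * y for x, y in centered) / (n - 1)

# Leading eigenvalue and unit eigenvector (closed form for a 2x2 matrix).
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
vx, vy = b, lam - a
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Project each centered point onto the first principal axis: 2-D -> 1-D.
scores = [x * vx + y * vy for x, y in centered]
```

The variance of the 1-D scores equals the leading eigenvalue, i.e. the projection keeps as much variance as any single direction can.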
- Data Augmentation: a technique that increases the amount of training data by applying transformations such as rotation, scaling, or flipping to existing data.
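A minimal augmentation sketch on a toy 2×2 "image" (a nested list): each transformation yields a new training example from existing data. Real pipelines would use an image library; the flip and rotation here are illustrative.

```python
# Toy "image" as a 2x2 grid of pixel values.
image = [[1, 2],
         [3, 4]]

def hflip(img):
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]

def rotate90(img):
    # Rotate 90 degrees clockwise: reverse the rows, then transpose.
    return [list(row) for row in zip(*img[::-1])]

# One original example becomes three training examples.
augmented = [image, hflip(image), rotate90(image)]
```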
- Artifact: an intermediate result in the data science development process. In a data science workflow, an artifact can be a model, a chart, a statistic, a dataframe, or a feature function.
- Data Pipeline: a series of steps that transform data into useful information or products; pipelines are often automated.
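A bare-bones pipeline sketch: each step is a plain function and `run_pipeline` chains them in order. The `clean`/`transform` steps and the record fields are hypothetical.

```python
# Each pipeline step takes a list of records and returns a new list.
def clean(records):
    # Drop records with any missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    # Add a derived field from existing fields.
    return [{**r, "total": r["price"] * r["qty"]} for r in records]

def run_pipeline(records, steps):
    # Feed the output of each step into the next.
    for step in steps:
        records = step(records)
    return records

raw = [{"price": 2.0, "qty": 3}, {"price": None, "qty": 1}]
result = run_pipeline(raw, [clean, transform])
```

Orchestration tools (e.g. Airflow) automate the same idea at scale: named steps, run in dependency order.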
