tags: [AI/Tasks/Regression, AI/ML/SupervisedLearning, AI/Regression]
aliases: [Linear Regression Algorithm]

Linear Regression

Linear regression predicts the value of a dependent variable (the response variable, $y$) from a given independent variable (the predictor variable, $x$). To do so, it finds a Linear Relationship (correlation) between $x$ (input) and $y$ (output).

Training Linear Regression:
The hypothesis function for linear regression is:

$$\hat{y} = w x + b$$

$x$: independent variable (input training data)
$\hat{y}$: dependent variable (predicted value)
$w$: weight (coefficient of $x$), the slope of the line
$b$: bias (intercept)

Training means finding the best-fit line: the values of $w$ and $b$ for which the predicted value $\hat{y}$ is as close as possible to the true value $y$ for every $x$ in the training data. The error is measured with a cost function, the sum of squared errors (SSE) over the $n$ training examples:

$$J(w, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \big(y_i - (w x_i + b)\big)^2$$

The best-fit line is the one that minimizes $J(w, b)$.
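
With a single feature, the $w$ and $b$ that minimize the SSE have a closed form: $w = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - w\bar{x}$. Below is a minimal NumPy sketch of that fit. The function names `fit_simple_linear_regression` and `sse` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (w, b) minimizing the sum of squared errors for y_hat = w * x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Least-squares slope: sum of co-deviations divided by sum of squared x-deviations
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The best-fit line always passes through the point (x_mean, y_mean)
    b = y_mean - w * x_mean
    return w, b

def sse(x, y, w, b):
    """Sum of squared errors between observed y and predicted w * x + b."""
    y_hat = w * np.asarray(x, dtype=float) + b
    return float(np.sum((np.asarray(y, dtype=float) - y_hat) ** 2))

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])
w, b = fit_simple_linear_regression(x, y)
print(f"w = {w:.3f}, b = {b:.3f}, SSE = {sse(x, y, w, b):.4f}")
```

Moving $w$ or $b$ away from the fitted values in either direction can only increase the SSE, because the fitted pair is the minimizer.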

Concepts:

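Intercept: the point where the regression line crosses the y-axis, i.e. the predicted value when $x = 0$.
Slope: the steepness of the regression line, i.e. how much the predicted value changes when $x$ increases by one unit.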
Extrapolation: using the estimated regression equation to estimate a mean ($\mu_Y$) or to predict a new response ($y_{\text{new}}$) for $x$ values outside the range of the sample data.
- Extrapolation beyond the scope of the model (the range of the sample data) is risky, because the estimated regression equation often does not give accurate, or even meaningful, output outside that range.
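
The following hypothetical sketch shows why extrapolation is risky. Suppose the true relationship is $y = x^2$ but we only sample $0 \le x \le 1$. A straight line fits that narrow range reasonably well, but it is badly wrong far outside it. The data and the helper `predict` are illustrative assumptions, and `np.polyfit(..., deg=1)` is used for the least-squares line.

```python
import numpy as np

# Hypothetical data: true relationship y = x**2, observed only on 0 <= x <= 1
x_train = np.linspace(0.0, 1.0, 11)
y_train = x_train ** 2

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

for x_new in (0.5, 10.0):  # inside vs. far outside the sampled range
    true_y = x_new ** 2
    print(f"x={x_new:5.1f}  predicted={predict(x_new):7.3f}  true={true_y:7.3f}")
```

Inside the sampled range the error is small, but at $x = 10$ the line predicts roughly a tenth of the true value.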
Multicollinearity: two or more independent variables are highly correlated with each other, so one can be (nearly) expressed as a linear combination of the others.
- Multicollinear variables carry redundant information. This inflates the variance of the estimated coefficients and makes the model less reliable and harder to interpret.
- Correlation tests (e.g. a pairwise correlation matrix of the features) are used to detect multicollinearity, and redundant variables of this kind are often removed from the model.
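
A pairwise correlation matrix is one simple way to flag multicollinear features. In this sketch `x2` is built as an almost exact multiple of `x1`. The synthetic data and the 0.9 threshold are hypothetical, and in practice a suitable cut-off depends on the problem.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=n)  # nearly a scaled copy of x1
x3 = rng.normal(size=n)                         # independent feature
X = np.column_stack([x1, x2, x3])
feature_names = ["x1", "x2", "x3"]

# rowvar=False: each column is a variable, each row an observation
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # hypothetical cut-off
pairs = []
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > threshold:
            pairs.append((feature_names[i], feature_names[j], corr[i, j]))

for f_i, f_j, r in pairs:
    print(f"{f_i} and {f_j} are highly correlated (r = {r:.3f})")
```

Only the `x1`/`x2` pair is flagged. One of the two would typically be dropped before fitting.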
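Residuals: the residual ($e_i$, the observable estimate of the error term $\varepsilon$) is the difference between the observed value ($y_i$) and the predicted value ($\hat{y}_i$): $e_i = y_i - \hat{y}_i$. Residuals are used to check whether the regression assumptions hold.
Residuals should be approximately normally distributed and uncorrelated with one another. The features should be correlated with the target, but not strongly with each other.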
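
Residuals can be computed directly from a fitted line. This sketch uses hypothetical toy data and `np.polyfit` for the fit. It also shows a property of least squares with an intercept: the residuals sum to zero, up to floating-point error.

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9, 12.1])

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual: observed minus predicted, e_i = y_i - y_hat_i
residuals = y - y_hat
print("residuals:", np.round(residuals, 3))
# With an intercept in the model, least-squares residuals sum to (numerically) zero
print("sum of residuals:", residuals.sum())
```

In practice the residuals would then be plotted, for example as a histogram or against the fitted values, to check normality and constant variance.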
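Multiple Linear Regression: a regression model with two or more independent variables and one dependent variable, e.g. $\hat{y} = b + w_1 x_1 + w_2 x_2 + \dots + w_p x_p$, with one weight $w_j$ per feature and a single bias $b$.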
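
A multiple linear regression can be fitted by stacking the features into a matrix and adding a column of ones for the bias. This sketch uses NumPy's `np.linalg.lstsq`, which minimizes $\lVert X\theta - y \rVert^2$ with an SVD-based LAPACK solver. The synthetic data and "true" coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100
X = rng.normal(size=(n, 2))               # two independent variables
true_w = np.array([3.0, -2.0])            # hypothetical ground-truth weights
true_b = 0.5                              # hypothetical ground-truth bias
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the bias is learned as an ordinary coefficient
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ theta ~= y
theta, residual_ss, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
b_hat, w_hat = theta[0], theta[1:]
print("bias:", round(float(b_hat), 3), "weights:", np.round(w_hat, 3))
```

The recovered bias and weights land close to the values used to generate the data.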

Notes:

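Linear regression has a closed-form least-squares solution, the normal equation $\hat{\theta} = (X^\top X)^{-1} X^\top y$, where $X$ is the feature matrix with a column of ones for the bias. Inverting $X^\top X$ explicitly can be numerically unstable, so solvers use QR Decomposition or Singular Value Decomposition (SVD) instead. Alternatively, Gradient Descent can be used. It minimizes the cost function iteratively and scales better to very large datasets.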
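
The sketch below compares two of these approaches on hypothetical synthetic data. `solve_qr` factorizes $X = QR$ and solves $R\theta = Q^\top y$. `solve_gradient_descent` runs batch gradient descent on the mean squared error, which has the same minimizer as the SSE. Both function names, the learning rate `lr`, and `n_iters` are illustrative assumptions, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 200
# First column of ones models the bias term
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=n)

def solve_qr(X, y):
    """Least squares via QR: X = QR, then solve R @ theta = Q.T @ y."""
    Q, R = np.linalg.qr(X)  # reduced QR: Q is (n, p), R is (p, p) upper triangular
    return np.linalg.solve(R, Q.T @ y)

def solve_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the mean squared error."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        # Gradient of mean((X @ theta - y) ** 2) with respect to theta
        gradient = (2.0 / m) * X.T @ (X @ theta - y)
        theta -= lr * gradient
    return theta

theta_qr = solve_qr(X, y)
theta_gd = solve_gradient_descent(X, y)
print("QR:              ", np.round(theta_qr, 4))
print("gradient descent:", np.round(theta_gd, 4))
```

On a small, well-conditioned problem like this one, both approaches reach the same coefficients. Gradient descent becomes attractive when $X$ is too large to factorize comfortably.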
Linear regression is sensitive to outliers and noisy data. It can also overfit, especially when there are many features or strong multicollinearity. In such cases, regularized variants such as Ridge Regression, Lasso Regression, or Elastic-Net are often used.
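
As an example of regularization, ridge regression adds a penalty $\alpha \lVert w \rVert^2$ to the SSE, which has the closed form $w = (X_c^\top X_c + \alpha I)^{-1} X_c^\top y_c$ on centered data. The sketch below leaves the intercept unpenalized. The function name `ridge_fit`, the value `alpha=1.0`, and the nearly collinear synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Minimizes ||y - X w - b||^2 + alpha * ||w||^2 by centering X and y,
    solving (Xc^T Xc + alpha I) w = Xc^T yc, then recovering b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b

rng = np.random.default_rng(seed=3)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

w_ols, b_ols = ridge_fit(X, y, alpha=0.0)      # alpha = 0 reduces to ordinary least squares
w_ridge, b_ridge = ridge_fit(X, y, alpha=1.0)
print("OLS weights:  ", np.round(w_ols, 2))
print("ridge weights:", np.round(w_ridge, 2))
```

Under near-collinearity, the OLS weights on `x1` and `x2` can become large and offset each other. Ridge shrinks them toward a stable split while keeping their sum close to the true effect.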

Assumptions in linear regression:

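Linear relationship: the target is a linear function of the features plus an error term (the features themselves do not need to be normally distributed).
Homoscedasticity: the residuals have constant variance across all values of the features.
Little to no autocorrelation: the residuals are independent of each other, which is a common concern with time-series data.
Little to no multicollinearity, since it is difficult to separate the individual effects of collinear features on the target variable.
Normal distribution of errors: the residuals are approximately normally distributed.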
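
One common check for the autocorrelation assumption is the Durbin-Watson statistic, $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 suggest no first-order autocorrelation, and values toward 0 suggest positive autocorrelation. This is a minimal sketch. The function name `durbin_watson` and the simulated residual series are hypothetical.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 means no first-order autocorrelation,
    values toward 0 suggest positive and toward 4 negative autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(seed=4)
n = 500

# Hypothetical residuals that satisfy the assumption (independent noise)
independent = rng.normal(size=n)

# Hypothetical residuals that violate it: each depends on the previous one
autocorrelated = np.empty(n)
autocorrelated[0] = rng.normal()
for t in range(1, n):
    autocorrelated[t] = 0.9 * autocorrelated[t - 1] + rng.normal()

dw_indep = durbin_watson(independent)
dw_ar = durbin_watson(autocorrelated)
print(f"independent residuals:    DW = {dw_indep:.2f}")
print(f"autocorrelated residuals: DW = {dw_ar:.2f}")
```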

Advantages:

It performs well when the relationship between the features and the target is approximately linear.
It's easy to understand and visualize.
It's very fast to train and to make predictions with.

Disadvantages:

The data can have complex relationships that are not easy to capture in a linear model.
It's prone to overfitting, especially when there are many features relative to the number of training samples.
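
Here is a minimal illustration of a relationship a linear model cannot capture. The data follow the hypothetical relationship $y = x^2$, symmetric around $x = 0$. The least-squares line is flat and explains essentially none of the variation, as measured by the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 61)
y = x ** 2  # purely non-linear (quadratic) relationship

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"slope={slope:.3f}, R^2={r_squared:.3f}")
```

Even though $y$ is fully determined by $x$, the linear fit has slope and $R^2$ both close to zero.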