Correlation measures the degree and direction of the association between two variables, but it leaves two questions unanswered. Is there a functional (or algebraic) relationship between the two variables? If so, can it be used to estimate the most likely value of one variable, given the value of the other?

The statistical technique that expresses the relationship between two or more variables in the form of an equation to estimate the value of a variable, based on the given value of another variable, is called regression analysis.
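As a rough illustration of this idea, the sketch below fits a straight line y = a + b·x to a small sample and uses it to estimate y for a new value of x. The data are made up, the helper name `fit_line` is hypothetical, and the least-squares method is an assumption here, since the text above does not say how the equation is obtained.

```python
import numpy as np

# Hypothetical sample data, made up purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x.

    Least squares is an assumed fitting method; the text does not name one.
    """
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point (x_mean, y_mean).
    a = y_mean - b * x_mean
    return a, b


a, b = fit_line(x, y)

# Estimate y for a value of x that is not in the sample.
x_new = 6.0
y_est = a + b * x_new

print(f"fitted equation: y = {a:.3f} + {b:.3f} x")
print(f"estimated y at x = {x_new}: {y_est:.3f}")
```

The equation only summarises the sample it was fitted to, so an estimate far outside the observed range of x should be treated with caution.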

The variable whose value is estimated using the algebraic equation is called the dependent (or response) variable, and the variable whose value is used to make the estimate is called the independent (regressor or predictor) variable. The linear algebraic equation that expresses the dependent variable in terms of the independent variable is called the linear regression equation. We now give the main differences between correlation and regression analysis.

**Difference between Correlation and Regression:**

- Developing an algebraic equation between two variables from sample data and predicting the value of one variable, given the value of the other variable, is referred to as regression analysis while measuring the strength (or degree) of the relationship between two variables is referred to as correlation analysis. The sign of the correlation coefficient indicates the nature (direct or inverse) of the relationship between two variables, while the absolute value of the correlation coefficient indicates the extent of the relationship.
- Correlation analysis establishes an association between two variables x and y, not a cause-and-effect relationship. Regression analysis, in contrast, treats the relationship as directional: it models how a change in the value of the independent variable x is accompanied by a corresponding change in the value of the dependent variable y, provided all other factors that affect y remain unchanged. The fitted equation alone does not prove that x causes y; a causal reading needs support from the study design or from subject-matter knowledge.
- In linear regression analysis, one variable is treated as the dependent variable and the other as the independent variable, while in correlation analysis the two variables are treated symmetrically and neither is designated as dependent.
- The coefficient of determination $r^{2}$ indicates the proportion of total variance in the dependent variable that is explained or accounted for by the variation in the independent variable. Since the value of $r^{2}$ is determined from a sample, its value is subject to sampling error. Even if the value of $r^{2}$ is high, the assumption of a linear regression may be incorrect because it may represent a portion of the relationship that actually is in the form of a curve.
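The points in the list above can be checked numerically. The following is a minimal sketch on made-up data, with hypothetical helper names `corr` and `slope`, and with least squares assumed as the fitting method. It shows the sign of r for a direct and an inverse relationship, the symmetry of correlation against the asymmetry of regression, and a high r² obtained from data that actually follow a curve.

```python
import numpy as np

# Made-up data for illustration only.
x = np.arange(1.0, 11.0)       # 1, 2, ..., 10
y_direct = 3.0 + 2.0 * x       # exactly linear, increasing with x
y_inverse = 3.0 - 2.0 * x      # exactly linear, decreasing with x
y_curved = x ** 2              # quadratic: a curve, not a straight line


def corr(u, v):
    """Pearson correlation coefficient between u and v."""
    return float(np.corrcoef(u, v)[0, 1])


def slope(pred, resp):
    """Least-squares slope when resp is regressed on pred."""
    dp = pred - pred.mean()
    return float(np.sum(dp * (resp - resp.mean())) / np.sum(dp ** 2))


# Sign of r gives the direction; absolute value gives the strength.
r_direct = corr(x, y_direct)
r_inverse = corr(x, y_inverse)

# Correlation is symmetric in the two variables ...
r_xy = corr(x, y_curved)
r_yx = corr(y_curved, x)

# ... but regression is not: y on x and x on y give different slopes.
b_y_on_x = slope(x, y_curved)
b_x_on_y = slope(y_curved, x)

# r^2 is high even though the true relationship is a curve.
r2 = r_xy ** 2

# Residuals from the fitted straight line reveal the curvature:
# they follow a systematic pattern instead of scattering randomly.
a = y_curved.mean() - b_y_on_x * x.mean()
residuals = y_curved - (a + b_y_on_x * x)

print("r (direct), r (inverse):", round(r_direct, 4), round(r_inverse, 4))
print("r(x, y) and r(y, x):", round(r_xy, 4), round(r_yx, 4))
print("slope y on x, slope x on y:", round(b_y_on_x, 4), round(b_x_on_y, 4))
print("r^2 for the curved data:", round(r2, 4))
print("residuals:", np.round(residuals, 2))
```

In this sketch the residuals are positive at both ends of the range and negative in the middle, which is the kind of pattern that suggests a straight line is the wrong form even when r² is close to 1.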

*References:*

Fundamentals of Business Statistics – JK Sharma