
Assumptions behind Correlation Coefficient


If two variables X and Y are correlated with each other, we can measure the extent of the correlation by calculating Karl Pearson's correlation coefficient. It is calculated using the formula r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}}, where \bar{x} and \bar{y} are the means of the two series.
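To make the formula concrete, here is a minimal sketch that computes r term by term in plain Python. The function name `pearson_r` and the height and weight figures are made up for illustration; they are not taken from any real data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed directly from the formula above."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same number of observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square roots of the two sums of squared deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one of the series is constant")
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Hypothetical paired observations (heights in cm, weights in kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

r = pearson_r(heights, weights)
print(round(r, 4))
```

For this made-up data r comes out just below 1, because the weights rise almost in a straight line with the heights.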

But Karl Pearson's correlation coefficient reflects the degree of correlation between the two variables only under certain assumptions. We now list the assumptions behind it.


  1. The very first assumption is that there is a linear relationship between the two variables. This means that if we were to plot the two variables on a scatter diagram, the points would cluster around a straight line. If one of the variables is, say, exponentially related to the other, the correlation coefficient can still be computed, but it understates the strength of the relationship and should not be relied upon.
  2. The second assumption is that of normality. Each of the variables (series) is affected by a large number of independent contributory causes of such a nature as to produce a normal distribution. For example, the variables (series) relating to ages, heights, weights, supply, price, etc., conform to this assumption. In the words of Karl Pearson: “The sizes of the complex organs (something measurable) are determined by a great variety of independent contributing causes, for example, climate, nourishment, physical training and innumerable other causes which cannot be individually observed or their effects measured.” Karl Pearson further observes, “The variations in the intensity of the contributory causes are small as compared with their absolute intensity and these variations follow the normal law of distribution.”
  3. The variables are measured on an interval or ratio scale. We cannot calculate the correlation coefficient if one of the data series consists of nominal data. If both series consist of ranks (ordinal data), we can calculate the correlation using Spearman’s Rank Correlation instead of Karl Pearson's correlation coefficient.
  4. Another important assumption is that both series contain an equal number of observations, which come in pairs. For example, if we are given 10 values for the first variable and only 9 values for the second, it is impossible to calculate the correlation coefficient because one observation has no partner.
  5. The data should not contain any extreme values (outliers). An extreme value greatly inflates the variances in the denominator of the correlation coefficient, but it also changes the numerator, so the coefficient can be pulled in either direction. A single outlier can hide a strong linear relationship or create the appearance of one where none exists.
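The assumptions on linearity, ranks, paired observations and extreme values can be checked numerically. The sketch below is only an illustration under stated assumptions: all of the data is invented, the helper names (`pearson_r`, `ranks`, `spearman_rho`) are hypothetical, and the rank helper assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r; refuses series of unequal length (assumption 4)."""
    if len(xs) != len(ys):
        raise ValueError("observations must come in pairs")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Assumption 1: y depends on x exactly, but exponentially rather than linearly.
x = list(range(1, 11))
y_exp = [math.exp(v) for v in x]
r_exp = pearson_r(x, y_exp)          # noticeably below 1
rho_exp = spearman_rho(x, y_exp)     # ranks agree perfectly

# Assumption 5 (a): one extreme value hides a perfect linear relationship.
y_lin = [2 * v + 1 for v in x]
y_out = y_lin[:-1] + [-50]           # last observation replaced by an outlier
r_distorted = pearson_r(x, y_out)

# Assumption 5 (b): one extreme point creates an apparent relationship.
x_cloud = [1, 2, 3, 4, 5, 4, 3, 2]
y_cloud = [3, 1, 4, 2, 3, 4, 1, 3]
r_cloud = pearson_r(x_cloud, y_cloud)
r_inflated = pearson_r(x_cloud + [50], y_cloud + [50])

print("exponential data: r =", round(r_exp, 3), " rho =", round(rho_exp, 3))
print("linear data with one outlier: r =", round(r_distorted, 3))
print("cloud:", round(r_cloud, 3), " cloud plus outlier:", round(r_inflated, 3))
```

In this sketch the exponential data gives an r well below 1 even though y is completely determined by x, while the rank correlation is 1. The two outlier examples show the coefficient being pushed in opposite directions: a perfect linear relationship is masked in one case, and a weak one looks nearly perfect in the other.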

