If two variables X and Y are correlated with each other, we can measure the extent of that correlation by calculating Karl Pearson’s correlation coefficient.
However, the coefficient reflects the degree of correlation between the two variables only under certain assumptions. We now list the assumptions behind Karl Pearson’s coefficient of correlation.
- The very first assumption is that there is a linear relationship between the two variables.
- This means that if we plot the two variables against each other on a scatter diagram, the points fall on or close to a straight line.
- If one of the variables is related to the other in a non-linear way, for example exponentially, the correlation coefficient understates the strength of the relationship and should not be relied on in such situations.
- The next assumption is the assumption of normality.
- Each of the variables (series) is affected by a large number of independent contributory causes of such a nature as to produce a normal distribution. For example, series relating to ages, heights, weights, supply, price, etc., are usually taken to conform to this assumption.
- In the words of Karl Pearson: “The sizes of the complex organs (something measurable) are determined by a great variety of independent contributing causes, for example, climate, nourishment, physical training and innumerable other causes which cannot be individually observed or their effects measured.”
- Karl Pearson further observes, “The variations in the intensity of the contributory causes are small as compared with their absolute intensity and these variations follow the normal law of distribution.”
- The variables are measured on an interval or ratio scale.
- We cannot calculate the correlation coefficient if one of the data series consists of nominal data.
- If both series consist of ranks (ordinal data), we can calculate the correlation using Spearman’s rank correlation instead of Karl Pearson’s correlation coefficient.
- Another important assumption is that the observations come in pairs, so both series contain the same number of values.
- For example, if we are given 10 values for the first variable and only 9 values for the second variable then it is impossible to calculate the correlation coefficient.
- The data should not contain any extreme values.
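- This is because the presence of extreme values greatly inflates the variances of the series.
- This makes the denominator of the correlation coefficient very large, and unless the covariance in the numerator grows by a comparable amount, the magnitude of the correlation coefficient becomes very small. An extreme value that happens to lie along the overall trend can have the opposite effect and exaggerate the coefficient.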
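
Some of the assumptions above can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the function name `pearson_r` and the sample series are made up for illustration. It computes the coefficient as the sum of products of deviations divided by the square root of the product of the sums of squared deviations, rejects series of unequal length, and then compares a linear series, an exponential series, and a linear series with one extreme value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations x and y."""
    # Paired-observations assumption: both series need the same length.
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs of observations are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of products of deviations (n times the covariance).
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # Denominator terms: sums of squared deviations (n times the variances).
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("correlation is undefined for a constant series")

    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Illustrative data (hypothetical values, chosen only for the demonstration).
x = list(range(1, 11))
y_linear = [2 * xi + 3 for xi in x]      # exact straight-line relationship
y_exp = [math.exp(xi) for xi in x]       # exact exponential relationship
y_outlier = y_linear[:-1] + [-50]        # same line, last value made extreme

r_linear = pearson_r(x, y_linear)    # perfect linear relationship
r_exp = pearson_r(x, y_exp)          # perfect, but non-linear, relationship
r_outlier = pearson_r(x, y_outlier)  # linear relationship with one extreme value

print(f"linear:       r = {r_linear:.3f}")
print(f"exponential:  r = {r_exp:.3f}")
print(f"with outlier: r = {r_outlier:.3f}")
```

In this sketch the exponential series is completely determined by `x`, yet its coefficient comes out well below the value for the linear series, and replacing a single value with an extreme one is enough to pull the coefficient of an otherwise perfect line close to zero. Passing series of different lengths raises a `ValueError`, which mirrors the requirement that observations come in pairs.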