The Spearman Rank Correlation Coefficient measures the degree of correlation between two variables that can be ranked, including qualitative ones. The data must first be ordered in the form of “ranks”, which are then used to calculate the extent of the correlation between the two variables.
The coefficient of correlation can then be calculated using the formula,
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where
- d = the difference between the two ranks assigned to each data point.
- n = the number of data points (pairs of ranks).
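As a small worked illustration of the formula, the sketch below computes R directly from two lists of ranks. The function name `spearman_from_ranks` and the judges' rankings are hypothetical, chosen only for this example, and the sketch assumes that no ranks are repeated.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Compute R = 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both rank lists must have the same length")
    n = len(rank_x)
    if n < 2:
        raise ValueError("at least two data points are needed")
    # d is the difference between the two ranks of each data point.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

# Differences are -1, 1, -1, 1, 0, so sum(d^2) = 4 and R = 1 - 24/120.
r = spearman_from_ranks(judge_a, judge_b)
print(round(r, 4))  # 0.8
```

Here the two judges disagree only by swapping neighbouring contestants, so R stays close to +1.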
We now list some of the merits and demerits of the Spearman Rank Correlation Method.
Advantages of Spearman Rank Correlation Coefficient:
- This method is much simpler to carry out and understand than Karl Pearson’s method of correlation. It gives the same result as Karl Pearson’s coefficient computed on the ranks, provided that none of the ranks are repeated.
- This method can be used to carry out correlation analysis for variables that are not numerical. We can study the relationships between qualitative variables such as beauty, intelligence, honesty, efficiency, and so on.
- Inference based on Karl Pearson’s correlation coefficient usually assumes that the parent population from which sample observations are drawn is normal. If this assumption is violated then we need a measure that is distribution-free (or non-parametric). A distribution-free measure is one that does not make any assumptions about the form of the population distribution. Spearman’s ρ is such a measure, since no strict assumptions are made about the form of the population from which the sample observations are drawn.
- Spearman’s formula can be used where Karl Pearson’s formula cannot: when we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially.
- It can also be used when actual numerical data are given. The sample values of the two variables are converted into ranks, in either ascending or descending order (the same order for both variables), before calculating the degree of correlation between them.
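The conversion of numerical data into ranks can be sketched as follows. The helper names (`to_ranks`, `pearson`, `spearman`) and the data on study hours and marks are hypothetical, and the sketch assumes that no value is repeated within a variable. It also checks that the rank-difference formula agrees with Pearson's coefficient computed on the ranks.

```python
import math


def to_ranks(values):
    """Rank values 1..n in ascending order; assumes no repeated values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle repeated values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Rank both variables, then apply 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_x, rank_y = to_ranks(x), to_ranks(y)
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical data: hours studied and marks obtained by five students.
hours = [2.0, 9.5, 4.0, 7.0, 5.5]
marks = [35, 90, 60, 62, 80]

rho = spearman(hours, marks)                              # ranks differ for two students
r_on_ranks = pearson(to_ranks(hours), to_ranks(marks))   # Pearson applied to the ranks
print(round(rho, 4), round(r_on_ranks, 4))  # 0.9 0.9
```

Ranking both variables in descending order instead would give the same coefficient, because every rank difference would only change sign.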
Disadvantages of Spearman Rank Correlation Coefficient:
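- The coefficient only measures monotonic association: the two variables are assumed to move consistently in the same or in opposite directions. A relationship that is not monotonic (for example, a U-shaped one) can produce a coefficient close to zero even when the variables are strongly related.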
- When the calculation is done by hand, a lot of time is required once the number of pairs of values exceeds about 30. In such cases, assigning ranks to each of the numerical values is a very time-consuming and tedious process.
- This method cannot be applied to measure the association between two variables whose distribution is given in the form of a grouped frequency distribution.
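The ranking step that becomes tedious by hand for larger samples is easy to automate. The sketch below is one possible approach, not part of the method as described above: it sorts the values once and assigns ranks, and it gives repeated values the average of the positions they occupy, which is a common convention assumed here. The function name `average_ranks` and the 40 generated data pairs are hypothetical.

```python
def average_ranks(values):
    """Rank in ascending order; repeated values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]

# Hypothetical sample of 40 pairs with no repeated values; y falls as x rises.
x = [(7 * i) % 41 for i in range(1, 41)]
y = [100 - 2 * v for v in x]

rank_x, rank_y = average_ranks(x), average_ranks(y)
n = len(x)
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
print(r)  # -1.0, a perfect inverse ranking
```

Note that the simple rank-difference formula is exact only when no ranks are repeated, which is why the 40-pair sample here contains no repeated values.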