We say that a statistical quantity is resistant to outliers if it does not change drastically due to the presence of extreme values.
The mean is not resistant to outliers, that is, the value of the mean changes drastically in the presence of extremely high or extremely low values.
On the other hand, the median does not change drastically due to extreme values, that is, it is resistant to outliers.
This is why the median is often a more reliable measure of the central tendency of the data than the mean when extreme values are present. In this article, we will explain why the median is resistant to outliers but the mean is not.
The main reason that the median is resistant while the mean is not lies in the way these two quantities are calculated. Recall that the mean is calculated by adding up all the observations and dividing by the number of observations.
Since every observation (including extremely high or low values) enters the sum when calculating the mean, we conclude that the mean is affected by all observations, extreme values included.
On the other hand, recall that the median is calculated by arranging the data in ascending order and then taking the middlemost value.
Since we are only taking the middlemost value, we see that the actual sizes of the lowest and highest values play no role when calculating the median; only their position in the ordering matters.
Since the calculation of the value of the median does not depend on the presence of extreme values we conclude that it is resistant to outliers. Let us try to understand the above explanation by looking at a simple example.
Consider the following set of five data values: 11, 15, 17, 18, and 19. We first calculate their mean and median as follows,
Mean = (11 + 15 + 17 + 18 + 19)/5 = 80/5 = 16.
Median = Middlemost value = 3rd term = 17.
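The two calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `mean` and `median` are our own, and the handling of an even number of values (averaging the two middle values) is a common convention that we assume here, since the article only discusses an odd number of values.

```python
def mean(values):
    """Add up all observations and divide by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Arrange the data in ascending order and take the middlemost value."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middlemost value.
        return ordered[mid]
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [11, 15, 17, 18, 19]
print("Mean:", mean(data))
print("Median:", median(data))
```

For the five values above, this gives a mean of 16 and a median of 17, matching the hand calculation.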
Now suppose that we replace one of the values, say 19, with an extremely large value such as 99. We can then observe that the mean becomes,
Mean = (11 + 15 + 17 + 18 + 99)/5 = 160/5 = 32.
So we see that a single extreme value causes a large change in the mean, doubling it from 16 to 32. This happened because every value, including the extreme one, is added to the sum when calculating the mean. This explains why the mean is not resistant to outliers.
On the other hand, for the new data set 11, 15, 17, 18, 99, we see that the median is still equal to 17. This is because replacing 19 with 99 does not change which value sits in the middle of the ordered data.
Since the middle value remains unchanged, the median also remains unchanged. Because the median depends only on the middle of the ordered data and not on how large the extreme values are, it is resistant to outliers.
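The comparison can also be checked numerically. The sketch below uses Python's standard `statistics` module and replaces the largest value of the data set with progressively more extreme numbers. The value 999 is a hypothetical extra case that is not part of the example above; it is added only to show that the pattern continues.

```python
from statistics import mean, median

# The four values that stay fixed in the example.
base = [11, 15, 17, 18]

# Replace the fifth value with increasingly extreme numbers.
# 19 and 99 come from the example; 999 is a hypothetical extra case.
results = {}
for largest in (19, 99, 999):
    data = base + [largest]
    results[largest] = (mean(data), median(data))
    print(f"largest = {largest}: mean = {mean(data)}, median = {median(data)}")
```

The mean moves from 16 to 32 to 212 as the largest value grows, while the median stays at 17 in every case.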