The Jaccard index measures the similarity between two sets. Its value always lies between 0 and 1, inclusive. A higher value means the two sets have more elements in common relative to their combined size.
Given two sets A and B, the formula for the Jaccard index is
J(A,B) = n(A∩B) / n(A∪B)
where n(X) denotes the number of elements in the set X.
How to calculate the Jaccard index?
- Given two sets A and B, count the number of points that lie in both sets (the intersection).
- Count the total number of points that lie in at least one of the two sets (the union).
- Divide the first count by the second, as in the formula above, to obtain the value of the index.
- Multiply the result by 100% if you want to express the similarity in the form of a percentage.
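The steps above can be sketched in Python using the built-in set type. The function name `jaccard_index` and the sample sets are illustrative assumptions, and returning 1.0 for two empty sets is one possible convention rather than part of the definition (the ratio is undefined in that case).

```python
def jaccard_index(a, b):
    """Return n(A∩B) / n(A∪B) for two collections of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both sets are empty, so the ratio is 0/0.
        # Assumption: treat two empty sets as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical sample data: 2 shared points, 6 points overall.
similarity = jaccard_index({1, 2, 3, 4}, {3, 4, 5, 6})
print(similarity)           # 2 / 6
print(f"{similarity:.2%}")  # the same value expressed as a percentage
```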
Interpretation of the Jaccard Index:
If the Jaccard index is closer to 1, the two sets are more similar. If the index is closer to 0, the two sets are more dissimilar. A Jaccard index equal to 0 means that the two sets are completely disjoint, while a value equal to 1 means that the two sets are identical.
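As a quick illustration of these boundary cases, the following sketch evaluates the index for identical, disjoint, and partially overlapping sets. The helper `jaccard` and the sample sets are hypothetical, and the helper assumes at least one of the two sets is non-empty.

```python
def jaccard(a, b):
    # Assumes a and b are Python sets and their union is non-empty.
    return len(a & b) / len(a | b)


identical = jaccard({"x", "y"}, {"x", "y"})         # every point is shared
disjoint = jaccard({"x", "y"}, {"p", "q"})          # no point is shared
partial = jaccard({"x", "y", "z"}, {"y", "z", "w"}) # 2 of 4 points are shared

print(identical, disjoint, partial)
```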
Example: We are given two sets of people: those who like to drink tea (set A) and those who like to drink coffee (set B). Suppose 30 people like tea, 40 people like coffee, and 10 people like both. Find the Jaccard index for the two sets.
Solution: Given n(A) = 30, n(B) = 40 and n(A∩B) = 10.
By the addition rule, n(A∪B) = n(A) + n(B) - n(A∩B) = 30 + 40 - 10 = 60.
Substituting these values into the formula above, we get
J(A,B) = n(A∩B) / n(A∪B) = 10/60 ≈ 0.1667, or 16.67%.
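The same arithmetic can be checked with a short Python sketch that works directly from the three counts instead of the sets themselves. The function name `jaccard_from_counts` is an assumption made for illustration; it applies the addition rule to get the size of the union and then divides.

```python
def jaccard_from_counts(n_a, n_b, n_both):
    """Jaccard index from n(A), n(B) and n(A∩B)."""
    # Addition rule: n(A∪B) = n(A) + n(B) - n(A∩B)
    n_union = n_a + n_b - n_both
    if n_union == 0:
        raise ValueError("both sets are empty, so the index is undefined")
    return n_both / n_union


# Tea and coffee example: 30 like tea, 40 like coffee, 10 like both.
j = jaccard_from_counts(30, 40, 10)
print(round(j, 4))   # 10 / 60, rounded to four decimal places
print(f"{j:.2%}")    # as a percentage
```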