The Jaccard index measures the similarity between two sets. Its value always lies between 0 and 1, and a higher value indicates greater similarity between the two sets.
Formula for the Jaccard Index
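Given two sets A and B, the formula for the Jaccard index is
J(A,B) = n(A∩B) / n(A∪B)
where n(A∩B) is the number of elements common to both sets and n(A∪B) is the number of elements that lie in at least one of them.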
How to calculate the Jaccard index?
- Given two sets A and B, count the number of elements which lie in both sets (the intersection).
- Count the total number of distinct elements which lie in either of the two sets (the union).
- Divide the first number by the second to obtain the value of the index.
- Multiply the result by 100 if you want to express the similarity as a percentage.
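The steps above can be sketched in Python using the built-in set type. This is only an illustrative sketch: the function name `jaccard_index` and the sample sets are assumptions, and returning 1.0 when both sets are empty is a convention chosen here, since the ratio 0/0 is otherwise undefined.

```python
def jaccard_index(a, b):
    """Return n(A∩B) / n(A∪B) for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is 0/0.
        # Returning 1.0 is an assumed convention, not part of the definition.
        return 1.0
    # Step 1: count the common elements; step 2: count the elements in either set;
    # step 3: divide the first count by the second.
    return len(a & b) / len(union)


# Hypothetical sample sets: 2 common elements out of 6 distinct elements.
similarity = jaccard_index({1, 2, 3, 4}, {3, 4, 5, 6})
print(f"{similarity:.4f} ({similarity * 100:.2f}%)")  # 0.3333 (33.33%)
```

Converting the inputs with `set()` removes duplicates, so lists can be passed in as well.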
Interpretation of the Jaccard Index:
- If the Jaccard index is closer to 1, the two sets are more similar.
- If the index is closer to 0, the two sets are less similar.
- A Jaccard index equal to 0 means that the two sets are completely disjoint.
Example:
We are given two sets of people – those who like to drink tea and those who like to drink coffee (the sets being denoted as A and B respectively).
Suppose 30 people like tea, 40 people like coffee and 10 people like both. Find the Jaccard index for the two sets.
Solution: Given, n(A) = 30, n(B) = 40 and n(A∩B) = 10
By the addition rule, n(A∪B) = n(A) + n(B) - n(A∩B) = 30 + 40 - 10 = 60
Substituting these values into the formula, we get
J(A,B) = n(A∩B) / n(A∪B) = 10/60 ≈ 0.1667 or 16.67%
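The arithmetic in this example can be checked with a short Python sketch that works directly from the counts instead of the sets themselves. The helper name `jaccard_from_counts` is hypothetical and is introduced here only for illustration.

```python
def jaccard_from_counts(n_a, n_b, n_both):
    """Jaccard index from the sizes of A, B and their intersection."""
    # Addition rule: n(A∪B) = n(A) + n(B) - n(A∩B)
    n_union = n_a + n_b - n_both
    if n_union == 0:
        raise ValueError("both sets are empty, so the index is undefined")
    return n_both / n_union


# 30 people like tea, 40 like coffee, 10 like both.
j = jaccard_from_counts(30, 40, 10)
print(round(j, 4))               # 0.1667
print(f"{j * 100:.2f}%")         # 16.67%
```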