Given a set of data values, it is natural to divide the data into groups that share some similarities. This makes the data much easier and more convenient to study. For example, under the traditional five-kingdom system, biologists classify all living organisms into Animals, Plants, Fungi, Protists, and Monera. Each kingdom is further subdivided into progressively narrower groups until we reach the species of the organism.
Similarly, in chemistry the elements are classified and presented in simplified form in the periodic table. Classification is thus a ubiquitous method in the sciences, and this is equally true in statistics and data analysis, where data are usually classified on the basis of some characteristics and presented in tabular or visual form.
Classification of data refers to the subdivision of the data into parts on the basis of certain shared characteristics. For example, data on the students enrolled at a particular university can be classified on the basis of characteristics such as:
- Academic scores
- Economic background
- Course of study
Classifying data on the basis of these characteristics might reveal, for instance, whether the students' economic backgrounds have any effect on their academic scores. Classification is therefore an important step toward getting a full picture of the data. We now list some of the main purposes of data classification.
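As a minimal sketch of what "classifying on the basis of a characteristic" looks like in practice, the Python snippet below splits hypothetical student records into classes by economic background and averages the academic score within each class. The category labels, the scores, and the helper names `classify` and `class_means` are all invented for illustration; they are not taken from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records: (economic_background, academic_score).
# Labels and scores are invented purely for illustration.
students = [
    ("low", 62), ("middle", 71), ("high", 80),
    ("low", 58), ("middle", 75), ("high", 84),
    ("low", 66), ("middle", 69), ("high", 78),
]

def classify(records):
    """Split records into classes keyed by economic background."""
    groups = defaultdict(list)
    for background, score in records:
        groups[background].append(score)
    return dict(groups)

def class_means(groups):
    """Average academic score within each class."""
    return {background: mean(scores) for background, scores in groups.items()}

groups = classify(students)
for background, avg in class_means(groups).items():
    print(f"{background:>6}: mean score {avg:.1f}")
```

Differences between the class means only hint at a relationship; deciding whether such a difference is meaningful requires the kind of further analysis described under the purposes below.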
Purposes of Data Classification:
1. Condensing the Data
Dividing the data into different groups allows us to present the data in a much more compact and simplified form. For instance, data are much easier to grasp and understand when presented in tabular form rather than as a raw sequence of numerical values.
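One common way to condense raw values is a frequency distribution with equal-width class intervals. The sketch below assumes integer exam scores, a class width of 10 and a first interval starting at 40; the scores and the `frequency_table` helper are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical raw exam scores (integers, 0-100), invented for illustration.
scores = [45, 67, 82, 91, 58, 73, 77, 64, 88, 52, 69, 95, 71, 60, 84]

def frequency_table(values, width=10, start=40):
    """Count values per class interval [lo, lo + width), assuming integer data.

    Returns (lo, hi, count) rows, where hi = lo + width - 1 is the largest
    integer in the interval. Empty intervals between the lowest and highest
    occupied ones are kept with a count of 0.
    """
    counts = Counter((v - start) // width for v in values)
    first, last = min(counts), max(counts)
    return [
        (start + k * width, start + k * width + width - 1, counts[k])
        for k in range(first, last + 1)
    ]

for lo, hi, n in frequency_table(scores):
    print(f"{lo}-{hi}: {n}")
# 40-49: 1
# 50-59: 2
# 60-69: 4
# 70-79: 3
# 80-89: 3
# 90-99: 2
```

Fifteen scattered numbers collapse into six rows, which is exactly the kind of condensation the paragraph above describes.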
2. Enables Comparisons
Classifying the data allows us to make comparisons on the basis of the criteria used for classification. For example, given data on the gender composition of students enrolled in different courses at a university, we can compare the number of males and females across courses of study (e.g., liberal arts vs. the sciences).
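A comparison like this is usually made from a two-way (cross-classified) frequency table. The sketch below builds one from hypothetical enrollment records; the course names, the "Female"/"Male" labels, the counts and the `cross_tab` helper are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical enrollment records: (course_of_study, gender). Invented data.
enrollments = [
    ("Liberal Arts", "Female"), ("Liberal Arts", "Male"), ("Liberal Arts", "Female"),
    ("Sciences", "Male"), ("Sciences", "Female"), ("Sciences", "Male"),
    ("Sciences", "Male"),
]

def cross_tab(records):
    """Two-way frequency table as {row: {column: count}}, zeros included."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = cross_tab(enrollments)
for course, by_gender in table.items():
    print(course, by_gender)
```

Reading across a row compares genders within one course; reading down a column compares courses for one gender.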
3. Detecting Causal Relationships
Classified data can suggest whether one variable is associated with, and possibly affected by, another. For example, scores obtained in standardized academic testing are widely reported to be correlated with the economic background of the students, although a correlation on its own does not establish that one variable causes the other.
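The strength of a linear association such as this is commonly summarized by Pearson's correlation coefficient r, which ranges from -1 to 1. The sketch below computes it from first principles on hypothetical income and score data; the numbers and the `pearson_r` name are invented for illustration, and on Python 3.10+ the standard library's `statistics.correlation` computes the same quantity.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: family income (thousands) and test score per student.
income = [20, 35, 50, 65, 80, 95]
score = [58, 63, 70, 68, 79, 83]
print(f"r = {pearson_r(income, score):.3f}")
```

A value of r near 1 indicates a strong positive linear association, but, as noted above, it says nothing by itself about which variable (if either) drives the other.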
4. Allows Further Statistical Analysis
We can conduct different statistical tests to carry out further analysis of the classified data. For example, suppose that we are given the exam scores of students enrolled in a particular course, where half of the students are taught using one teaching method and the other half using a different method. We can then carry out a two-sample Z test (appropriate when the samples are large or the population standard deviations are known) to determine whether the difference in mean exam scores between the two teaching methods is statistically significant or not.
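A two-sample Z test compares the difference between two group means with its standard error, z = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂), and converts z into a two-sided p-value using the standard normal distribution. The sketch below assumes independent groups and large samples; the summary statistics and the `two_sample_z_test` name are hypothetical.

```python
from math import sqrt, erfc

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided Z test for the difference between two independent means.

    Assumes large samples (or known population SDs), so the standard
    normal distribution approximates the sampling distribution of z.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of mean1 - mean2
    z = (mean1 - mean2) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value, equal to 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical summary statistics for the two teaching methods.
z, p = two_sample_z_test(mean1=74.0, sd1=10.0, n1=50, mean2=69.0, sd2=12.0, n2=50)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 5% level" if p < 0.05 else "not significant at 5% level")
```

The 5% threshold is a conventional choice, not a requirement. For small samples with unknown standard deviations, a t test is the usual alternative.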