Instrumental variables in statistics are variables that we use in linear regression analysis when an explanatory variable in our model is correlated with the error term. They are typically used to deal with the bias introduced by unknown confounding variables.

Let us try to understand how instrumental variables arise. Consider the linear regression model,

Y = b_{0} + b_{1}X + e, where e is the error.

Suppose we know that X is correlated with the error e, for instance because some unobserved variable affects both X and Y. Ordinary least squares then gives a biased estimate of b_{1}. A third variable Z is called an instrumental variable (or instrument) for X if it affects X, is uncorrelated with the error term, and affects Y only indirectly, through X. If we ignore the problem and fit the above model as it stands, we violate the assumption that the error term is uncorrelated with X.

For example, let Y = marks obtained in an exam and X = concentration levels during study time. Concentration is likely to be correlated with things we do not observe, such as motivation, which also affect marks. Now let Z = noise levels in the house. Notice that Z does not directly affect Y. Instead, Z (noise levels in the house) affects X (concentration levels during study time), which in turn has an effect on Y (marks obtained in the exam). If noise levels are unrelated to the unobserved factors, this causal chain lets us use Z as an instrument for X.
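To make the example concrete, here is a minimal simulation sketch. It treats noise as the instrument and concentration as the explanatory variable that is correlated with the error. Everything in it is an assumption made for illustration: the unobserved confounder (called motivation here), all coefficients, and the sample size are invented, not taken from real data. With a single instrument, the instrumental variable estimate of the slope is the covariance of the instrument with the outcome divided by the covariance of the instrument with the explanatory variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process; every coefficient below is made up.
motivation = rng.normal(size=n)   # unobserved confounder, ends up in the error term
noise = rng.normal(size=n)        # instrument: generated independently of motivation
concentration = -0.8 * noise + 0.6 * motivation + rng.normal(size=n)

true_effect = 2.0                 # effect of concentration on marks that we try to recover
marks = 50 + true_effect * concentration + 3.0 * motivation + rng.normal(size=n)


def ols_slope(x, y):
    """Slope from the simple least-squares regression of y on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


def iv_slope(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


ols_estimate = ols_slope(concentration, marks)
iv_estimate = iv_slope(noise, concentration, marks)

print(f"true effect : {true_effect:.2f}")
print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate : {iv_estimate:.2f}")
```

Under these assumptions the ordinary regression slope comes out clearly above the true effect, because motivation pushes concentration and marks in the same direction, while the instrumental variable estimate lands close to it. If noise were itself related to motivation, the second estimate would be biased as well.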

We should use instrumental variables whenever an explanatory variable in our model is correlated with the error term, so that the estimate of its effect is not biased. There is no standard recipe for finding a valid instrument. We have to use our knowledge of the causes that go into our model to judge whether an instrument is required and whether a candidate variable qualifies.
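One part of this judgement can be supported by data: how strongly a candidate instrument is associated with the explanatory variable it is meant to stand in for. The sketch below computes the F-statistic of that first-stage regression for a single instrument, using the identity F = (n - 2) r^2 / (1 - r^2), where r is the sample correlation. The function name and the simulated data are hypothetical. A commonly quoted rule of thumb treats values below about 10 as a sign of a weak instrument. The other requirement, that the instrument is uncorrelated with the error term, cannot be checked this way and still rests on subject knowledge.

```python
import numpy as np


def first_stage_f(instrument, regressor):
    """F-statistic for the slope when regressing `regressor` on one instrument."""
    n = len(regressor)
    r = np.corrcoef(instrument, regressor)[0, 1]
    return (n - 2) * r**2 / (1 - r**2)


# Hypothetical data: one relevant candidate and one unrelated candidate.
rng = np.random.default_rng(1)
n = 2_000
noise = rng.normal(size=n)
unrelated = rng.normal(size=n)                     # has nothing to do with concentration
concentration = -0.8 * noise + rng.normal(size=n)

f_noise = first_stage_f(noise, concentration)
f_unrelated = first_stage_f(unrelated, concentration)

print(f"first-stage F, noise    : {f_noise:.1f}")
print(f"first-stage F, unrelated: {f_unrelated:.1f}")
```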