You Cannot Test the Exclusion Restriction

Notes

Exclusion Restriction

Instrumental Variables

One Example

Author

C. Luke Watson

Published

March 17, 2025

We all know that the exclusion restriction cannot really be tested. However, it seems very tempting to try.

Suppose we have the following setup:

True Structural equation: \(y = \beta x + u\)
Endogeneity: \(Cov(x,u) = \rho\) (non-zero)
First stage: \(x = \pi z + e\)
\(z\) is exogenous: \(Cov(z,u) = 0\) (instrument validity)

What about regressing \(y\) on \(x\) and \(z\) and then doing a hypothesis test of \(z\)’s coefficient? Specify the alternative model (augmented structural equation): \(y = \beta x + \gamma z + u\).

Upshot: in large samples, the OLS coefficient on \(z\) converges to:

\[ \text{plim}(\hat{\gamma}) = \gamma - \frac{\rho \pi}{\sigma_e^2} \] where \(\sigma^2_z = Var(z)\), \(\sigma^2_e = Var(e)\).

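Even if \(\gamma=0\), the probability limit of \(\hat{\gamma}\) is non-zero unless (1) \(\rho=0\) (no endogeneity) or (2) \(\pi=0\) (no relevance). If \(\gamma\ne 0\), a knife-edge case can occur: the limit is exactly zero when \(\gamma = \rho\pi/\sigma_e^2\), so the test can also fail to detect a real violation.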
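A quick simulation can make this concrete. The sketch below is only an illustration under assumed values (\(\beta = 1\), \(\pi = 0.8\), \(\rho = 0.5\), unit variances, normal draws), none of which come from the setup above; the function name `simulate_theta` is likewise hypothetical. It uses only the Python standard library and solves the two-regressor normal equations directly.

```python
import random


def simulate_theta(n=100_000, beta=1.0, gamma=0.0, pi=0.8, rho=0.5,
                   sigma_e=1.0, seed=0):
    """OLS of y on x and z (no intercept; every variable is mean zero).

    Returns (theta1, theta2): the coefficients on x and z.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sxx = sxz = szz = sxy = szy = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, sigma_e)
        # u loads on e so that Cov(e, u) = rho; z is independent of u,
        # hence Cov(x, u) = rho and Cov(z, u) = 0.
        u = (rho / sigma_e ** 2) * e + rng.gauss(0.0, 1.0)
        x = pi * z + e                    # first stage
        y = beta * x + gamma * z + u      # gamma = 0: exclusion holds
        sxx += x * x
        sxz += x * z
        szz += z * z
        sxy += x * y
        szy += z * y
    # Cramer's rule on the 2x2 sample normal equations.
    det = sxx * szz - sxz * sxz
    theta1 = (sxy * szz - sxz * szy) / det
    theta2 = (sxx * szy - sxz * sxy) / det
    return theta1, theta2


theta1, theta2 = simulate_theta()
print("coefficient on x:", theta1)   # formula predicts beta + rho/sigma_e^2 = 1.5
print("coefficient on z:", theta2)   # formula predicts -rho*pi/sigma_e^2 = -0.4
```

With these assumed values the coefficient on \(z\) should settle near \(-0.4\) even though \(\gamma = 0\) in the data-generating process. Setting `rho=0.0` or `pi=0.0` should bring it back toward zero.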

Derivation

Direct Substitution Approach

Let’s use a direct substitution approach.

We estimate the equation:

\[y = \theta_1 x + \theta_2 z + v\] Remember, this is a regression we are running, not the true model that we specified above.

The OLS estimators \(\hat{\theta}_1\) and \(\hat{\theta}_2\) solve the normal equations:

\[ \begin{bmatrix} \sum x_i^2 & \sum x_i z_i \\ \sum x_i z_i & \sum z_i^2 \end{bmatrix} \begin{bmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_i y_i \\ \sum z_i y_i \end{bmatrix} \]

In large samples, these converge to:

\[ \begin{bmatrix} E[x^2] & E[xz] \\ E[xz] & E[z^2] \end{bmatrix} \begin{bmatrix} \text{plim}(\hat{\theta}_1) \\ \text{plim}(\hat{\theta}_2) \end{bmatrix} = \begin{bmatrix} E[xy] \\ E[zy] \end{bmatrix} \]

Let’s substitute our model equations. Recall: \(x = \pi z + e\) (first stage) and \(y = \beta x + \gamma z + u\) (augmented structural equation).

Now let’s compute the relevant expectations:

\(E[xz]\)

\[ \begin{align} E[xz] &= E[(\pi z + e)z] \\ &= \pi E[z^2] + E[ez] \\ &= \pi \sigma_z^2 \end{align} \]

where \(\sigma_z^2 = E[z^2]\) (assuming \(E[z] = 0\) for simplicity and \(E[ez] = 0\) by construction)

\(E[x^2]\)

\[ \begin{align} E[x^2] &= E[(\pi z + e)^2] \\ &= \pi^2 E[z^2] + 2\pi E[ez] + E[e^2] \\ &= \pi^2 \sigma_z^2 + \sigma_e^2 \end{align} \]

where \(\sigma_e^2 = E[e^2]\)

\(E[xy]\)

\[ \begin{align} E[xy] &= E[x(\beta x + \gamma z + u)] \\ &= \beta E[x^2] + \gamma E[xz] + E[xu] \\ &= \beta (\pi^2 \sigma_z^2 + \sigma_e^2) + \gamma \pi \sigma_z^2 + \rho \end{align} \]

where \(E[xu] = Cov(x,u) = \rho\) (treating the variables as mean zero)

\(E[zy]\)

\[ \begin{align} E[zy] &= E[z(\beta x + \gamma z + u)] \\ &= \beta E[zx] + \gamma E[z^2] + E[zu] \\ &= \beta \pi \sigma_z^2 + \gamma \sigma_z^2 + 0 \end{align} \]

(as \(E[zu] = 0\) by the exogeneity assumption on \(z\); in the augmented equation, the exclusion restriction itself is \(\gamma = 0\))

Now we can solve the system of equations:

\[ \begin{bmatrix} \pi^2 \sigma_z^2 + \sigma_e^2 & \pi \sigma_z^2 \\ \pi \sigma_z^2 & \sigma_z^2 \end{bmatrix} \begin{bmatrix} \text{plim}(\hat{\theta}_1) \\ \text{plim}(\hat{\theta}_2) \end{bmatrix} = \begin{bmatrix} \beta (\pi^2 \sigma_z^2 + \sigma_e^2) + \gamma \pi \sigma_z^2 + \rho \\ \beta \pi \sigma_z^2 + \gamma \sigma_z^2 \end{bmatrix} \]

To solve for \(\text{plim}(\hat{\theta}_1)\) and \(\text{plim}(\hat{\theta}_2)\), use Cramer’s rule. First, find the determinant of the coefficient matrix:

\[ \begin{align} \det &= (\pi^2\sigma^2_z + \sigma^2_e)(\sigma^2_z) - (\pi\sigma^2_z)(\pi\sigma^2_z) \\ &= \pi^2\sigma^4_z + \sigma^2_e\sigma^2_z - \pi^2\sigma^4_z \\ &= \sigma^2_e\sigma^2_z \end{align} \]

Next, to solve for \(\text{plim}(\hat{\theta}_1)\), replace the first column of the coefficient matrix with the right-hand side: \[ \begin{bmatrix} \beta(\pi^2\sigma^2_z + \sigma^2_e) + \gamma\pi\sigma^2_z + \rho & \pi\sigma^2_z \\ \beta\pi\sigma^2_z + \gamma\sigma^2_z & \sigma^2_z \end{bmatrix} \]

Thus, the determinant of this matrix is: \[ \begin{align} &[\beta(\pi^2\sigma^2_z + \sigma^2_e) + \gamma\pi\sigma^2_z + \rho](\sigma^2_z) - (\pi\sigma^2_z)(\beta\pi\sigma^2_z + \gamma\sigma^2_z) \\ &= \beta(\pi^2\sigma^2_z + \sigma^2_e)\sigma^2_z + \gamma\pi\sigma^4_z + \rho\sigma^2_z - \pi^2\beta\sigma^4_z - \pi\gamma\sigma^4_z \\ &= \beta\pi^2\sigma^4_z + \beta\sigma^2_e\sigma^2_z + \gamma\pi\sigma^4_z + \rho\sigma^2_z - \pi^2\beta\sigma^4_z - \pi\gamma\sigma^4_z \\ &= \beta\sigma^2_e\sigma^2_z + \rho\sigma^2_z \end{align} \]

Therefore: \[ \text{plim}(\hat{\theta}_1) = \frac{\beta\sigma^2_e\sigma^2_z + \rho\sigma^2_z}{\sigma^2_e\sigma^2_z} = \beta + \frac{\rho}{\sigma^2_e} \]
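As a cross-check under the same assumptions, the Frisch-Waugh-Lovell theorem gives this limit without Cramer's rule. The population residual from projecting \(x\) on \(z\) is \(e\), so the coefficient on \(x\) equals the coefficient from regressing \(y\) on \(e\) alone:

\[ \begin{align} \text{plim}(\hat{\theta}_1) &= \frac{E[ey]}{E[e^2]} = \frac{\beta E[ex] + \gamma E[ez] + E[eu]}{\sigma^2_e} \\ &= \frac{\beta\sigma^2_e + 0 + \rho}{\sigma^2_e} = \beta + \frac{\rho}{\sigma^2_e} \end{align} \]

using \(E[ex] = \pi E[ez] + E[e^2] = \sigma^2_e\) and \(E[eu] = E[xu] - \pi E[zu] = \rho\).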

Similarly, to solve for \(\text{plim}(\hat{\theta}_2)\), replace the second column of the original coefficient matrix with the right-hand side: \[ \begin{bmatrix} \pi^2\sigma^2_z + \sigma^2_e & \beta(\pi^2\sigma^2_z + \sigma^2_e) + \gamma\pi\sigma^2_z + \rho \\ \pi\sigma^2_z & \beta\pi\sigma^2_z + \gamma\sigma^2_z \end{bmatrix} \]

The determinant of this matrix is: \[ \begin{align} &(\pi^2\sigma^2_z + \sigma^2_e)(\beta\pi\sigma^2_z + \gamma\sigma^2_z) - (\pi\sigma^2_z)[\beta(\pi^2\sigma^2_z + \sigma^2_e) + \gamma\pi\sigma^2_z + \rho] \\ &= (\pi^2\sigma^2_z + \sigma^2_e)(\beta\pi\sigma^2_z) + (\pi^2\sigma^2_z + \sigma^2_e)(\gamma\sigma^2_z) - (\pi\sigma^2_z)(\beta\pi^2\sigma^2_z) - (\pi\sigma^2_z)(\beta\sigma^2_e) - (\pi\sigma^2_z)(\gamma\pi\sigma^2_z) - (\pi\sigma^2_z)(\rho) \\ &= \beta\pi^3\sigma^4_z + \beta\pi\sigma^2_e\sigma^2_z + \gamma\pi^2\sigma^4_z + \gamma\sigma^2_e\sigma^2_z - \beta\pi^3\sigma^4_z - \beta\pi\sigma^2_e\sigma^2_z - \gamma\pi^2\sigma^4_z - \rho\pi\sigma^2_z \\ &= \gamma\sigma^2_e\sigma^2_z - \rho\pi\sigma^2_z \end{align} \]

Therefore: \[\text{plim}(\hat{\theta}_2) = \frac{\gamma\sigma^2_e\sigma^2_z - \rho\pi\sigma^2_z}{\sigma^2_e\sigma^2_z} = \gamma - \frac{\rho\pi}{\sigma^2_e}\]

In summary: \[\text{plim}(\hat{\theta}_1) = \beta + \frac{\rho}{\sigma^2_e}\] \[\text{plim}(\hat{\theta}_2) = \gamma - \frac{\rho\pi}{\sigma^2_e}\]
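The algebra can also be checked with exact arithmetic. The sketch below plugs hypothetical parameter values (chosen only for illustration) into the population normal equations, solves them by Cramer's rule using Python's `fractions` module, and compares the result with the closed forms. The function name `plim_theta` is an assumption of this example. It also checks the knife-edge case in which a non-zero \(\gamma\) produces a zero limit.

```python
from fractions import Fraction as F


def plim_theta(beta, gamma, pi, rho, var_z, var_e):
    """Solve the population normal equations by Cramer's rule."""
    a11 = pi ** 2 * var_z + var_e            # E[x^2]
    a12 = pi * var_z                         # E[xz]
    a22 = var_z                              # E[z^2]
    b1 = beta * a11 + gamma * a12 + rho      # E[xy]
    b2 = beta * a12 + gamma * a22            # E[zy]
    det = a11 * a22 - a12 * a12              # equals var_e * var_z
    theta1 = (b1 * a22 - a12 * b2) / det
    theta2 = (a11 * b2 - a12 * b1) / det
    return theta1, theta2


# Hypothetical values, not taken from the text.
beta, pi, rho = F(2), F(3, 4), F(1, 2)
var_z, var_e = F(4), F(5, 2)

# Exclusion restriction holds (gamma = 0), yet the limit on z is non-zero.
t1, t2 = plim_theta(beta, F(0), pi, rho, var_z, var_e)
assert t1 == beta + rho / var_e
assert t2 == -rho * pi / var_e

# Knife-edge: gamma = rho*pi/var_e is a real violation, but the limit is zero.
gamma_knife = rho * pi / var_e
_, t2_knife = plim_theta(beta, gamma_knife, pi, rho, var_z, var_e)
assert t2_knife == 0

print(t1, t2, t2_knife)
```

Exact fractions are used instead of floats so that the equalities hold without any rounding tolerance.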

Thus, when we include \(z\) in the structural equation (some abuse of notation here), the coefficient on the excluded instrument converges to the true direct effect minus a bias term. Even when the exclusion restriction holds (\(\gamma = 0\)), we would find \(\text{plim}(\hat{\theta}_2) \neq 0\) as long as there is endogeneity (\(\rho \neq 0\)) and the instrument is relevant (\(\pi \neq 0\)). This shows that including the instrument in the structural equation is not a valid test of the exclusion restriction.
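To see what this means for the hypothesis test itself, the following sketch simulates many samples in which the exclusion restriction holds and counts how often a conventional t-test rejects a zero coefficient on \(z\). The sample size, number of replications, parameter values, and the use of homoskedastic standard errors with a 1.96 critical value are all assumptions made for illustration; the function names are hypothetical.

```python
import math
import random


def t_stat_on_z(n, rng, beta=1.0, pi=0.8, rho=0.5):
    """Draw one sample with gamma = 0 and return the t-statistic on z
    from OLS of y on x and z (no intercept, homoskedastic errors)."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        u = rho * e + rng.gauss(0.0, 1.0)   # Cov(e, u) = rho since Var(e) = 1
        x = pi * z + e
        y = beta * x + u                    # exclusion restriction holds
        rows.append((x, z, y))
    sxx = sum(x * x for x, _, _ in rows)
    sxz = sum(x * z for x, z, _ in rows)
    szz = sum(z * z for _, z, _ in rows)
    sxy = sum(x * y for x, _, y in rows)
    szy = sum(z * y for _, z, y in rows)
    det = sxx * szz - sxz * sxz
    th1 = (sxy * szz - sxz * szy) / det
    th2 = (sxx * szy - sxz * sxy) / det
    rss = sum((y - th1 * x - th2 * z) ** 2 for x, z, y in rows)
    s2 = rss / (n - 2)                      # residual variance estimate
    se2 = math.sqrt(s2 * sxx / det)         # sxx/det is the (2,2) entry of (X'X)^-1
    return th2 / se2


def rejection_rate(reps=200, n=500, seed=1, **params):
    """Share of replications in which |t| > 1.96 for z's coefficient."""
    rng = random.Random(seed)
    rejections = sum(abs(t_stat_on_z(n, rng, **params)) > 1.96
                     for _ in range(reps))
    return rejections / reps


print("endogenous x (rho = 0.5):", rejection_rate())
print("exogenous x  (rho = 0.0):", rejection_rate(rho=0.0))
```

Under these assumptions the test should reject in nearly every replication when \(\rho = 0.5\), despite \(\gamma = 0\), and only at roughly the nominal 5% rate when \(\rho = 0\). In other words, a rejection reflects the endogeneity of \(x\), not a failure of the exclusion restriction.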

Note as well that the coefficient on the endogenous variable of interest is also inconsistent: it converges to \(\beta + \rho/\sigma^2_e\) rather than \(\beta\).
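For contrast, the same moments show why the exclusion restriction matters for IV itself. Assuming a single instrument and \(\pi \neq 0\), the IV estimator converges to the ratio of \(E[zy]\) to \(E[zx]\), both computed above:

\[ \frac{E[zy]}{E[zx]} = \frac{\beta\pi\sigma^2_z + \gamma\sigma^2_z}{\pi\sigma^2_z} = \beta + \frac{\gamma}{\pi} \]

So IV recovers \(\beta\) only when \(\gamma = 0\), which is exactly the assumption that the regression of \(y\) on \(x\) and \(z\) cannot verify.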