You Can’t Test Instrument Validity

Instrumental variable (IV) estimation is an important technique in causal inference and applied empirical work. The canonical IV setting looks like the following:


Here, the relationship between X and Y is confounded by unobservable influence factors (denoted by the dashed bidirected arrow). Therefore we cannot estimate the causal effect of X on Y by a simple regression. But since the instrument Z induces variation in X that is unrelated to the unobserved confounders, we can use Z as an auxiliary experiment that allows us to identify the so-called local average treatment effect (or LATE) of X on Y.¹

For this to work it’s crucial that Z doesn’t directly affect Y (i.e., no arrow from Z to Y). Moreover, there shouldn’t be any unobservable confounders (i.e., other dashed bidirected arcs) between Z and Y, otherwise the identification argument breaks down. These two assumptions need to be justified purely based on theoretical reasonings and cannot be tested with the help of data.

Unfortunately, however, you will frequently come across people who don’t accept that the assumption of instrument validity isn’t testable. Usually, these folks then ask you to do one of the following two things in order to convince them:

  1. Show that Z is uncorrelated with Y (conditional on the other control variables in your study), or;
  2. Show that Z is uncorrelated with Y when adjusting for X (again, conditional on the other controls).

Both of these requests are wrong. The first one is particularly moronic. In order to not run into a weak instruments problem we want that Z exerts a strong influence on X. If X also affects Y, there will be a correlation between Z and Y by construction, through the causal chain Z \rightarrow X \rightarrow Y.

The second request is likewise mistaken, because adjusting for X doesn’t d-separate Z and Y. On the contrary, as X is a collider on Z \rightarrow X \dashleftarrow \dashrightarrow Y, conditioning on X opens up the path and thus creates a correlation between Z and Y.²

So both “tests” won’t tell you anything about whether the causal structure in the graph above is correct. Z and Y can be significantly correlated (also condional on X) even though the instrument is perfectly valid. These tests have no discriminating power whatsoever. Instead, all you can do is argue on theoretical grounds that the IV assumptions are fulfilled.

In general, there is no such thing as purely data-driven causal inference. At one point, you will always have to rely on untestable assumptions that need to be substantiated by expert knowledge about the empirical setting at hand. Causal graphs are of great help here though, because they make these assumptions super transparent and tractable. I see way too many people — all across the ranks — who are confused about the untestability of IV assumptions. If we would teach them causal graph methodology more thoroughly, I’m sure this would be less of a problem.


¹ Identification of the LATE additionally requires that the effect of Z on X is monotone. If you want to know more about these and other details of IV estimation, you can have a look at my lecture notes on causal inference here.

² I explain the terms d-separation and colliders both here and here (latter source is more technical)

Sample Selection Vs. Selection Into Treatment

This is an issue that bothered me for quite some time. So I finally decided to settle it with a blog post. I see people constantly confusing the two most common threats to causal inference—sample selection and endogeneity. This happens, for example, quite often in management research, where it is common to recommend a sample selection model in order to deal with endogenous treatments. But the two concepts are far from being equivalent. Have a look at the following graph, which describes a typical case of endogeneity. Continue reading Sample Selection Vs. Selection Into Treatment

Econometrics and the “not invented here” syndrome: suggestive evidence from the causal graph literature

[This post requires some knowledge of directed acyclic graphs (DAG) and causal inference. Providing an introduction to the topic goes beyond the scope of this blog though. But you can have a look at a recent paper of mine in which I describe this method in more detail.]

Graphical models of causation, most notably associated with the name of computer scientist Judea Pearl, received a lot of pushback from the grandees of econometrics. Heckman had his famous debate with Pearl, arguing that economics looks back on its own tradition of causal inference, going back to Haavelmo, and that we don’t need DAGs. Continue reading Econometrics and the “not invented here” syndrome: suggestive evidence from the causal graph literature