You Can’t Test Instrument Validity

Instrumental variable (IV) estimation is an important technique in causal inference and applied empirical work. The canonical IV setting looks like the following:

Z → X → Y, with a dashed bidirected arc between X and Y (X ⇠⇢ Y)

Here, the relationship between X and Y is confounded by unobserved factors (denoted by the dashed bidirected arc). Therefore, we cannot estimate the causal effect of X on Y with a simple regression. But since the instrument Z induces variation in X that is unrelated to the unobserved confounders, we can use Z as an auxiliary experiment that allows us to identify the so-called local average treatment effect (LATE) of X on Y.¹
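To make the identification argument concrete, here is a minimal simulation sketch. It assumes a hypothetical linear data-generating process with a constant effect beta = 2 of X on Y and a standard-normal confounder U that drives both X and Y; with such a homogeneous effect, the LATE coincides with beta. The simple regression slope is pulled away from the true effect, while the single-instrument (Wald) IV estimator Cov(Z, Y) / Cov(Z, X) recovers it. The function names are illustrative, not taken from any library.

```python
import numpy as np


def simulate_iv(n=100_000, beta=2.0, gamma=1.0, seed=0):
    """Simulate the canonical IV graph under an assumed linear model.

    Hypothetical data-generating process (all noise terms standard normal):
        X = gamma * Z + U + e_x
        Y = beta * X + U + e_y
    U is the unobserved confounder behind the dashed arc between X and Y.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # instrument
    u = rng.normal(size=n)                   # unobserved confounder
    x = gamma * z + u + rng.normal(size=n)   # treatment
    y = beta * x + u + rng.normal(size=n)    # outcome
    return z, x, y


def ols_slope(x, y):
    """Slope of a simple regression of y on x: Cov(X, Y) / Var(X)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def wald_iv(z, x, y):
    """Single-instrument IV (Wald) estimator: Cov(Z, Y) / Cov(Z, X)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


z, x, y = simulate_iv()
print(f"OLS slope:   {ols_slope(x, y):.2f}")  # biased upward by the confounder
print(f"IV estimate: {wald_iv(z, x, y):.2f}")  # close to beta = 2.0
```

With the default parameters, the population value of the regression slope is beta + Cov(X, U) / Var(X) = 2 + 1/3 ≈ 2.33, whereas the IV estimand is exactly 2, so the printed numbers should land close to those values.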

For this to work, it’s crucial that Z doesn’t directly affect Y (i.e., there is no arrow from Z to Y). Moreover, there must not be any unobserved confounders (i.e., other dashed bidirected arcs) between Z and Y; otherwise the identification argument breaks down. These two assumptions need to be justified purely on theoretical grounds and cannot be tested with the help of data.

Unfortunately, you will frequently come across people who don’t accept that instrument validity is untestable. To be convinced, these folks usually ask you to do one of the following two things:

  1. Show that Z is uncorrelated with Y (conditional on the other control variables in your study); or
  2. Show that Z is uncorrelated with Y when adjusting for X (again, conditional on the other controls).

Both of these requests are wrong. The first one is particularly moronic. To avoid a weak-instruments problem, we want Z to exert a strong influence on X. If X also affects Y, there will be a correlation between Z and Y by construction, through the causal chain Z → X → Y.
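A quick simulation illustrates the first point. The sketch below assumes a hypothetical linear model in which the instrument is valid by construction (Z is independent of the confounder U and affects Y only through X), and still finds a sizable correlation between Z and Y.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta, gamma = 2.0, 1.0  # hypothetical effect of X on Y and first-stage strength

# Valid instrument by construction: Z is independent of the confounder U
# and affects Y only through X (no arrow Z -> Y).
z = rng.normal(size=n)
u = rng.normal(size=n)
x = gamma * z + u + rng.normal(size=n)
y = beta * x + u + rng.normal(size=n)

# "Test" 1: is Z correlated with Y?
corr_zy = np.corrcoef(z, y)[0, 1]
print(f"corr(Z, Y) = {corr_zy:.2f}")  # clearly nonzero despite a valid instrument
```

Under these assumed parameters, Cov(Z, Y) = beta * gamma = 2 and Var(Y) = 18, so the population correlation is 2/√18 ≈ 0.47.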

The second request is likewise mistaken, because adjusting for X doesn’t d-separate Z and Y. On the contrary, since X is a collider on the path Z → X ⇠⇢ Y, conditioning on X opens up this path and thus creates a correlation between Z and Y.²
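The collider problem can be sketched the same way. In the hypothetical linear model below, the instrument is again valid, yet the coefficient on Z in a regression of Y on X and Z is clearly nonzero. With unit-variance noise terms and gamma = 1, a short calculation gives population coefficients of −0.5 on Z and 2.5 (rather than 2) on X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta, gamma = 2.0, 1.0  # hypothetical parameter values

z = rng.normal(size=n)
u = rng.normal(size=n)                  # unobserved confounder of X and Y
x = gamma * z + u + rng.normal(size=n)  # X is a collider on Z -> X <- U
y = beta * x + u + rng.normal(size=n)   # valid instrument: no Z -> Y arrow

# "Test" 2: regress Y on a constant, X and Z, and inspect the coefficient on Z
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"coefficient on X: {coef[1]:.2f}")  # not beta either
print(f"coefficient on Z: {coef[2]:.2f}")  # nonzero, although Z is valid
```

The intuition: holding X fixed, a high value of Z must be offset by a low value of U, and U feeds into Y, so Z and Y become negatively associated once X is controlled for.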

So neither “test” tells you anything about whether the causal structure in the graph above is correct. Z and Y can be significantly correlated (also conditional on X) even though the instrument is perfectly valid. These tests have no discriminating power whatsoever. Instead, all you can do is argue on theoretical grounds that the IV assumptions are fulfilled.

In general, there is no such thing as purely data-driven causal inference. At some point, you will always have to rely on untestable assumptions that need to be substantiated by expert knowledge about the empirical setting at hand. Causal graphs are of great help here, though, because they make these assumptions super transparent and tractable. I see way too many people, all across the ranks, who are confused about the untestability of IV assumptions. If we taught them causal graph methodology more thoroughly, I’m sure this would be less of a problem.


¹ Identification of the LATE additionally requires that the effect of Z on X is monotone. If you want to know more about these and other details of IV estimation, you can have a look at my lecture notes on causal inference here.

² I explain the terms d-separation and collider both here and here (the latter source is more technical).

Sample Selection Vs. Selection Into Treatment

This is an issue that has bothered me for quite some time, so I finally decided to settle it with a blog post. I see people constantly confusing the two most common threats to causal inference: sample selection and endogeneity. This happens quite often in management research, for example, where it is common to recommend a sample selection model to deal with endogenous treatments. But the two concepts are far from equivalent. Have a look at the following graph, which describes a typical case of endogeneity.

Why you shouldn’t control for post-treatment variables in your regression

This is a slight variation on a theme I already blogged about some time ago. But I recently had a discussion with a colleague and thought it would be worthwhile to share my notes here. So what might go wrong if you control for post-treatment variables in your statistical model?

Microsoft Releases New Python Library for Causal Inference

A while ago I blogged about Facebook’s causal inference group. Now Microsoft has followed suit and released a Python library for graph-based methods of causal inference.

The Origins of Graphical Causal Models

Here is an interesting bit of intellectual history. In his 2000 book “Causality”, Judea Pearl describes how he arrived at the initial idea that sparked the development of causal inference based on directed acyclic graphs.

No Free Lunch in Causal Inference

Last week I was teaching about graphical models of causation at a summer school in Montenegro. You can find my slides and accompanying R code in the teaching section of this page. It was lots of fun and I got great feedback from students. After the workshop we had stimulating discussions about the usefulness of this new approach to causal inference in economics and business. I’d like to pick up one of those points here, as this is an argument I frequently hear when talking to people with a classical econometrics training.