Control variables in regressions — better don’t report them!

A while ago I wrote a short blog post with a pretty simple message: “Don’t Put Too Much Meaning Into Control Variables”. And I must say I was surprised by the many positive responses it got. The respective tweet received more than 1000 likes and nearly 400 retweets. And the blog post even got mentioned in an internal newsletter by the World Bank. So clearly there seems to be some demand for the topic. That’s why my coauthor Beyers Louw (PhD student at Maastricht University) and I decided to turn it into a citable research note, which is now available on arXiv:

“On the Nuisance of Control Variables in Regression Analysis” 

Abstract: Control variables are included in regression analyses to estimate the causal effect of a treatment variable of interest on an outcome. In this note we argue that control variables are unlikely to have a causal interpretation themselves though. We therefore suggest to refrain from discussing their marginal effects in the results sections of empirical research papers. 

Please use it and save yourself a paragraph or two in your next research paper! :)

Public procurement as a policy instrument for innovation

We have a new paper, “Public Procurement of Innovation: Evidence from a German Legislative Reform”, out at IJIO (a preprint is available without paywall here under “Research”), and I’ve briefly summarized the content in a Twitter thread (apparently that’s where these things happen these days; blogs are so 2012…). For reference, I’ll link to the tweets below:


Causal Inference in Business Practice – Survey

My colleagues and I are currently looking for data scientists to take part in a short survey (5–10 min) on causal inference in business practice. Is data-driven decision making important in your job? Then we’d love to hear your perspective:

Please help us reach more people by sharing the above link with friends and colleagues, or by retweeting this tweet:

Thank you for your help!

Mapping Uncharted Territory

A frequent point of criticism against Directed Acyclic Graphs is that writing them down for a real-world problem can be a difficult task. There are numerous possible variables to consider and it’s not clear how we can determine all the causal relationships between them. We recently had a Twitter discussion where exactly this argument popped up again.

I’ve written about this problem before, arguing that DAGs actually don’t have to be that complex if we look at, for example, the models we work with in structural econometrics or economic theory. But Jason Abaluck, professor at the Yale School of Management, brought up an interesting example that might be useful for illustrating what I have in mind.

Here is my reply:

It’s a good point that mapping out what we know in a DAG, especially for uncharted territory, can be complex. Related to the specific example of the college wage premium, I would advise a grad student who studies this question to first do a thorough literature review. That’s the basis for synthesizing what we’ve learned in 50 years or so about the topic. The DAG then serves as a perfect tool for organizing this body of knowledge. Now, for some arrows the decision to include or omit them might be ambiguous. But these are exactly the cases where there is a need for future research. A great opportunity for a fresh grad student.

This process is of course quite tedious, but there isn’t really an alternative to it. When we justify the exogeneity of our instruments, we also need to know all possible confounders that might play a role. The same goes for arguing that there is no self-selection around the discontinuity threshold or that common trends hold. We can only justify these assumptions by synthesizing the prior knowledge we have about the subject under study.

Some people think this would be different with potential outcome methods, but that is only because we’ve accepted loose standards for arguing verbally about ignorability, exogeneity, and causal mechanisms in our papers and seminars. This process is highly non-transparent and prone to arguments from authority.

Going through the entire body of knowledge about a specific problem and casting it into a DAG is cumbersome, I realize. Once we start to make our assumptions more explicit, though, others will be able to build on our work. They can then test the proposed model against the available data or look for experimental evidence on ambiguous causal relationships. This process of knowledge curation is not something one paper can achieve alone; it has to be a truly collaborative exercise. I don’t see how we can have real progress in a field without it.

PO vs. DAGs – Comments on Guido Imbens’ New Paper

Guido Imbens published a new working paper in which he develops a detailed comparison of the potential outcomes framework (PO) and directed acyclic graphs (DAG) for causal inference in econometrics. I really appreciate this paper, because it introduces a broader audience in economics to DAGs and highlights the complementarity of both approaches for applied econometric work.

Causal Data Science in Business

A while back I posted about Facebook’s causal inference group and how causal data science tools are slowly finding their way from academia into business. Since then I have come across many more examples of well-known companies investing in their causal inference (CI) capabilities: Microsoft released its DoWhy library for Python, providing CI tools based on Directed Acyclic Graphs (DAGs); I recently met people from IBM Research interested in the topic; Zalando is constantly looking for people to join their CI/ML team; and Lufthansa, Uber, and Lyft have research units working on causal AI applications too.

The topic of causal inference seems to be booming at the moment—and for good reasons.

Causal knowledge is crucial for decision-making. Take the example of an advertiser who wants to know how effective her company’s social media marketing campaign on Instagram is. Unfortunately, our current workhorse tools in machine learning are not capable of answering such a question.

A decision tree classifier might give you a very precise estimate that ads which use blue colors and sans-serif fonts are associated with 12% higher click-through rates. But does that mean that every advertising campaign should switch to that combination in order to boost user engagement? Not necessarily. It might just reflect the fact that a majority of Fortune-500 firms—the ones with great products—happen to use blue and sans-serif in their corporate designs.

This is what Judea Pearl, father of causality in artificial intelligence, calls the difference between “seeing” and “doing”. Standard machine learning tools are designed for seeing, observing, discerning patterns. And they’re pretty good at it! But management decisions very often involve “doing”, namely whenever the goal is to manipulate a variable X (e.g., ad design, team diversity, R&D spending, etc.) in order to achieve an effect on another variable Y (click-through rate, creativity, profits, etc.).
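
The gap between “seeing” and “doing” can be sketched with a toy simulation of the ad example. Everything in it is an assumption made up for illustration: firm quality is a binary confounder, the blue/sans-serif design has no effect on clicks at all, and the probabilities are arbitrary. Comparing click rates in observational data (“seeing”) still shows a gap, while randomly assigning the design (“doing”) does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500_000

# Hypothetical confounder: firm quality (1 = strong product, 0 = otherwise).
quality = rng.binomial(1, 0.3, size=n)

def click_probability(blue, quality):
    # Assumed ground truth: the design has NO causal effect on clicks;
    # only firm quality raises the click probability.
    return 0.05 + 0.03 * quality + 0.0 * blue

def rate_gap(clicks, blue):
    # Difference in click-through rates between blue and non-blue ads.
    return clicks[blue == 1].mean() - clicks[blue == 0].mean()

# "Seeing": strong firms are more likely to choose the blue design themselves.
blue_observed = rng.binomial(1, np.where(quality == 1, 0.8, 0.2))
clicks_observed = rng.binomial(1, click_probability(blue_observed, quality))
seeing_gap = rate_gap(clicks_observed, blue_observed)

# "Doing": the design is assigned at random, independently of quality.
blue_assigned = rng.binomial(1, 0.5, size=n)
clicks_assigned = rng.binomial(1, click_probability(blue_assigned, quality))
doing_gap = rate_gap(clicks_assigned, blue_assigned)

print(f"seeing (observational gap): {seeing_gap:.4f}")
print(f"doing  (randomized gap):    {doing_gap:.4f}")
```

In this setup the observational gap is clearly positive although switching designs would change nothing, which is the pattern the Fortune-500 story above describes.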

In my group we recently won a grant for a research project in which we want to learn more about how this crucial difference affects business practices. In particular, we want to know what kind of questions companies are trying to answer with their data science efforts, and whether these questions require causal knowledge. We also want to understand better whether firms are using appropriate tools for their respective business applications, or whether there’s a need for major retooling in the data science community. After all, there might be important questions that currently remain unanswered, because companies lack the causal inference skills to address them. That’s certainly another issue we would like to explore.

So, if you are working in the field of data science and machine learning, and you’re interested in causality, please come talk to us! We would love to hear about your experiences. Slowly but surely, causal inference seems to be developing into one of the hottest trends in the tech sector right now, and our goal is to shed more light on this phenomenon with our research.

Don’t Put Too Much Meaning Into Control Variables

I’m currently reading this great paper by Carlos Cinelli and Chad Hazlett: “Making Sense of Sensitivity: Extending Omitted Variable Bias”. They develop a full suite of sensitivity analysis tools for the omitted variable problem in linear regression, which everyone interested in causal inference should have a look at. While kind of a side topic, they make an important point on page 6 (footnote 6):

[…] since the researcher’s goal is to estimate the causal effect of D on Y, usually Z is required only to, along with X, block the back-door paths from D to Y (Pearl 2009), or equivalently, make the treatment assignment conditionally ignorable. In this case, \hat{\gamma} could reflect not only its causal effect on Y, if any, but also other spurious associations not eliminated by standard assumptions.

It’s commonplace in regression analyses to not only interpret the effect of the regressor of interest, D, on an outcome variable, Y, but also to discuss the coefficients of the control variables. Researchers then often use lines such as: “effects of the controls have expected signs”, etc. And it has probably happened more than once that authors ran into trouble during peer review because some regression coefficients were not in line with what reviewers expected.

Cinelli and Hazlett remind us that this is shortsighted, at best, because coefficients of control variables do not necessarily have a structural interpretation. Take the following simple example:

[Figure: causal graph over the variables X, Y, W1, and W2]

If we’re interested in estimating the causal effect of X on Y, P(Y|do(X)), it’s entirely sufficient to adjust¹ for W1 in this graph. That’s because W1 closes all backdoor paths between X and Y, and thus the causal effect can be identified as:

P(Y|do(X)) = \sum_{W_1} P(Y|X, W_1)P(W_1).

However, if we estimate the right-hand side, for example, by linear regression, the coefficient of W1 will not represent its effect on Y. It partly picks up the effect of W2 too, since W1 and W2 are correlated.
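
For the linear case, a short sketch shows where the extra term comes from. It rests on assumptions that go beyond the text: a linear structural equation for Y, and a linear projection of W2 on W1 whose residual \nu is uncorrelated with both X and W1 (which holds, for instance, if W1 and W2 are correlated only through a shared unobserved cause). The symbols \beta, \gamma_1, \gamma_2, and \delta are introduced here purely for illustration.

```latex
Y = \beta X + \gamma_1 W_1 + \gamma_2 W_2 + \varepsilon,
\qquad
W_2 = \delta W_1 + \nu

\Rightarrow \quad
Y = \beta X + (\gamma_1 + \gamma_2 \delta)\, W_1 + (\gamma_2 \nu + \varepsilon)
```

Under these assumptions, regressing Y on X and W1 recovers \beta for the treatment, but the coefficient on W1 converges to \gamma_1 + \gamma_2 \delta rather than to \gamma_1.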

If we also included W2 in the regression, then the coefficients of the control variables could be interpreted structurally and would represent genuine causal effects. But in practice it’s very unlikely that we’ll be able to measure all causal parents of Y. The data collection effort would simply be too large in a real-world situation.

Luckily, that’s not necessary. We only need to make sure that the treatment variable X is unconfounded or conditionally ignorable. And a smaller set of control variables could do the job just fine. But that also implies that the coefficients of the controls lose their substantive meaning, because they now represent a complicated weighting of several causal influences. Therefore, it doesn’t make much sense to try to put them into context. And if they don’t have the expected signs, that’s not a problem.
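
A small simulation can make this concrete. The data-generating process below is an assumption chosen to match the description above, not the exact graph from the example: W1 confounds X and Y, W2 affects only Y, and W1 and W2 are correlated through a hypothetical latent variable U. All coefficients are made up for illustration.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)             # latent common cause of W1 and W2
    w1 = u + rng.normal(size=n)
    w2 = u + rng.normal(size=n)
    x = 0.8 * w1 + rng.normal(size=n)  # W1 is the only confounder of X and Y
    # Assumed structural effects on Y: X -> 1.0, W1 -> 0.5, W2 -> 2.0
    y = 1.0 * x + 0.5 * w1 + 2.0 * w2 + rng.normal(size=n)
    return x, w1, w2, y

def ols(y, *regressors):
    # Least-squares fit with an intercept; returns the slope coefficients only.
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

x, w1, w2, y = simulate()

# Short regression: adjust for W1 only (enough to identify the effect of X).
b_x_short, b_w1_short = ols(y, x, w1)

# Long regression: include every causal parent of Y.
b_x_long, b_w1_long, b_w2_long = ols(y, x, w1, w2)

print(f"short: X = {b_x_short:.3f}, W1 = {b_w1_short:.3f}")
print(f"long:  X = {b_x_long:.3f}, W1 = {b_w1_long:.3f}, W2 = {b_w2_long:.3f}")
```

In both regressions the coefficient on X stays close to its assumed value of 1.0. The coefficient on W1, however, lands near 1.5 in the short regression instead of the structural 0.5, because it absorbs part of the effect of W2; only the long regression recovers 0.5.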


¹ The term “control variable” is actually a bit outdated, because W1 isn’t controlled in the sense of an intervention. Rather, it is adjusted for or conditioned on, in the sense of taking conditional probabilities. But since the term is so ubiquitous, I’ll use it here too.