# Blog

## Causal Inference is More than Fitting the Data Well

This post first appeared on February 1, 2021, on causalscience.org.

Causal inference is becoming an increasingly important topic in industry. Several big players have already taken notice and started to invest in the causal data science skills of their people. One piece of evidence was the huge success of the first Causal Data Science Meeting last year. Our own research supports this point. Over the course of last year, we talked to many data scientists working in the tech sector and related industries, and all of them reported that both interest in causality and frustration with the limits of classical machine learning are rising. Especially when you tackle complex problems related to the strategic direction of your company, the ability to forecast the effects of your actions, and thus causal inference, becomes highly significant.

Yet, we also learned that applying causal inference methods poses a number of significant challenges to practitioners. There is an educational gap, as many data scientists still do not have much experience with these tools. In addition, cleanly identifying the root causes behind relationships in your data and ruling out alternative explanations can be time-consuming. Data science teams often simply do not have the time to run an elaborate study because of pressure to bring models to production quickly.

#### Need for a cultural change

Another important bottleneck we have encountered in our research is cultural, though. Classical machine learning is all about minimizing prediction error. The more accurately your model is able to, e.g., classify x-ray images or forecast future stock market prices, the better. This simple target gives you an objective standard of evaluation that is easy for everyone to understand. ML research made great progress in the past by running competitions on which methods and algorithms provide the best out-of-sample fit in various problem domains ranging from image recognition to natural language processing. Such an objective and simple evaluation criterion is missing in causal inference.

Causal inference (CI) is much harder than simply optimizing a loss function, and context-specific domain knowledge plays a crucial role. Unless you can benchmark your model predictions against actual experiments, there is no simple criterion to judge the accuracy of a particular estimate. Such benchmarks are pretty rare in practice, and even then you will only be able to tell how well you did ex post. The quality of causal inferences depends on several crucial assumptions, which are not easily testable with the data at hand. This forces people to completely rethink the way they approach their data science and ML problems.

In fact, there is an important theoretical reason why causal data science is challenging in that regard. It is called the Pearl causal hierarchy (PCH). The PCH, also known as the ladder of causation, states that any data analysis can be mapped to one of three distinct layers of an information hierarchy. On the lowest rung there are associations, which refer to simple conditional probability statements about variables in the data. They remain purely correlational (“how does X relate to Y?”) and therefore do not have any causal meaning. The second rung relates to interventions (“what happens to Y if I manipulate X?”), and here we already enter the world of causality. On the third rung we finally have counterfactuals (“what would Y be if X had been x?”), which represent the highest form of causal reasoning.
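As a compact reference, the three rungs are often written in Pearl's notation as follows. The notation is added here for illustration and is not part of the original post:

$$
\begin{aligned}
\text{Rung 1 (association):} \quad & P(y \mid x) \\
\text{Rung 2 (intervention):} \quad & P(y \mid do(x)) \\
\text{Rung 3 (counterfactual):} \quad & P(y_x \mid x', y')
\end{aligned}
$$

Here $do(x)$ denotes setting X to the value x by intervention, and $y_x$ denotes the value Y would have taken had X been x, evaluated for units for which we actually observed $X = x'$ and $Y = y'$. Without further assumptions, a quantity on a higher rung generally cannot be computed from lower-rung information alone.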

#### Causal inference cannot be purely data-driven

The PCH tells us that to climb the ladder of causation and be able to infer causal effects from the data, we need to be willing to make at least some causal assumptions in the first place. “No causes in, no causes out!” This fact can be proven mathematically. No CI method is entirely data-driven. You always need that extra ingredient in the form of specific domain knowledge, which is introduced to the problem and can only be judged based on experience and theoretical reasoning. This is the way causal diagrams work, for example, but other causal assumptions such as conditional independence, instrument validity or parallel trends in difference-in-differences fall into the same category.
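A minimal sketch of why the data alone cannot settle a causal question. It assumes two hypothetical linear Gaussian models that are not taken from the original post: in model A, X causes Y; in model B, Y causes X. The function name `simulate`, the coefficient `b`, and the intervention value are all illustrative choices. Both models are built to produce the same observational distribution, yet they disagree about what happens to Y when X is set by intervention.

```python
import numpy as np


def simulate(model, n, b=0.6, do_x=None, seed=0):
    """Draw n samples from one of two hypothetical linear Gaussian models.

    model "A": X -> Y (X causes Y)
    model "B": Y -> X (Y causes X)
    If do_x is given, X is fixed to that value by intervention.
    """
    rng = np.random.default_rng(seed)
    noise_sd = np.sqrt(1 - b**2)  # keeps both variables at unit variance
    if model == "A":
        x = rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
        y = b * x + noise_sd * rng.normal(size=n)
    else:
        y = rng.normal(size=n)
        if do_x is None:
            x = b * y + noise_sd * rng.normal(size=n)
        else:
            # The intervention cuts the arrow Y -> X, so Y is unaffected.
            x = np.full(n, float(do_x))
    return x, y


n = 200_000

# Rung 1: the observational covariance matrices are (nearly) identical.
xa, ya = simulate("A", n, seed=1)
xb, yb = simulate("B", n, seed=2)
cov_a = np.cov(xa, ya)
cov_b = np.cov(xb, yb)
print("observational covariance, model A:\n", cov_a.round(2))
print("observational covariance, model B:\n", cov_b.round(2))

# Rung 2: the mean of Y under do(X = 2) differs between the models.
_, ya_do = simulate("A", n, do_x=2.0, seed=3)
_, yb_do = simulate("B", n, do_x=2.0, seed=4)
print("mean of Y under do(X=2), model A:", round(ya_do.mean(), 2))
print("mean of Y under do(X=2), model B:", round(yb_do.mean(), 2))
```

Only the assumed direction of the arrow, which is a piece of domain knowledge, distinguishes the two models. No statistic computed from the observational samples can do so in this setup.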

Because these causal assumptions are necessarily context-specific, they are more complex and multidimensional than a simple fit criterion based on squared loss. That does not mean that they are in any way arbitrary, though. The theoretical requirements for causal inference imposed by the PCH call for an entirely new way of thinking about data science, which also introduces non-trivial organizational challenges. We need to put domain experts such as clients, engineers, and sales partners in the loop, who can tell us whether our assumptions make sense and whether the way we model a certain problem is accurate. This will lead to a much more holistic approach to data science and the way teams are structured. Some first steps in that direction are described in a post by Patrick Doupe, principal economist at Zalando. In the coming months we plan to publish more content of that sort, creating a dialogue between industry and academia on how to advance causal inference applications in industry practice.

## A Global Decline in Research Productivity?

My coauthor Philipp Böing and I just released a new discussion paper:

“A Global Decline in Research Productivity? Evidence from China and Germany”

Abstract: In a recent paper, Bloom et al. (2020) find evidence for a substantial decline in research productivity in the U.S. economy during the last 40 years. In this paper, we replicate their findings for China and Germany, using detailed firm-level data spanning three decades. Our results indicate that diminishing returns in idea production are a global phenomenon, not just confined to the U.S.

## Innovation step by step

My coauthor Petra Andries (Ghent University) and I just published a new paper in Research Policy: “Firm-level effects of staged investments in innovation: The moderating role of resource availability” (preprint available on this website).

## Control variables in regressions — better don’t report them!

A while ago I wrote a short blog post with a pretty simple message: “Don’t Put Too Much Meaning Into Control Variables”. And I must say I was surprised by the many positive responses it got. The corresponding tweet received more than 1000 likes and nearly 400 retweets. The blog post even got mentioned in an internal newsletter by the World Bank. So clearly there seems to be some demand for the topic. That’s why my coauthor Beyers Louw (PhD student at Maastricht University) and I decided to turn it into a citable research note, which is now available on arXiv:

“On the Nuisance of Control Variables in Regression Analysis”

Abstract: Control variables are included in regression analyses to estimate the causal effect of a treatment variable of interest on an outcome. In this note we argue that control variables are unlikely to have a causal interpretation themselves though. We therefore suggest to refrain from discussing their marginal effects in the results sections of empirical research papers.

Please use it and save yourself a paragraph or two in your next research paper! :)

## Public procurement as a policy instrument for innovation

We have a new paper, “Public Procurement of Innovation: Evidence from a German Legislative Reform”, out at IJIO (preprint available without paywall here under “Research”), and I’ve briefly summarized the content in a Twitter thread (apparently that’s where these things happen these days, blogs are so 2012…). For reference, I’ll link to the tweets below:

## Causal Inference in Business Practice – Survey

My colleagues and I are currently looking for data scientists to take part in a short survey (5–10 min) on causal inference in business practice. Is data-driven decision making important in your job? Then we’d love to hear your perspective: maastrichtuniversity.eu.qualtrics.com/jfe/form/SV_af

## Mapping Unchartered Territory

A frequent point of criticism against Directed Acyclic Graphs is that writing them down for a real-world problem can be a difficult task. There are numerous possible variables to consider and it’s not clear how we can determine all the causal relationships between them. We recently had a Twitter discussion where exactly this argument popped up again.

## PO vs. DAGs – Comments on Guido Imbens’ New Paper

Guido Imbens published a new working paper in which he develops a detailed comparison of the potential outcomes framework (PO) and directed acyclic graphs (DAG) for causal inference in econometrics. I really appreciate this paper, because it introduces a broader audience in economics to DAGs and highlights the complementarity of both approaches for applied econometric work.

## Causal Data Science in Business

A while back I posted about Facebook’s causal inference group and how causal data science tools are slowly finding their way from academia into business. Since then I have come across many more examples of well-known companies investing in their causal inference (CI) capabilities: Microsoft released its DoWhy library for Python, providing CI tools based on Directed Acyclic Graphs (DAGs); I recently met people from IBM Research interested in the topic; Zalando is constantly looking for people to join their CI/ML team; and Lufthansa, Uber, and Lyft have research units working on causal AI applications too.

## Don’t Put Too Much Meaning Into Control Variables

I’m currently reading this great paper by Carlos Cinelli and Chad Hazlett: “Making Sense of Sensitivity: Extending Omitted Variable Bias”. They develop a full suite of sensitivity analysis tools for the omitted variable problem in linear regression, which everyone interested in causal inference should have a look at. While kind of a side topic, they make an important point on page 6 (footnote 6):

[…] since the researcher’s goal is to estimate the causal effect of D on Y, usually Z is required only to, along with X, block the back-door paths from D to Y (Pearl 2009), or equivalently, make the treatment assignment conditionally ignorable. In this case, $\hat{\gamma}$ could reflect not only its causal effect on Y, if any, but also other spurious associations not eliminated by standard assumptions.
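The point of the footnote can be illustrated with a small simulation. This is a sketch under assumptions of my own choosing, not an example from the Cinelli and Hazlett paper: the unobserved variable `u`, the linear structure, and all coefficient values are hypothetical. The control Z is enough to identify the effect of D on Y, but the coefficient on Z itself is contaminated by the unobserved U.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical data-generating process (all coefficients are assumptions):
u = rng.normal(size=n)        # unobserved common cause of Z and Y
z = u + rng.normal(size=n)    # observed control variable
d = z + rng.normal(size=n)    # treatment, confounded only through Z
y = 1.0 * d + 0.5 * z + 1.0 * u + rng.normal(size=n)

# OLS regression of Y on a constant, D, and Z (U is left out, as in practice)
design = np.column_stack([np.ones(n), d, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
_, beta_d, gamma_z = coef

print(f"coefficient on D: {beta_d:.3f}  (true effect of D on Y: 1.0)")
print(f"coefficient on Z: {gamma_z:.3f}  (direct effect of Z on Y: 0.5)")
```

In this setup Z closes all back-door paths from D to Y, so the coefficient on D should land close to 1.0. The coefficient on Z, by contrast, also absorbs the association running through the unobserved U, so it should match neither the direct effect of Z (0.5) nor its total effect (1.5).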