Mapping Uncharted Territory

A frequent point of criticism against Directed Acyclic Graphs is that writing them down for a real-world problem can be a difficult task. There are numerous possible variables to consider and it’s not clear how we can determine all the causal relationships between them. We recently had a Twitter discussion where exactly this argument popped up again.

I’ve written about this problem before, where I argue that DAGs actually don’t have to be that complex if we look, for example, at the models we work with in structural econometrics or economic theory. But Jason Abaluck, professor at the Yale School of Management, brought up an interesting example that might be useful for illustrating what I have in mind.

Here is my reply:

It’s a good point that mapping out what we know in a DAG – especially for uncharted territory – can be complex. Regarding the specific example of the college wage premium, I would advise a grad student who studies this question to first do a thorough literature review. That’s the basis for synthesizing what we’ve learned about the topic over the past 50 years or so. The DAG then serves as a perfect tool for organizing this body of knowledge. For some arrows, the decision to include or omit them might be ambiguous. But these are exactly the cases where there is a need for future research – a great opportunity for a fresh grad student.
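To make this concrete, here is a minimal sketch (in Python, using networkx) of what such a literature-derived DAG could look like for the college wage premium. All variables and arrows are illustrative placeholders rather than a summary of the actual literature, and the ambiguous arrows are kept in a separate list so they can be flagged as open questions.

```python
import networkx as nx

# Illustrative placeholder DAG for the college wage premium.
# Nodes and edges stand in for what a literature review would deliver;
# they are not a substantive claim about the actual evidence.
established_edges = [
    ("ability", "college"),
    ("ability", "wage"),
    ("family_background", "college"),
    ("family_background", "wage"),
    ("college", "wage"),
]

# Arrows the (hypothetical) literature is still ambiguous about --
# exactly the places where future research is needed.
ambiguous_edges = [
    ("local_labor_market", "college"),
]

dag = nx.DiGraph(established_edges + ambiguous_edges)
assert nx.is_directed_acyclic_graph(dag)

# The parents of the treatment show which back-door paths into
# "college" an identification strategy has to deal with.
print("Parents of college:", sorted(dag.predecessors("college")))
print("Open questions:", ambiguous_edges)
```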

This process is of course quite tedious, but there isn’t really an alternative to it. When we justify the exogeneity of our instruments, we also need to know all possible confounders that might play a role. The same goes for arguing that there is no self-selection around the discontinuity threshold or that common trends hold. We can only justify these assumptions by synthesizing the prior knowledge we have about the subject under study.
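As a toy illustration of the instrument case, suppose we used distance to the nearest college as an instrument for college attendance (a purely hypothetical setup with invented variable names). The sketch below checks only one kind of violation, namely a common cause of the instrument and the outcome, not the full graphical IV criterion; the point is simply that spotting such a confounder requires having written the relevant knowledge down.

```python
import networkx as nx

# Hypothetical example: "distance" instruments "college" in a wage equation.
dag = nx.DiGraph([
    ("distance", "college"),
    ("college", "wage"),
    ("ability", "college"),
    ("ability", "wage"),
    # One extra, plausible-sounding arrow: families that care about education
    # choose where to live, which confounds the instrument and the outcome.
    ("family_background", "distance"),
    ("family_background", "wage"),
])

# A common cause of the instrument and the outcome opens a back-door path
# from "distance" to "wage", so exogeneity of the instrument fails.
common_causes = nx.ancestors(dag, "distance") & nx.ancestors(dag, "wage")
print("Confounders of instrument and outcome:", sorted(common_causes))
```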

If some people think this would be different with potential outcome methods, it’s because we’ve accepted loose standards for arguing verbally about ignorability, exogeneity, and causal mechanisms in our papers and seminars. This process is highly non-transparent and prone to arguments from authority.

Going through the entire body of knowledge about a specific problem and casting it into a DAG is cumbersome, I realize. But once we start to make our assumptions more explicit, others will be able to build on our work. They can then test the proposed model against the available data or look for experimental evidence on ambiguous causal relationships. This process of knowledge curation is not something one paper can achieve alone; it has to be a truly collaborative exercise. I don’t see how we can have real progress in a field without it.
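To give a flavor of what “testing the proposed model against the available data” could look like, the sketch below simulates data from a small placeholder chain DAG and checks one of its implied conditional independencies with a simple partial correlation. A real application would use proper conditional-independence tests on actual data; everything here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate from a placeholder chain DAG: distance -> college -> wage.
# This graph implies that distance and wage are independent given college.
distance = rng.normal(size=n)
college = 0.8 * distance + rng.normal(size=n)
wage = 0.5 * college + rng.normal(size=n)

def partial_corr(x, y, given):
    """Correlation of x and y after linearly removing `given` from both."""
    X = np.column_stack([np.ones_like(given), given])
    rx = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
    ry = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Close to zero, consistent with the implied independence; a direct
# distance -> wage arrow in the data-generating process would break this.
print(partial_corr(distance, wage, college))
```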

Graphs and Occam’s Razor

One point of criticism I often hear from people who start exploring Directed Acyclic Graphs (DAGs) is that graphical models can quickly become very complex. When you read about the methodology for the first time, you get walked through all these toy models – small, well-behaved examples with nice properties in which causal inference works like a charm.
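For readers who haven’t seen these toy models yet, here is one of the standard examples in miniature: a single confounder biases the naive regression, and adjusting for it recovers the true effect. The numbers and variable names below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Classic textbook toy model: one confounder u affects both treatment and outcome.
u = rng.normal(size=n)
treatment = 1.0 * u + rng.normal(size=n)
outcome = 2.0 * treatment + 3.0 * u + rng.normal(size=n)  # true effect = 2

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("naive estimate:   ", ols(outcome, treatment)[1])     # biased, roughly 3.5
print("adjusted estimate:", ols(outcome, treatment, u)[1])   # close to the true 2.0
```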


Why Tobit models are overused

In my field of research, we often run regressions with innovation expenditures or sales of new products on the left-hand side. We usually observe many zeros for these variables because many firms do not invest in R&D at all and therefore do not come up with new products. Many researchers then feel inclined to use Tobit models. But frankly, I never understood why.
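For context, the type-I Tobit model that researchers typically reach for here treats the observed outcome as a latent normal variable censored at zero. Below is a minimal sketch of that likelihood, fitted by maximum likelihood on simulated, zero-heavy firm data; all names and numbers are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5_000

# Simulated firm data: many firms end up at zero R&D expenditure.
firm_size = rng.normal(size=n)
latent = -0.5 + 1.0 * firm_size + rng.normal(size=n)
rd_spending = np.maximum(latent, 0.0)
print("share of zeros:", np.mean(rd_spending == 0))

def tobit_negloglik(params, y, x):
    """Type-I Tobit: y = max(0, a + b*x + e), e ~ N(0, sigma^2)."""
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)
    mu = a + b * x
    ll_pos = norm.logpdf(y, loc=mu, scale=sigma)   # density for uncensored observations
    ll_zero = norm.logcdf(-mu / sigma)             # log probability mass piled up at zero
    return -np.sum(np.where(y > 0, ll_pos, ll_zero))

fit = minimize(tobit_negloglik, x0=[0.0, 0.0, 0.0],
               args=(rd_spending, firm_size), method="BFGS")
print("estimated (a, b, sigma):", fit.x[0], fit.x[1], np.exp(fit.x[2]))
```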