A frequent point of criticism against Directed Acyclic Graphs is that writing them down for a real-world problem can be a difficult task. There are numerous possible variables to consider and it’s not clear how we can determine all the causal relationships between them. We recently had a Twitter discussion where exactly this argument popped up again.
I’ve written about this problem before, where I argue that DAGs actually don’t have to be that complex, if we look at, for example, the models we work with in structural econoemtrics or economic theory. But Jason Abaluck, professor at the Yale School of Management, brought up an interesting example that might be useful for illustrating what I have in mind.
Here is my reply:
It’s good point that mapping out what we know in a DAG – especially for unchartered territory – can be complex. Related to the specific example of the college wage premium, I would advise a grad student who studies this question to first do a thorough literature review. That’s the basis for synthesizing what we’ve learned in 50 years or so about the topic. The DAG then serves as a perfect tool for organizing this body of knowledge. Now, for some arrows the decision to include or omit them might be ambiguous. But these are exactly the cases where there is a need for future research. A great opportunity for a fresh grad student.
This process is of course quite tedious, but there isn’t really an alternative to it. When we justify the exogeneity of our instruments, we also need to know all possible confounders that might play a role. The same goes for arguing that there is no self-selection around the discontinuity threshold or that common trends hold. We can only justify these assumptions by synthesizing the prior knowledge we have about the subject under study.
The fact that some people think this would be different with potential outcome methods, is because we’ve accepted loose standards for arguing verbally about ignorability, exogeneity and causal mechanism in our papers and seminars. This process is highly non-transparent and prone to arguments by authority.
Going through the entire body of knowledge about a specific problem and casting it into a DAG is cumbersome, I realize. Once we will start to make our assumptions more explicit though, others will be able to build on our work. They can then test the proposed model against the available data or look for experimental evidence for ambigious causal relationships. This process of knowledge curation is not something one paper can achieve alone, it has to be a truly collaborative exercise. I don’t see how we can have real progress in a field without it.