Arthur Lewbel published a very interesting paper in 2012 in the Journal of Business & Economic Statistics (ungated version here). The paper attracted considerable attention because it lays out a method for running two-stage least squares regressions (in order to identify causal effects) without the need for an outside instrumental variable. Consider a triangular model

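$$Y_1 = X'\beta_1 + Y_2\gamma_1 + \varepsilon_1, \qquad \varepsilon_1 = \alpha_1 U + V_1 \tag{1}$$

and

$$Y_2 = X'\beta_2 + \varepsilon_2, \qquad \varepsilon_2 = \alpha_2 U + V_2 \tag{2}$$
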
The common factor $U$ (think of the textbook example of unobserved ability in a wage regression) creates a correlation between the errors that leads to an endogeneity problem when estimating (1). You can see that there is no exclusion restriction available in equation (2), because $X$ appears in both lines. Nevertheless, it is possible to estimate the parameters in (1) consistently when the following two assumptions are fulfilled

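$$\text{(A1)} \quad \operatorname{Cov}(Z, \varepsilon_2^2) \neq 0, \qquad \text{(A2)} \quad \operatorname{Cov}(Z, \varepsilon_1 \varepsilon_2) = 0$$

$Z$ is an observed random vector, which can be (but doesn’t have to be) a subset of the regressor vector $X$. (A2) places restrictions on the covariance matrix of the model errors, which are satisfied in the above case of a common unobserved factor. (A1) means that the method requires **heteroskedasticity** in $\varepsilon_2$ (in both $\varepsilon_1$ and $\varepsilon_2$ for non-triangular models), which arises frequently in applied work.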
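To make the mechanics concrete, here is a minimal sketch of the estimator in the linear case, written with NumPy only. Everything in it is an assumption made for illustration and not taken from the paper or from this post: the function name `lewbel_2sls`, the sample size, the coefficient values and the form of heteroskedasticity (the variance of the idiosyncratic error in the second equation grows with the regressor). The generated instrument is the demeaned regressor times the first-stage residual.

```python
import numpy as np

def lewbel_2sls(y1, y2, X, Z):
    """Heteroskedasticity-based 2SLS with one endogenous regressor y2.

    X: exogenous regressors including a constant column.
    Z: variables used to build the generated instruments (here a subset of X).
    Returns the coefficients on [X, y2]; the last entry belongs to y2.
    """
    # First stage: linear regression of y2 on X, keep the residuals.
    b2 = np.linalg.lstsq(X, y2, rcond=None)[0]
    e2 = y2 - X @ b2
    # Generated instruments: demeaned Z times the first-stage residual.
    G = (Z - Z.mean(axis=0)) * e2[:, None]
    W = np.column_stack([X, G])    # full instrument set
    R = np.column_stack([X, y2])   # regressors of the outcome equation
    # 2SLS: project the regressors on the instruments, then run OLS.
    R_hat = W @ np.linalg.lstsq(W, R, rcond=None)[0]
    return np.linalg.lstsq(R_hat, y1, rcond=None)[0]

# Hypothetical linear design with a common factor U in both errors.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
U = rng.normal(size=n)
V1 = rng.normal(size=n)
V2 = np.exp(0.5 * x) * rng.normal(size=n)   # heteroskedastic in x
X = np.column_stack([np.ones(n), x])
y2 = 1.0 + x + U + V2
y1 = 1.0 + x + 1.0 * y2 + U + V1            # true coefficient on y2 is 1

gamma_ols = np.linalg.lstsq(np.column_stack([X, y2]), y1, rcond=None)[0][-1]
gamma_lewbel = lewbel_2sls(y1, y2, X, x[:, None])[-1]
print("OLS coefficient on y2:   ", round(gamma_ols, 3))
print("Lewbel coefficient on y2:", round(gamma_lewbel, 3))
```

In this assumed design OLS is pushed upward by the common factor, while the generated instrument is uncorrelated with the product of the two errors, so the second estimate should sit close to one.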

Lewbel’s method works like a charm in simulation studies. However, it was developed for linear models (footnote 1). But what happens if you have a binary endogenous variable? Let’s consider $Y_2$ being generated by a Probit model

$$Y_2 = 1[X'\beta_2 + \varepsilon_2 > 0] \tag{4}$$

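with $1[\cdot]$ being the indicator function and $\varepsilon_2 = \alpha_2 U + V_2$ as before. $\varepsilon_2$ has to be standard normal, so for independent $U$ and $V_2$ it has to hold that $\alpha_2^2 \operatorname{Var}(U) + \operatorname{Var}(V_2) = 1$. Note that

$$E[Y_2 \mid X] = \Pr(\varepsilon_2 > -X'\beta_2 \mid X) = \Phi(X'\beta_2) \tag{5}$$

and we can rewrite equation (4) with additive error

$$Y_2 = \Phi(X'\beta_2) + \eta \tag{6}$$

with $\eta = Y_2 - \Phi(X'\beta_2)$, which is a function of $X$! Intuitively, the additive error $\eta$ cannot vary freely for binary $Y_2$. It has to be smaller when $X'\beta_2$ is either small or large (so that $\Phi(X'\beta_2)$ is close to zero or one), otherwise we would not stay within the bounds of zero and one. This means that there is heteroskedasticity in (4) by construction since

$$\operatorname{Var}(\eta \mid X) = \Phi(X'\beta_2)\,[1 - \Phi(X'\beta_2)]$$

is clearly not constant.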
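The built-in heteroskedasticity of a binary variable is easy to check numerically. The sketch below assumes a hypothetical Probit index of one plus a single standard normal regressor (these values are made up for illustration), computes the additive error as the binary outcome minus its conditional probability, and compares the average squared error within each decile of the regressor with the Bernoulli variance p(1 - p).

```python
import math
import numpy as np

# Standard normal CDF via math.erf, vectorised so that SciPy is not needed.
norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
index = 1.0 + x                                   # hypothetical Probit index
y2 = (index + rng.normal(size=n) > 0).astype(float)
p = norm_cdf(index)                               # P(y2 = 1 | x)
eta = y2 - p                                      # additive error

# Group observations into deciles of x (labels 0..9).
edges = np.quantile(x, np.linspace(0.0, 1.0, 11))
decile = np.digitize(x, edges[1:-1])

# Mean squared error per decile versus the implied variance p * (1 - p).
empirical = np.array([np.mean(eta[decile == k] ** 2) for k in range(10)])
implied = np.array([np.mean(p[decile == k] * (1.0 - p[decile == k]))
                    for k in range(10)])

for k in range(10):
    print(f"decile {k}: empirical {empirical[k]:.3f}  implied {implied[k]:.3f}")
```

The two columns should track each other closely, and both should shrink as the index moves away from zero, which is the heteroskedasticity the method needs.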

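Initially, I thought this was great because it should mean that Lewbel’s method is always applicable with a binary endogenous regressor. But what about the second assumption? Inserting the additive error $\eta = Y_2 - \Phi(X'\beta_2)$ in place of $\varepsilon_2$ in (A2) gives

$$\operatorname{Cov}(Z, \varepsilon_1 \eta) = \operatorname{Cov}\big(Z, \varepsilon_1 [Y_2 - \Phi(X'\beta_2)]\big)$$

When $Z$ (footnote 2) is a subset of $X$, this covariance is not zero. (A2) is violated!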
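One way to see why this covariance does not vanish is to add an assumption that is not spelled out above: the two structural errors are jointly normal and independent of $X$, with $\operatorname{Var}(\varepsilon_2) = 1$ and $\operatorname{Cov}(\varepsilon_1, \varepsilon_2) = \sigma_{12}$. The symbol $\sigma_{12}$ and the shorthand $c = X'\beta_2$ are introduced here purely for illustration. With $\eta = Y_2 - \Phi(c)$ and $Y_2 = 1[\varepsilon_2 > -c]$, a sketch of the calculation is:

```latex
\begin{aligned}
E[\varepsilon_1 \eta \mid X]
  &= E\big[\varepsilon_1 \, 1[\varepsilon_2 > -c] \mid X\big]
     - \Phi(c)\, E[\varepsilon_1 \mid X] \\
  &= \sigma_{12}\, E\big[\varepsilon_2 \, 1[\varepsilon_2 > -c]\big]
     && \text{since } E[\varepsilon_1 \mid \varepsilon_2] = \sigma_{12}\,\varepsilon_2
        \text{ and } E[\varepsilon_1 \mid X] = 0 \\
  &= \sigma_{12} \int_{-c}^{\infty} t\, \phi(t)\, dt \\
  &= \sigma_{12}\, \phi(c), \\[4pt]
\operatorname{Cov}(Z, \varepsilon_1 \eta)
  &= \sigma_{12}\, \operatorname{Cov}\big(Z, \phi(X'\beta_2)\big).
\end{aligned}
```

Under these assumptions the covariance is zero only if $\sigma_{12} = 0$, in which case there is no endogeneity to begin with, or if $Z$ happens to be uncorrelated with $\phi(X'\beta_2)$. When $Z$ is part of $X$, the second case is a knife-edge situation rather than something one can rely on.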

To get a feeling for the problem, I simulated data with 2,000 observations and the following parametrization (notation is a bit sloppy due to the limited LaTeX capabilities of WordPress)

, , , and

1[]

Using the user-written Stata command ivreg2h gave the following output

Estimates are far off the true coefficients (which are all equal to one). And this wasn’t just an unlucky draw. The average estimate of $\gamma_1$, the coefficient on $Y_2$, in a small Monte Carlo study with 200 repetitions was equal to 1.83.

You might object that in order to construct the instruments Lewbel suggests, $(Z - \bar{Z})\hat{\varepsilon}_2$, you have to estimate the exact error term of the $Y_2$ equation. By contrast, ivreg2h assumes a linear equation for $Y_2$. But things don’t improve much if you estimate equation (6) by Probit and construct the instruments manually.
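The manual construction can be sketched in a few lines of NumPy. The design below is an assumption made for illustration and is not the parametrization used for the results reported in this post: one standard normal regressor, all coefficients equal to one, a common factor with variance 0.5 and a standard normal error in the binary equation. The helper names (`fit_probit`, `norm_cdf`, `norm_pdf`) are hypothetical, and the Probit is fitted with a plain Fisher scoring loop rather than a library routine.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(y, X, iters=15):
    """Probit maximum likelihood via Fisher scoring, starting from zero."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = X @ b
        p = np.clip(norm_cdf(idx), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(idx)
        w = d / (p * (1.0 - p))
        score = X.T @ (w * (y - p))
        info = (X * (w * d)[:, None]).T @ X
        b = b + np.linalg.solve(info, score)
    return b

# Hypothetical design: all true coefficients are one, Var(eps2) = 1.
rng = np.random.default_rng(42)
n = 50_000
x = rng.normal(size=n)
U = rng.normal(scale=math.sqrt(0.5), size=n)    # common factor
V1 = rng.normal(size=n)
V2 = rng.normal(scale=math.sqrt(0.5), size=n)
X = np.column_stack([np.ones(n), x])
y2 = (1.0 + x + U + V2 > 0).astype(float)       # binary endogenous regressor
y1 = 1.0 + x + 1.0 * y2 + U + V1

# Probit first stage and generated instrument (x - mean(x)) * residual.
b2_hat = fit_probit(y2, X)
eta_hat = y2 - norm_cdf(X @ b2_hat)
G = (x - x.mean()) * eta_hat

# 2SLS with instruments [X, G] for the regressors [X, y2].
W = np.column_stack([X, G])
R = np.column_stack([X, y2])
R_hat = W @ np.linalg.lstsq(W, R, rcond=None)[0]
gamma_hat = np.linalg.lstsq(R_hat, y1, rcond=None)[0][-1]
print("Probit first stage:", np.round(b2_hat, 3))
print("Coefficient on y2: ", round(gamma_hat, 3))
```

In this assumed design the Probit first stage recovers its coefficients, yet the estimate on the binary regressor should settle noticeably above the true value of one, because the generated instrument remains correlated with the outcome error.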

**To conclude:** Be careful with applying the method in a situation with a binary endogenous regressor. There is at least one case ($Y_2$ being Probit) where the estimator is inconsistent. It might still work for other structural specifications. And it would be great if somebody worked out the conditions under which it does. Until then, however, I would refrain from using Lewbel’s method in the binary case. It’s not robust to misspecification of the $Y_2$ equation and we don’t know yet when it works and when it doesn’t.

**Footnotes:**

(1) He also presents an extension to partly linear systems which, however, does not cover the limited dependent variable case.

(2) If, on the other hand, $Z$ is restricted to be an outside variable, not contained in $X$, then I don’t see how you can satisfy the requirement of heteroskedasticity (A1). Maybe with some sort of heteroskedastic Probit specification. But I haven’t worked that out. Introducing the common factor in particular, which is what leads to endogeneity in the triangular model, seems to be non-trivial.

**Update:** Fixed an error and added some clarifying remarks. Thanks to Arthur Lewbel for the pointer!
