IV regressions without instruments (technical)

Arthur Lewbel published a very interesting paper back in 2012 in the Journal of Business & Economic Statistics (ungated version here). The paper attracted quite a bit of attention because it lays out a method for running two-stage least squares regressions (in order to identify causal effects) without the need for an outside instrumental variable. Consider a triangular model

(1) \quad Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}Y_2 + \epsilon_1

(2) \quad Y_2 = \beta_{02} + \beta_{12}X_1 + \epsilon_2


\epsilon_1 = \alpha_1 U + V_1

\epsilon_2 = \alpha_2 U + V_2

The common factor U (think of the textbook example of unobserved ability in a wage regression) creates a correlation between the errors, which leads to an endogeneity problem when estimating (1). You can see that there is no exclusion restriction available in equation (2), because X_1 appears in both equations. Nevertheless, it is possible to estimate the parameters in (1) consistently when the following two assumptions are fulfilled:

(A1) \quad Cov(Z, \epsilon_2^2) \neq 0

(A2) \quad Cov(Z, \epsilon_1 \cdot \epsilon_2) = 0

Z is an observed random vector, which can be (but doesn’t have to be) a subset of the regressor vector X. (A2) places restrictions on the covariance matrix of the model errors which are satisfied in the above case of a common unobserved factor. In addition, the method requires heteroskedasticity in \epsilon_2 (in both \epsilon_1 and \epsilon_2 for non-triangular models), which arises frequently in applied work.
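To make the mechanics concrete, here is a minimal Python sketch of the estimator for the linear triangular model above, with Z = X_1. It is my own illustration, not Lewbel's code: the data-generating process (standard normal X_1, U and V's, an error scale of exp(0.5 X_1) in the second equation, all coefficients equal to one) and the function names are assumptions chosen so that (A1) and (A2) hold. The instrument is (Z - mean(Z)) times the first-stage residual, and the just-identified IV estimate is computed after partialling out the constant and X_1.

```python
import math
import random
from statistics import fmean


def residualize(v, x):
    """Residuals from an OLS regression of v on a constant and x."""
    mx, mv = fmean(x), fmean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def simulate_linear(n, rng):
    """Draw (x, y1, y2) from equations (1)-(2); the design is an assumption."""
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        u = rng.gauss(0, 1)                               # common factor U
        eps1 = u + rng.gauss(0, 1)
        eps2 = u + math.exp(0.5 * xi) * rng.gauss(0, 1)   # Var(eps2 | x) depends on x
        y2i = 1 + xi + eps2
        x.append(xi)
        y2.append(y2i)
        y1.append(1 + xi + y2i + eps1)
    return x, y1, y2


def ols_beta21(x, y1, y2):
    """OLS coefficient on y2, obtained after partialling out (1, x)."""
    y1_t, y2_t = residualize(y1, x), residualize(y2, x)
    return sum(a * b for a, b in zip(y2_t, y1_t)) / sum(b * b for b in y2_t)


def lewbel_beta21(x, y1, y2):
    """Just-identified IV coefficient on y2, instrument (x - mean(x)) * e2_hat."""
    e2 = residualize(y2, x)                       # first-stage residuals
    mx = fmean(x)
    iv = residualize([(xi - mx) * ei for xi, ei in zip(x, e2)], x)
    y1_t = residualize(y1, x)
    # e2 is already y2 with (1, x) partialled out
    return sum(a * b for a, b in zip(iv, y1_t)) / sum(a * b for a, b in zip(iv, e2))


if __name__ == "__main__":
    x, y1, y2 = simulate_linear(20000, random.Random(0))
    print("OLS:   ", round(ols_beta21(x, y1, y2), 3))
    print("Lewbel:", round(lewbel_beta21(x, y1, y2), 3))
```

In this design the OLS coefficient on Y_2 should be biased upward by the common factor, while the generated-instrument estimate should sit close to the true value of one.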

Lewbel’s method works like a charm in simulation studies. However, it was developed for linear models (footnote 1). But what happens if you have a binary endogenous variable? Let’s consider the case where Y_2 follows a Probit model

(3) \quad Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}Y_2 + \epsilon_1

(4) \quad Y_2 = 1[\beta_{02} + \beta_{12}X_1 + \nu_2 > 0]

with 1[…] being the indicator function and \epsilon_1 as before. \nu_2 = \alpha_2 U + V_2 has to be standard normal, so for independent U and V_2 it must hold that \alpha_2^2 Var(U) + Var(V_2) = 1. Note that

Pr(Y_2=1|X) = E(Y_2|X) = \Phi(\beta_{02} + \beta_{12}X_1) = \Phi(X'\beta)

and we can rewrite equation (4) with additive error

\Rightarrow (4) \quad Y_2 = \Phi(X'\beta) + \epsilon_2

with \epsilon_2 = Y_2 - \Phi(X'\beta), which is a function of X! Intuitively, the additive \epsilon_2 cannot vary freely for binary Y_2. Its variance has to shrink when the index X'\beta is either very small or very large, otherwise Y_2 would not stay within the bounds of zero and one. This means that there is heteroskedasticity in (4) by construction since

Var(Y_2|X) = \Phi(X'\beta) (1 - \Phi(X'\beta))

is clearly not constant.
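A quick numerical check of this formula in Python, using only the standard library. The function name cond_var and the grid of index values are my own choices for illustration.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def cond_var(index):
    """Var(Y_2 | X) for a Probit with linear index X'beta."""
    p = Phi(index)
    return p * (1 - p)


for index in (-3, -1, 0, 1, 3):
    print(f"index = {index:+d}  Var(Y_2|X) = {cond_var(index):.4f}")
```

The variance peaks at 0.25 when the index is zero and falls toward zero in both tails, symmetrically around zero.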

Initially, I thought this is great because it should mean that Lewbel’s method is always applicable with a binary endogenous regressor. What’s with the second assumption though? Inserting \epsilon_2 in (A2) gives

Cov(Z, \epsilon_1 \cdot \epsilon_2) = Cov(Z, \epsilon_1 \cdot (Y_2 - \Phi(X'\beta)))

When Z (footnote 2) is a subset of X this covariance is not zero. (A2) is violated!
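A rough sketch of why the covariance fails to vanish, under the additional assumptions (mine, for illustration) that (\epsilon_1, \nu_2) are jointly normal and independent of X, with \rho = Cov(\epsilon_1, \nu_2) and \phi denoting the standard normal density. Since E(\epsilon_1 | X) = 0,

E(\epsilon_1 \epsilon_2 | X) = E(\epsilon_1 Y_2 | X) - \Phi(X'\beta) E(\epsilon_1 | X) = E(\epsilon_1 \cdot 1[\nu_2 > -X'\beta] | X)

Joint normality with Var(\nu_2) = 1 gives E(\epsilon_1 | \nu_2) = \rho \nu_2, so that

E(\epsilon_1 \epsilon_2 | X) = \rho E(\nu_2 \cdot 1[\nu_2 > -X'\beta] | X) = \rho \phi(X'\beta)

\Rightarrow Cov(Z, \epsilon_1 \cdot \epsilon_2) = \rho \cdot Cov(Z, \phi(X'\beta))

The conditional cross moment of the errors therefore moves with X whenever \rho \neq 0, and its covariance with a Z contained in X is zero only in special cases.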

To get a feeling for the problem I simulated data with 2,000 observations and the following parametrization (notation is a bit sloppy due to the limited LaTeX capabilities of WordPress; the second argument of N(·,·) is the standard deviation, so that \nu_2 has unit variance)

U = N(0,\sqrt{0.5}), \epsilon_1 = N(0,\sqrt{0.5}) + U, \nu_2 = N(0,\sqrt{0.5}) + U, and Z = X_1

(5) \quad Y_1 = 1 + X_1 + Y_2 + \epsilon_1

(6) \quad Y_2 = 1[1 + X_1 + \nu_2 > 0]

Using the user-written Stata command ivreg2h gave the following output:

[Image: lewbel1]

Estimates are far off the true coefficients (which are all equal to one). And this wasn’t just an unlucky draw. The average estimate of \beta_{21} in a small Monte-Carlo study with 200 repetitions was equal to 1.83.

You might object that in order to construct the instruments Lewbel suggests, (Z - E(Z)) \epsilon_2, you have to estimate the exact \epsilon_2. By contrast, ivreg2h assumes a linear equation for Y_2. But things don’t improve much if you estimate equation (6) by Probit and construct the instruments manually.
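For readers without Stata, the experiment can be approximated in plain Python. This is a sketch under stated assumptions, not a replication of ivreg2h: X_1 is taken to be standard normal (the distribution is not given above), the "linear" option uses residuals from a linear regression of Y_2 on X_1, and the "probit_oracle" option plugs in the true index 1 + X_1 instead of estimating the Probit, as a stand-in for a Probit first stage. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def residualize(v, x):
    """Residuals from an OLS regression of v on a constant and x."""
    mx, mv = fmean(x), fmean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def draw_sample(n, rng):
    """One sample from equations (5)-(6); X_1 ~ N(0, 1) is an assumption."""
    sd = 0.5 ** 0.5
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi, u = rng.gauss(0, 1), rng.gauss(0, sd)
        eps1 = rng.gauss(0, sd) + u
        nu2 = rng.gauss(0, sd) + u
        y2i = 1.0 if 1 + xi + nu2 > 0 else 0.0
        x.append(xi)
        y2.append(y2i)
        y1.append(1 + xi + y2i + eps1)
    return x, y1, y2


def lewbel_beta21(x, y1, y2, first_stage="linear"):
    """IV coefficient on y2 with instrument (x - mean(x)) * e2."""
    if first_stage == "linear":
        e2 = residualize(y2, x)                   # linear probability residuals
    else:                                         # "probit_oracle": true index plugged in
        e2 = [yi - PHI(1 + xi) for xi, yi in zip(x, y2)]
    mx = fmean(x)
    iv = residualize([(xi - mx) * ei for xi, ei in zip(x, e2)], x)
    y1_t, y2_t = residualize(y1, x), residualize(y2, x)
    return sum(a * b for a, b in zip(iv, y1_t)) / sum(a * b for a, b in zip(iv, y2_t))


def monte_carlo(reps=200, n=2000, seed=0, first_stage="linear"):
    """Average estimate of beta_21 over independent replications."""
    rng = random.Random(seed)
    draws = [lewbel_beta21(*draw_sample(n, rng), first_stage) for _ in range(reps)]
    return fmean(draws)


if __name__ == "__main__":
    for stage in ("linear", "probit_oracle"):
        print(stage, round(monte_carlo(first_stage=stage), 3))
```

The two averages can be compared directly with the true value of one. Swapping in a different distribution for X_1 inside draw_sample is a simple way to see how sensitive the bias is to the design.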


To conclude: Be careful when applying the method in a situation with a binary endogenous regressor. There is at least one case (Y_2 being Probit) where the estimator is inconsistent. It might still work for other structural specifications. And it would be great if somebody worked out the conditions under which it does. Until then, however, I would refrain from using Lewbel’s method in the binary case. It’s not robust to misspecification of the Y_2-equation and we don’t know yet when it works and when it doesn’t.


(1) He also presents an extension to partly linear systems which, however, does not cover the limited dependent variable case.

(2) If, on the other hand, Z is restricted to be an outside variable, not contained in X, then I don’t see how you can satisfy the requirement of heteroskedasticity (A1). Maybe with some sort of heteroskedastic Probit specification. But I haven’t worked that out. Especially introducing the common factor—which leads to endogeneity in the triangular model—seems to be non-trivial.

Update: Fixed an error and added some clarifying remarks. Thanks to Arthur Lewbel for the pointer!
