Ulric B. and Evelyn L. Bray Social Sciences Seminar
Abstract: This paper reconsiders the problem of inference on parameters in binary outcome models when the outcomes are subject to possibly endogenous censoring. These models have recently attracted growing interest in the computer science and machine learning literatures, where the issue of endogenous sample selection is referred to as the selective labels problem. Such models are relevant in diverse empirical settings, including criminal justice, healthcare, and insurance. Notable recent studies in this area include Lakkaraju et al. (2017), Kleinberg et al. (2018), and Coston, Rambachan, and Chouldechova (2021), which examine judicial bail decisions, where the outcome of whether a defendant fails to appear in court is observed only if the judge grants bail. Inference on the parameters of such models can be computationally challenging for two reasons.
One is the nonconcavity of the bivariate likelihood function, and the other is the large number of covariates in each equation. To address these hurdles, we propose a novel distribution-free estimation procedure that is computationally friendly, especially in settings with many covariates. The new method combines the semiparametric batched gradient descent algorithm introduced in Khan, Lan, Tamer, and Yao (2024) with a novel sorting algorithm that controls for selection bias. Asymptotic properties of the new procedure are established under increasing-dimension conditions in both equations, and its finite-sample properties are explored through a simulation study and an application using judicial bail data.
Written with Elie Tamer and Qingsong Yao.