Ulric B. and Evelyn L. Bray Social Sciences Seminar

Tuesday, May 13, 2025
4:00pm to 5:00pm
Baxter B125
Inference on High Dimensional Selective Labeling Models
Shakeeb Khan, Professor of Economics, Morrissey College of Arts & Sciences, Boston College

Abstract: The paper reconsiders the problem of inference on parameters in binary outcome models when these outcomes are subject to possibly endogenous censoring. These models have recently gained increasing interest in the computer science and machine learning literatures, where the issue of endogenous sample selection is referred to as the selective labels problem. Such models are relevant in diverse empirical settings, including criminal justice, healthcare, and insurance. Notable recent studies in this area include Lakkaraju et al. (2017), Kleinberg et al. (2018), and Coston, Rambachan, and Chouldechova (2021), which examine judicial bail decisions, where the outcome of whether a defendant fails to appear in court is observed only if the judge grants bail.

Inference on such model parameters can be computationally challenging for two reasons: the nonconcavity of the bivariate likelihood function and the large number of covariates in each equation. Despite these hurdles, we propose a novel distribution-free estimation procedure that is computationally friendly, especially in settings with many covariates. The new method combines the semiparametric batched gradient descent algorithm introduced in Khan, Lan, Tamer, and Yao (2024) with a novel sorting algorithm that controls for selection bias. Asymptotic properties of the new procedure are established under increasing-dimension conditions in both equations, and its finite-sample properties are explored through a simulation study and an application using judicial bail data.

Written with Elie Tamer and Qingsong Yao.

For more information, please contact Sabrina Hameister by phone at 626-395-4228 or by email at sboschet@caltech.edu.