Conformal Prediction Intervals and Predictive Distributions

Posted: 2024-10-15

Guanghua Forum: Forum of Prominent Figures and Entrepreneurs, Session No. 6642

Speaker: Prof. Jing Qin, NIAID

Host: Prof. Huazhen Lin, School of Statistics

Time: October 16, 16:00-17:00

Venue: Conference Room 408, Hongyuan Building, Liulin Campus

Organizers: Center of Statistical Research and School of Statistics; Office of Scientific Research

About the Speaker:

Dr. Jing Qin is a Mathematical Statistician in the Biostatistics Research Branch of the National Institute of Allergy and Infectious Diseases (NIAID). He received his Ph.D. from the University of Waterloo in 1992 and then became an Assistant Professor at the University of Maryland, College Park. Before joining the National Institutes of Health (NIH) in 2004, Dr. Qin spent five years at Memorial Sloan-Kettering Cancer Center. His research interests span a wide range of topics, including empirical likelihood methods, case-control studies, various biased sampling problems, econometrics, survival analysis, missing data, causal inference, genetic mixture models, generalized linear models, survey sampling, and microarray data analysis. His recent work focuses on conformal inference for quantifying uncertainty in machine learning. He was elected a Fellow of the American Statistical Association (ASA) in 2006 and is the author of the 2017 monograph Biased Sampling, Over-identified Parametric Problems, and Beyond (Springer, ICSA Book Series in Statistics).

Abstract:

Conformal prediction (CP) is a framework for uncertainty quantification in machine learning that produces statistically valid prediction regions (prediction intervals) for any underlying point predictor, whether a statistical, machine learning, or deep learning model, while assuming only that the data are exchangeable. Suppose we have training data containing both the features X and the outcome Y, together with test data containing only X. The goal is to construct a 95% prediction interval for the outcome Y in the test data. Lawless and Fredette (2005) addressed this problem in parametric settings using a pivotal-quantity approach; their method yields prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations. However, as the dimension of the features grows, modeling the conditional distribution of Y | X becomes increasingly difficult.

In this talk, we extend their work by removing the parametric assumption on the prediction interval. Without parametric assumptions on the conditional distribution of Y | X, however, accurate conditional coverage cannot be guaranteed. Instead, we draw on conformal inference (Vovk et al., 2005), which requires only accurate unconditional (marginal) coverage. Although the conformal prediction interval is distribution-free, the choice of working conditional model can strongly affect the interval's length: a well-designed working model yields shorter intervals. Careful, effective modeling therefore remains practically important even in distribution-free settings.

Finally, we discuss conformal prediction intervals in more complex settings, including covariate shift between the training and test data and outcomes Y that may be right-censored.
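
To make the two central claims of the abstract concrete (distribution-free marginal coverage, and interval length depending on the working model), here is a minimal split conformal sketch in Python with NumPy. It is an illustration only, not the speaker's method: it assumes absolute-residual nonconformity scores, a random 50/50 split into a proper training set and a calibration set, and simulated Gaussian data, and all helper names (`split_conformal`, `fit_ols`, `fit_mean`, and so on) are hypothetical. With m calibration scores, the interval half-width is the ceil((m + 1)(1 - alpha))-th smallest score, which gives marginal coverage of at least 1 - alpha under exchangeability.

```python
import numpy as np


def split_conformal(fit, predict, X, y, X_new, alpha=0.05, seed=0):
    """Split conformal interval with absolute-residual scores.

    `fit(X, y)` returns a fitted model and `predict(model, X)` returns point
    predictions; both are hypothetical placeholders for any working model.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    proper, calib = idx[:half], idx[half:]
    model = fit(X[proper], y[proper])
    # Nonconformity scores on the held-out calibration set, sorted ascending
    scores = np.sort(np.abs(y[calib] - predict(model, X[calib])))
    m = len(scores)
    k = int(np.ceil((m + 1) * (1 - alpha)))  # finite-sample corrected rank
    q = scores[k - 1] if k <= m else np.inf  # too few points -> infinite interval
    mu = predict(model, X_new)
    return mu - q, mu + q


# Two working models: ordinary least squares and a covariate-free mean
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def fit_mean(X, y):
    return y.mean()


def predict_mean(c, X):
    return np.full(len(X), c)


rng = np.random.default_rng(1)


def simulate(n):
    X = rng.normal(size=(n, 1))
    y = 3.0 * X[:, 0] + rng.normal(size=n)  # linear signal, N(0, 1) noise
    return X, y


X_tr, y_tr = simulate(2000)
X_te, y_te = simulate(5000)
results = {}
for name, fit, pred in [("ols", fit_ols, predict_ols), ("mean", fit_mean, predict_mean)]:
    lo, hi = split_conformal(fit, pred, X_tr, y_tr, X_te)
    coverage = np.mean((y_te >= lo) & (y_te <= hi))
    results[name] = (coverage, np.mean(hi - lo))
    print(f"{name}: coverage={coverage:.3f}, mean width={results[name][1]:.2f}")
```

Both working models should give empirical coverage close to 95%, but the mean-only model, which ignores X, should produce a much wider interval. This is the sense in which a good working model shortens the interval without affecting its validity.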

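
For the covariate-shift setting mentioned in the abstract, one standard approach, the weighted conformal method of Tibshirani et al. (2019), reweights the calibration scores by the likelihood ratio w(x) = dP_test(x)/dP_train(x). It is not necessarily the method presented in the talk. The sketch below assumes that ratio is known (training X ~ N(0, 1) and test X ~ N(1, 1), so w(x) = exp(x - 1/2)); in practice it must be estimated. The names `weighted_conformal_radius` and `likelihood_ratio` are hypothetical.

```python
import numpy as np


def weighted_conformal_radius(scores, w_calib, w_test, alpha=0.05):
    """Weighted split-conformal half-widths under covariate shift.

    scores: calibration nonconformity scores; w_calib / w_test: likelihood
    ratios dP_test(x)/dP_train(x) at the calibration and test covariates.
    """
    order = np.argsort(scores)
    s_sorted = np.append(scores[order], np.inf)  # test point's own mass sits at +inf
    cum = np.cumsum(w_calib[order])
    # Smallest score whose cumulative weight reaches (1 - alpha) of the total,
    # where the total also includes the test point's weight.
    thresh = (1 - alpha) * (cum[-1] + w_test)
    return s_sorted[np.searchsorted(cum, thresh)]  # index len(cum) -> +inf


rng = np.random.default_rng(2)


def simulate(n, shift):
    x = rng.normal(loc=shift, size=n)
    y = 3.0 * x + (1.0 + np.abs(x)) * rng.normal(size=n)  # noise grows with |x|
    return x, y


def likelihood_ratio(x):
    # Density of N(1, 1) over N(0, 1); assumed known here, estimated in practice
    return np.exp(x - 0.5)


x_tr, y_tr = simulate(4000, shift=0.0)
x_te, y_te = simulate(5000, shift=1.0)
proper, calib = np.arange(2000), np.arange(2000, 4000)  # i.i.d. draws, so a fixed split is fine
slope, intercept = np.polyfit(x_tr[proper], y_tr[proper], deg=1)
scores = np.abs(y_tr[calib] - (intercept + slope * x_tr[calib]))
resid_te = np.abs(y_te - (intercept + slope * x_te))

q_weighted = weighted_conformal_radius(scores, likelihood_ratio(x_tr[calib]), likelihood_ratio(x_te))
# Unit weights reduce to ordinary split conformal prediction
q_plain = weighted_conformal_radius(scores, np.ones(len(calib)), np.ones(len(x_te)))
cover_weighted = np.mean(resid_te <= q_weighted)
cover_plain = np.mean(resid_te <= q_plain)
print(f"weighted: {cover_weighted:.3f}, unweighted: {cover_plain:.3f}")
```

Because the noise grows with |x| and the test covariates are shifted toward larger values, the unweighted interval should undercover on the test set, while the weighted interval should stay close to the nominal 95%.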