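The First Conference on Machine Learning and Statistics will be held on August 24-26, 2023, at the Putuo Campus of East China Normal University. The conference is hosted by the Machine Learning Branch of the Chinese Association for Applied Statistics (中国现场统计研究会机器学习分会) and co-organized by East China Normal University's School of Statistics, its Academy of Statistics and Interdisciplinary Sciences, the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (Ministry of Education), and the Innovation and Talent Introduction Base for Statistical Applications and Theoretical Research. The conference aims to promote academic exchange among scholars of machine learning and statistics in China and abroad, to foster an academic culture in which the two fields develop together, and to advance them as foundational disciplines of data science and artificial intelligence, thereby supporting the growth of related digital-economy industries.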
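Thematic Session III
Machine Learning for Interpretability
Time:
August 25, 2023, 13:30-15:00
Venue:
Room 211, Wenshi Building (文史楼), Putuo Campus, East China Normal University
Organizer:
Zhou Yu (於州), East China Normal University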
01
Yundong Tu (涂云东), Peking University
Title: Structurally Grouped Approximate Factor Models
Abstract: This paper explores the group structure in large-dimensional approximate factor models, which portrays homogeneous effects of the common factors on the individuals that fall into the same group. With the initial principal component estimates, we identify the unknown group structure by a combination of the agglomerative hierarchical clustering algorithm and an information criterion. The loadings and factors are then re-estimated conditional on the identified groups. Under some regularity conditions, we establish the consistency of the membership estimator as well as that of the group number estimator obtained from the information criterion. The new estimators under the group structure are shown to achieve efficiency gains compared to those obtained without this information. Numerical simulations and empirical applications demonstrate the good finite-sample performance of our proposed approach when a group structure is present.
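Bio: Yundong Tu is a jointly appointed professor at the Guanghua School of Management and the Center for Statistical Science, Peking University. He has been selected for the "Sunrise in the East" (日出东方) Peking University Guanghua Young Talent program, named a Peking University supervisor of an outstanding doctoral dissertation, and named a Young Changjiang Scholar under the Ministry of Education's Changjiang Scholars Program. He received a B.S. and an M.A. in economics from Wuhan University in 2004 and 2006, respectively, and a Ph.D. in economics from the University of California, Riverside, in 2012. He is the founder and organizer of the Asia-Pacific Young Econometricians Conference. He has published more than 30 papers in well-known international and domestic journals, has led several National Natural Science Foundation of China projects, and serves as an anonymous reviewer for the foundation. He has received young-scholar research grants from academic organizations including the Econometric Society World Congress and the California Econometrics Conference. His research covers time series analysis, nonparametric econometric methods, big data analysis, financial econometrics, and forecasting.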
02
Xin He (贺莘), Shanghai University of Finance and Economics
Title: Efficient learning of nonparametric directed acyclic graph with statistical guarantee
Abstract: Directed acyclic graph (DAG) models are widely used to represent causal relations among collected nodes. This paper proposes an efficient and consistent method to learn a DAG with a general causal dependence structure, which is in sharp contrast to most existing methods assuming linear dependence of causal relations. To facilitate DAG learning, the proposed method leverages the concept of topological layer, and connects nonparametric DAG learning with kernel ridge regression in a smooth reproducing kernel Hilbert space (RKHS) and learning gradients by showing that the topological layers of a nonparametric DAG can be exactly reconstructed via kernel-based estimation, and the parent-child relations can be obtained directly by computing the estimated gradient function. The developed algorithm is computationally efficient in the sense that it attempts to solve a convex optimization problem with an analytic solution, and the gradient functions can be directly computed by using the derivative reproducing property in the smooth RKHS. The asymptotic properties of the proposed method are established in terms of exact DAG recovery without requiring any explicit model specification. Its superior performance is also supported by a variety of simulated examples and a real-life example.
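Bio: Xin He is an associate professor and doctoral supervisor at the School of Statistics and Management, Shanghai University of Finance and Economics. His main research area is statistical machine learning and its applications in economics, finance, medicine, and health. His work has appeared in international journals including the Journal of Machine Learning Research, Journal of the American Statistical Association, Journal of Computational and Graphical Statistics, Electronic Journal of Statistics, Statistica Sinica, and Thyroid. He has led one Natural Science Foundation Young Scientists Fund project and one Shanghai Pujiang Talent Program project.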
03
Ziqi Chen (谌自奇), East China Normal University
Title: K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing
Abstract: Conditional independence (CI) testing is a fundamental task in statistics and machine learning, but its effectiveness is hindered by the challenges posed by high-dimensional conditioning variables and limited data samples. This article introduces a novel testing approach to address these challenges and enhance control of the type I error while achieving high power under alternative hypotheses. The proposed approach incorporates a computationally efficient classifier-based conditional mutual information (CMI) estimator, capable of capturing intricate dependence structures among variables. To approximate a distribution encoding the null hypothesis, a k-nearest-neighbor local sampling strategy is employed. An important advantage of this approach is its ability to operate without assumptions about distribution forms or feature dependencies. Furthermore, it eliminates the need to derive asymptotic null distributions for the estimated CMI and avoids dataset splitting, making it particularly suitable for small datasets. The method presented in this article demonstrates asymptotic control of the type I error and consistency against all alternative hypotheses. Extensive analyses using both synthetic and real data highlight the computational efficiency of the proposed test. Moreover, it outperforms existing state-of-the-art methods in terms of type I and II errors, even in scenarios with high-dimensional conditioning sets. Additionally, the proposed approach exhibits robustness in the presence of heavy-tailed data.
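Bio: Ziqi Chen is a research professor, Zijiang Young Scholar, and doctoral supervisor at East China Normal University. His research covers high-dimensional statistical analysis, functional (longitudinal) data analysis, survival analysis, machine learning, neural networks, and causal inference. He has led two General Program projects and one Young Scientists Fund project of the National Natural Science Foundation of China, one Shanghai Natural Science Foundation project, and one Hunan Provincial Natural Science Foundation project, and has received General and Special grants from the China Postdoctoral Science Foundation. From 2016 to 2019 he was a postdoctoral researcher in the Department of Biostatistics at the MD Anderson Cancer Center in the United States. He has published more than 20 papers in leading international statistics journals such as JASA, Biometrics, Statistica Sinica, and the Scandinavian Journal of Statistics, and at well-known international machine learning and artificial intelligence conferences such as AAAI and IJCNN.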
04
Qixian Zhong (钟齐先), Xiamen University
Title: Neural Networks for Partially Linear Quantile Regression
Abstract: Deep learning has enjoyed tremendous success in a variety of applications but its application to quantile regression remains scarce. A major advantage of the deep learning approach is its flexibility to model complex data in a more parsimonious way than nonparametric smoothing methods. However, while deep learning brought breakthroughs in prediction, it is not well suited for statistical inference due to its black box nature. In this paper, we leverage the advantages of deep learning and apply it to quantile regression where the goal is to produce interpretable results and perform statistical inference. We achieve this by adopting a semiparametric approach based on the partially linear quantile regression model, where covariates of primary interest for statistical inference are modelled linearly and all other covariates are modelled nonparametrically by means of a deep neural network. In addition to the new methodology, we provide theoretical justification for the proposed model by establishing the root-n consistency and asymptotic normality of the parametric coefficient estimator and the minimax optimal convergence rate of the neural nonparametric function estimator. Across numerical studies, the proposed model empirically produces superior estimates and more accurate predictions than various alternative approaches.
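Bio: Qixian Zhong is an assistant professor in the Department of Statistics and Data Science at Xiamen University and received his Ph.D. from the Department of Mathematical Sciences at Tsinghua University in 2021. His research areas are deep learning, survival data, and functional data analysis. His work has been published in journals and conferences including AOS, Biometrika, JBES, and NeurIPS, and he leads a National Natural Science Foundation of China Young Scientists Fund project. He serves as an anonymous reviewer for journals including AOS, JRSSB, JASA, Biometrika, Bernoulli, and Biostatistics.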
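There is no registration fee for this conference. Please scan the QR code below to complete conference registration.
 ●  For more conference information, please visit the conference website:
 https://ml-stat.github.io/MLSTAT2023/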