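The First Conference on Machine Learning and Statistics will be held on August 24-26, 2023, at the Putuo Campus of East China Normal University. The conference is hosted by the Machine Learning Branch of the Chinese Association for Applied Statistics (中国现场统计研究会机器学习分会) and co-organized by the School of Statistics, the Academy of Statistics and Interdisciplinary Sciences, the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (Ministry of Education), and the Innovation and Talent-Introduction Base for Statistical Applications and Theoretical Research, all at East China Normal University. The conference aims to promote academic exchange among scholars in machine learning and statistics in China and abroad, to foster an academic culture in which the two fields develop together, and to advance them as the foundational disciplines of data science and artificial intelligence, thereby supporting the growth of the related digital economy.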
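Invited Session 12
Recent Developments on High-Dimensional Statistical Learning
Time:
August 26, 2023, 10:30-12:00
Venue:
Room 215, Wenshi Building (文史楼), Putuo Campus, East China Normal University
Organizer:
Yong He (何勇), Shandong University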
01
Yong He (何勇), Shandong University
Title: The Role of Fine-Tuning: Transfer Learning for High-Dimensional M-Estimators with Decomposable Regularizers
Abstract: Transfer learning algorithms have been developed in various application contexts, while only a few of them offer statistical guarantees in high dimensions. Among these works, the differences between the target and sources, a.k.a. the contrasts, are typically modeled as, or at least close to, vectors with a certain low-dimensional structure (e.g., sparsity), resulting in a separate debiasing step after a preceding pooling estimation procedure. Under such an intuitive yet powerful framework, additional homogeneity conditions on the Hessian matrices of the population loss functions are often imposed to preserve the delicate low-dimensional structure of the contrasts during pooling, which is either unrealistic in practice or easily destroyed by basic data transformations such as standardization. In this article, under the general M-estimator framework with decomposable regularizers, we highlight the role of fine-tuning underneath the conspicuous gain of the debiasing step in transfer learning. Namely, we find it is possible to enhance estimation accuracy by fine-tuning a primal estimator sufficiently close to the true target one. Our theory suggests slightly enlarging the pooling regularization strength when either the contrast's low-dimensional structure or the homogeneity of the Hessian matrices is violated. Traditional linear regression and generalized low-rank trace regression in high dimensions are discussed as two specific examples under our framework. When the informative source datasets are unknown, a novel truncated-penalized algorithm is proposed to directly output the primal estimator by simultaneously selecting the useful sources, and its oracle property is proved. Extensive numerical experiments are conducted to validate the theoretical assertions. A case study on air quality regulation in China by transfer learning is also provided for illustration.
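As a reading aid, the "pool, then fine-tune" idea can be sketched in the simplest possible setting. This is a toy illustration and not the speaker's algorithm: it assumes squared loss with an identity design (a sparse mean model) and an l1 penalty, so that each penalized fit reduces to coordinate-wise soft-thresholding. The function names, the sample sizes, and the two regularization levels `lam_pool` and `lam_tune` are all hypothetical.

```python
def soft_threshold(x, lam):
    """Closed-form minimiser of 0.5 * (x - b)**2 + lam * abs(b)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def pool_then_fine_tune(target_mean, n_target, source_mean, n_source,
                        lam_pool, lam_tune):
    """Two-step estimate: pooled l1 fit, then an l1-penalised correction on the target."""
    n_total = n_target + n_source
    # Step 1 (pooling): l1-penalised fit on the sample-size-weighted pooled mean.
    pooled = [(n_target * t + n_source * s) / n_total
              for t, s in zip(target_mean, source_mean)]
    primal = [soft_threshold(p, lam_pool) for p in pooled]
    # Step 2 (fine-tuning): l1-penalised correction using the target data only.
    correction = [soft_threshold(t - w, lam_tune)
                  for t, w in zip(target_mean, primal)]
    return [w + d for w, d in zip(primal, correction)]


# Toy numbers (assumed): 10 target samples, 30 source samples, 3 coordinates.
estimate = pool_then_fine_tune([2.0, 0.0, 0.1], 10, [1.0, 0.0, -0.1], 30,
                               lam_pool=0.25, lam_tune=0.5)
print(estimate)
```

In this toy setting the pooling step borrows strength from the larger source sample, and the second step corrects the pooled estimate toward the target only where the target data disagree with it by more than `lam_tune`.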
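Bio: Yong He (何勇) is a professor at the Institute for Financial Studies, Shandong University, and a Shandong University Qilu Young Scholar (2022). Bachelor's degree from Shandong University (2012); PhD from Fudan University (2017), advised by Professor Xinsheng Zhang (张新生). Research areas include financial econometrics and statistics, high-dimensional statistics, and machine learning, with more than 30 papers published in leading econometrics and statistics journals, including Journal of Econometrics, Journal of Business and Economic Statistics, Biometrics (cover article), Biostatistics, and Scientia Sinica Mathematica (中国科学:数学). Principal investigator of a National Natural Science Foundation of China General Program grant and a Young Scientists Fund grant, as well as a Key Project of the National Statistical Science Research Program; recipient of the First Statistical Science and Technology Progress Award (second-ranked contributor). Reviewer for Mathematical Reviews and anonymous referee for international journals including AOS, JRSSB, JOE, JBES, JRSSC, Biometrics, EJS, and JMVA.
02
Long Yu (虞龙), Shanghai University of Finance and Economics
Title: Matrix Quantile Factor Model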
Abstract: This paper introduces a matrix quantile factor model for matrix-valued data with a low-rank structure. We estimate the row and column factor spaces by minimizing the empirical check loss function over all panels. We show that the estimates converge at rate 1/min{(p_1p_2)^0.5, (p_1T)^0.5, (p_2T)^0.5} in average Frobenius norm, where p_1, p_2, and T are the row dimensionality, the column dimensionality, and the length of the matrix sequence. This rate is faster than that of the quantile estimates obtained by "flattening" the matrix model into a large vector model. Smoothed estimates are given, and their central limit theorems are derived under some mild conditions. We provide three consistent criteria to determine the pair of row and column factor numbers. Extensive simulation studies and an empirical study justify our theory.
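For readers unfamiliar with the check loss mentioned in the abstract, the following is a minimal sketch of the standard quantile check loss, rho_tau(u) = u * (tau - 1{u < 0}), together with a scalar illustration that its minimizer is a sample quantile. It does not implement the matrix factor model itself; the function names and the toy sample are assumptions made for illustration.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(sample, tau):
    """Return the sample point that minimises the total check loss."""
    return min(sample,
               key=lambda q: sum(check_loss(x - q, tau) for x in sample))


# Toy sample with one outlier: the tau = 0.5 minimiser is the median.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median = quantile_by_check_loss(sample, 0.5)
print(median)
```

Because the loss grows only linearly in the residual, the outlier at 100 does not pull the minimizer away from the middle of the sample, which is the robustness property that motivates quantile-based factor estimation.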
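Bio: Long Yu (虞龙) is an assistant professor at the School of Statistics and Management, Shanghai University of Finance and Economics. Research areas include multivariate statistical analysis, high-dimensional mathematical statistics, econometrics, and random matrix theory, with a focus on factor-model-based data modeling, dimension reduction, and robust analysis. Papers have appeared in leading international journals in statistics, econometrics, and biostatistics, including Biometrika, Journal of Econometrics, Journal of Business and Economic Statistics, Bernoulli, Journal of Multivariate Analysis, and Biostatistics. From 2018 to 2019, visiting doctoral student in a joint training program at the Department of Statistics, University of Michigan, Ann Arbor; from 2020 to 2022, postdoctoral research fellow at the Department of Statistics and Data Science, National University of Singapore.
03
Zhe Li (李哲), Fudan University
Title: Consistent Selection of the Number of Groups in Panel Models via Sample-Splitting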
Abstract: Group number selection is a key question in group panel data modelling. In this work, we develop a cross-validation method to tackle this problem. Specifically, we split the panel data into a training dataset and a testing dataset along the time span. We first use the training dataset to estimate the parameters and group memberships. We then apply the fitted model to the testing dataset and estimate the group number by minimizing certain loss function values on the testing dataset. We design the loss functions for panel data models either with or without fixed effects. The proposed method has two advantages. First, the method is totally data-driven, so no further tuning parameters are involved. Second, the method can be flexibly applied to a wide range of panel data models. Theoretically, we establish the estimation consistency by taking advantage of the optimization property of the estimation algorithm. Experiments on a variety of synthetic and empirical datasets are carried out to further illustrate the advantages of the proposed method.
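The sample-splitting procedure described in the abstract can be sketched on a toy panel. The sketch below is not the speaker's method: it assumes the simplest model y_it = alpha_{g(i)} + noise (group-specific means, no covariates or fixed effects), uses one-dimensional k-means on training-period unit means as the grouping step, and uses squared prediction error as the test loss. All names and the simulated data are hypothetical.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm in one dimension with deterministic quantile initialisation."""
    srt = sorted(values)
    n = len(srt)
    centers = [srt[int((j + 0.5) * n / k)] for j in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda j: (v - cs[j]) ** 2) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


def select_num_groups(panel, candidates):
    """Split each unit's series in time, fit groups on the first half, score on the second."""
    split = len(panel[0]) // 2
    train = [row[:split] for row in panel]
    test = [row[split:] for row in panel]
    train_means = [sum(r) / len(r) for r in train]
    losses = {}
    for g in candidates:
        centers, labels = kmeans_1d(train_means, g)
        # Out-of-sample squared error of the fitted group means.
        losses[g] = sum((y - centers[lab]) ** 2
                        for row, lab in zip(test, labels) for y in row)
    return min(losses, key=losses.get), losses


# Simulated panel (assumed): 3 true groups, 10 units each, 20 time periods.
rng = random.Random(0)
panel = [[m + rng.gauss(0, 0.5) for _ in range(20)]
         for m in (0.0, 5.0, 10.0) for _ in range(10)]
best_g, losses = select_num_groups(panel, [1, 2, 3, 4, 5])
print(best_g, {g: round(v, 1) for g, v in losses.items()})
```

Too few groups produce a large test loss because distinct groups are forced to share one mean, while too many groups fit training noise that does not carry over to the test period; the selected number is the candidate with the smallest test loss.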
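Bio: Zhe Li (李哲) is a PhD student at the School of Data Science, Fudan University, advised by Associate Professor Xuening Zhu (朱雪宁), and received both bachelor's and master's degrees from the same school. Current research areas include network data analysis, spatial econometric models, and distributed computing, with work published in Computational Statistics & Data Analysis.
04
Yang Zhou (周扬), Beijing Normal University
Title: Covariance Test for Discretely Observed Functional Data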
Abstract: In this paper, a covariance test is studied for discretely observed functional data with noise. A projection-based test statistic is constructed with a growing number of estimated eigenfunctions; it has an asymptotic chi-squared distribution, which is established by advancing the perturbation bounds for estimated eigenfunctions. The theoretical analysis reveals a connection among the permissible truncation level, the sampling frequency, and the sample size. Numerical studies are also presented.
Bio: Yang Zhou received his PhD from Beihang University and completed his postdoctoral training at the School of Mathematical Sciences, Peking University. He is now a lecturer in the Department of Mathematical Statistics, School of Statistics, Beijing Normal University. His research interests include high-dimensional and functional data analysis and statistical learning theory.
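There is no registration fee for this conference. Please scan the QR code below to complete registration.
For more information, please visit the conference website:
https://ml-stat.github.io/MLSTAT2023/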