Session Introduction | The 15th China R Conference (Beijing): Renmin University Session on Statistical Computing and Deep Learning

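In 2022, the 15th China R Conference (Beijing) will be held at Renmin University of China on November 19-25. The conference is hosted by Capital of Statistics (统计之都), the School of Statistics of Renmin University of China, and the Center for Applied Statistics of Renmin University of China, with sponsorship from Posit, and will combine online and in-person sessions. Visit the R Conference website for more information!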

Link:

https://china-r.org/bj2022/index.html

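Below is an introduction to the talks in the Renmin University session of this R Conference, Statistical Computing and Deep Learning. The session chair is Lyu Xiaoling (吕晓玲):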

Renmin University Session: Statistical Computing and Deep Learning

Time: November 20, 2022, 14:00-16:15

Tencent Meeting ID: 403996694

Tencent Meeting link: https://meeting.tencent.com/dm/sahiI1GOKmHL

In-person venue: Room 1037, Mingde Main Building

Li Wei (李伟)

Nonparametric inference about mean functionals of nonignorable nonresponse data without identifying the joint distribution

Speaker Bio

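Li Wei is an associate professor at the School of Statistics, Renmin University of China. Main research areas include causal inference, missing data, and high-dimensional statistics. Li has published a number of papers in journals including Biometrika, Journal of Econometrics, and Biometrics, and has served as principal investigator of one National Natural Science Foundation of China Young Scientists Fund project and one key project of the National Statistical Science Research Program.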

Abstract

We consider identification and inference about mean functionals of observed covariates and an outcome variable subject to nonignorable missingness. By leveraging a shadow variable, we establish a necessary and sufficient condition for identification of the mean functional even if the full data distribution is not identified. We further characterize a necessary condition for root-n estimability of the mean functional. This condition naturally strengthens the identifying condition, and it requires the existence of a function as a solution to a representer equation that connects the shadow variable to the mean functional. Solutions to the representer equation may not be unique, which presents substantial challenges for nonparametric estimation, and standard theories for nonparametric sieve estimators are not applicable here. We construct a consistent estimator for the solution set and then adapt the theory of extremum estimators to find from the estimated set a consistent estimator for an appropriately chosen solution. The estimator is asymptotically normal, locally efficient, and attains the semiparametric efficiency bound under certain regularity conditions. We illustrate the proposed approach via simulations and a real data application on home pricing.

Wang Feifei (王菲菲)

Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data

Speaker Bio

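Wang Feifei is an associate professor at the School of Statistics, Renmin University of China. Research interests include text mining and its business applications, social network analysis, and big data modeling, with papers published in leading domestic and international journals such as Journal of Econometrics, Journal of Business & Economic Statistics, Journal of Machine Learning Research, and SCIENTIA SINICA Mathematica (中国科学：数学). Wang has led and participated in multiple projects, including National Natural Science Foundation of China projects, major social science projects of the Ministry of Education, and National Key R&D Program projects.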

Abstract

Federated learning is an emerging distributed machine learning approach that trains a global model from decentralized datasets while preserving data privacy. However, data heterogeneity is one of the core challenges in federated learning. The heterogeneity issue may severely degrade the convergence rate and prediction performance of the model trained in federated learning. To address this issue, we develop a novel personalized federated learning method for heterogeneous data, which is called FedFac. The proposed method is motivated by a common finding that data in different clients contain both common knowledge and personalized knowledge. Therefore, the two types of knowledge should be decomposed and exploited separately. We introduce the idea of factor analysis to distinguish the client-shared information and client-specific information. With this decomposition, a new objective function is established and optimized. Both theoretical and empirical analyses demonstrate that FedFac has higher computational efficiency than classical federated learning approaches. The superior prediction performance of FedFac is also verified empirically by comparison with various state-of-the-art federated learning methods on several real datasets.
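For readers unfamiliar with the classical setting the abstract compares against, the following is a minimal sketch of plain federated averaging on a toy linear regression problem with heterogeneous clients. It is not FedFac and makes no claim about the speaker's method; the function names (`local_update`, `fedavg`, `make_clients`), the learning rate, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps of least-squares regression on one client's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(clients, dim, rounds=50):
    """Classical federated averaging: only model weights leave each client."""
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_ws = [local_update(w_global, X, y) for X, y in clients]
        # The server aggregates with weights proportional to client sample size.
        w_global = np.average(local_ws, axis=0, weights=sizes)
    return w_global

def make_clients(seed=0, n_clients=3, n=200, dim=2):
    """Simulate heterogeneous clients: a shared coefficient plus a client-specific shift."""
    rng = np.random.default_rng(seed)
    shared = np.array([1.0, -2.0])  # hypothetical common component
    clients = []
    for _ in range(n_clients):
        w_k = shared + 0.5 * rng.normal(size=dim)  # hypothetical personalized component
        X = rng.normal(size=(n, dim))
        y = X @ w_k + 0.1 * rng.normal(size=n)
        clients.append((X, y))
    return clients

if __name__ == "__main__":
    print(fedavg(make_clients(), dim=2))
```

In this toy setup the single global vector can only approximate an average of the client-specific coefficients, which is the kind of limitation under heterogeneity that motivates personalized approaches.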

Hu Wei (胡威)

Grouped spatial autoregressive model

Speaker Bio

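Hu Wei is a PhD student at Renmin University of China. Research interests include network data analysis, sampling methods for network data, spatial autoregressive models, and ultrahigh-dimensional data analysis, with papers published in journals such as Computational Statistics & Data Analysis and Electronic Journal of Statistics.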

Abstract

With the development of the internet, network data with replications can be collected at different time points. The spatial autoregressive panel (SARP) model is a useful tool for analyzing such network data. However, in the traditional SARP model, all individuals are assumed to be homogeneous in their network autocorrelation coefficients, while in practice, correlations could differ for the nodes in different groups. Here, a grouped spatial autoregressive (GSAR) model based on the SARP model is proposed to permit network autocorrelation heterogeneity among individuals, while analyzing network data with independent replications across different time points and strong spatial effects. Each individual in the network belongs to a specific latent group, which is characterized by a set of parameters. Two estimation methods are studied: a two-step naive least-squares estimator and a two-step conditional least-squares estimator. Furthermore, their corresponding asymptotic properties and technical conditions are investigated. To demonstrate the performance of the proposed GSAR model and its corresponding estimation methods, numerical analyses are performed on simulated and real data.
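As a rough illustration of what group-specific network autocorrelation can look like, one plausible way to write such a model is given below. This is a sketch under assumptions, not necessarily the specification used in the talk: the symbols $Y_{it}$ (response of node $i$ at time $t$), $w_{ij}$ (row-normalized network weights), $g_i$ (latent group label of node $i$), $\rho_g$ (autocorrelation coefficient of group $g$), and $\varepsilon_{it}$ (noise) are all introduced here for illustration, and covariates are omitted.

```latex
\[
Y_{it} = \rho_{g_i} \sum_{j=1}^{N} w_{ij} Y_{jt} + \varepsilon_{it},
\qquad g_i \in \{1, \dots, G\},
\]
\[
\text{or in matrix form}\quad
Y_t = D W Y_t + \varepsilon_t,
\qquad D = \operatorname{diag}(\rho_{g_1}, \dots, \rho_{g_N}).
\]
```

Under this notation, setting $\rho_1 = \dots = \rho_G$ recovers the homogeneous case in which every node shares one autocorrelation coefficient.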

Li Mengyuan (李梦媛)

Trajectory Representation Learning with Multilevel Attention for Driver Identification

Speaker Bio

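Li Mengyuan is a PhD student at the School of Statistics, Renmin University of China. Main research directions include trajectory data mining.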

Abstract

Massive trajectory data have originated from the development of positioning technology. Learning GPS trajectory representation to characterize a driver’s driving style is a challenging task with important applications in many areas, including autonomous driving, auto insurance, advanced driver assistance systems, urban computing, and the internet of things. Few studies have considered the interactions between different factors. In this study, we propose a novel trajectory representation method based on a multilevel attention mechanism (ATTraj2vec) and apply it to the task of driver identification. In addition to summarizing motion features from GPS trajectory data, we also extract spatial and temporal features. We use a multilevel attention mechanism to aggregate the interactions of motion features with temporal and spatial features progressively. Additionally, we adopt multi-loss to optimize our model simultaneously, which consists of a softmax loss for driver classification and Siamese loss for making trajectories from the same driver more similar. Classification experimental results on a real-world automobile trajectory dataset demonstrated that our proposed model significantly outperforms existing baselines. Meanwhile, the proposed method provides significant gains in the trajectory clustering of unseen drivers.
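The multi-loss idea in the abstract, a classification loss combined with a Siamese loss on pairs of trajectory embeddings, can be sketched as follows. This is a minimal NumPy illustration under assumptions: the Siamese term is taken to be the standard contrastive loss with a margin, and the function names, the `margin` and `weight` parameters, and the additive combination are hypothetical choices, not details confirmed by the abstract.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for driver classification."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_loss(emb_a, emb_b, same_driver, margin=1.0):
    """Contrastive (Siamese) loss on pairs of trajectory embeddings.

    Pairs from the same driver are pulled together; pairs from different
    drivers are pushed at least `margin` apart.
    """
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    pos = same_driver * dist ** 2
    neg = (1 - same_driver) * np.maximum(margin - dist, 0.0) ** 2
    return (pos + neg).mean()

def multi_loss(logits, labels, emb_a, emb_b, same_driver, weight=0.5):
    """Weighted sum of the two losses (the weighting scheme is an assumption)."""
    return (softmax_cross_entropy(logits, labels)
            + weight * contrastive_loss(emb_a, emb_b, same_driver))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 3))        # 4 trajectories, 3 drivers
    labels = np.array([0, 1, 2, 0])
    emb_a = rng.normal(size=(4, 8))         # embeddings of the first trajectory in each pair
    emb_b = rng.normal(size=(4, 8))         # embeddings of the second trajectory in each pair
    same_driver = np.array([1, 0, 1, 0])    # 1 if both trajectories share a driver
    print(multi_loss(logits, labels, emb_a, emb_b, same_driver))
```

In a real model both terms would be computed from the same network and minimized jointly by gradient descent; the sketch only shows how the two objectives combine into one scalar.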

How to Participate

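This session will run online and in person simultaneously. The in-person venue is at Renmin University of China (open to on-campus faculty and students only), and the online venue is Tencent Meeting. Everyone is welcome to join, online or in person!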

Tencent Meeting room: 403996694

In-person venue: Room 1037, Mingde Main Building

Conference Organization

Hosts