Feng Li (李丰)

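Dr. Feng Li is Associate Dean, Associate Professor, and master's supervisor at the School of Statistics and Mathematics, Central University of Finance and Economics. He received his PhD from Stockholm University, Sweden. His research covers Bayesian statistics, forecasting methods, and distributed learning for big data. His awards include the Cramér Prize of the Royal Swedish Statistical Society, a young researcher award from the International Society for Bayesian Analysis, and second prize in the 2nd National Competition of Experimental Teaching Cases in Economics and Management for Chinese Universities. He has led or taken part in several projects funded by the National Natural Science Foundation of China.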

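His recent work has appeared in statistics journals (Journal of Computational and Graphical Statistics, Journal of Business and Economic Statistics, Statistical Analysis and Data Mining), economics and management journals (International Journal of Forecasting, Journal of Business Research), operations research journals (European Journal of Operational Research, Journal of the Operational Research Society), the artificial intelligence journal Expert Systems with Applications, and medical journals (BMJ Open, Journal of Surgical Research, Journal of Affective Disorders), among others. He has also written the books Bayesian Modeling of Conditional Densities, 《大数据分布式计算与案例》 (Distributed Computing for Big Data with Cases), and 《统计计算》 (Statistical Computing).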

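Dr. Li has given invited talks at conferences including the ISBA World Meeting and the International Symposium on Forecasting. His presentation slides are available for download.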

CV (English) | CV (Chinese)

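Employment

School of Statistics and Mathematics, Central University of Finance and Economics: Associate Dean, Associate Professor, Master's Supervisor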

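Homepage: https://feng.li/
Email: feng.li@cufe.edu.cn
Office phone: +86-(0)10-6177-6189
Office: Room 210, School Building No. 1, Central University of Finance and Economics (Shahe Campus)
     Shahe Higher Education Park, Changping District, Beijing 102206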

Education

Research Interests

Bayesian statistics · Econometrics · Forecasting methods · Distributed learning for big data

Research Projects

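  • National Social Science Fund of China, General Project (22BTJ028): Complex hierarchical economic forecasting from a global-model perspective. 2022-present, Principal Investigator, ongoing.
  • Alibaba Innovative Research Program: Complex time series forecasting problems in e-commerce. 2021/09-present, Principal Investigator, ongoing.
  • National Natural Science Foundation of China, General Program (82074282): Building a methodological framework for single-arm clinical studies based on the objective performance criteria method for evaluating the clinical efficacy of traditional Chinese medicine. 2021/01-present, key participant, ongoing.
  • National Natural Science Foundation of China, Young Scientists Fund (11501587): Bayesian flexible density methods and their applications to high-dimensional financial data. 2016/01-2018/12, Principal Investigator, completed.
  • Ministry of Education research fund: Bayesian flexible high-dimensional density methods for complex data. 2014/01-2017/12, Principal Investigator, completed.
  • National Natural Science Foundation of China, Young Scientists Fund (11401603): Statistical modeling and inference for mean models of recurrent events and quantile regression of longitudinal data. 2015/01-2017/12, participant, completed.
  • National Natural Science Foundation of China, Young Scientists Fund (71401192): Early-warning models for corporate financial distress: an interval-data approach based on financial volatility information. 2015/01-2017/12, participant, completed.
  • National Natural Science Foundation of China, General Program (71473279): From monetary aggregates to credit aggregates: the mechanism behind the divergence of the global virtual and real economies, and macroeconomic policy responses. 2015/01-2017/12, participant, completed.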

Working Papers

Publications

An asterisk (*) marks the corresponding author.

  1. Xiaoqian Wang, Rob J. Hyndman, Feng Li and Yanfei Kang (2023), “Forecast combinations: an over 50-year review”, International Journal of Forecasting. (In Press)
    Abstract: Forecast combinations have flourished remarkably in the forecasting community and, in recent years, have become part of the mainstream of forecasting research and activities. Combining multiple forecasts produced from single (target) series is now widely used to improve accuracy through the integration of information gleaned from different sources, thereby mitigating the risk of identifying a single “best” forecast. Combination schemes have evolved from simple combination methods without estimation, to sophisticated methods involving time-varying weights, nonlinear combinations, correlations among components, and cross-learning. They include combining point forecasts and combining probabilistic forecasts. This paper provides an up-to-date review of the extensive literature on forecast combinations, together with reference to available open-source software implementations. We discuss the potential and limitations of various methods and highlight how these ideas have developed over time. Some important issues concerning the utility of forecast combinations are also surveyed. Finally, we conclude with current research gaps and potential insights for future research.
    BibTeX:
    @article{wang2023forecast_ijf,
      author = {Xiaoqian Wang and Rob J Hyndman and Feng Li and Yanfei Kang},
      title = {Forecast combinations: an over 50-year review},
      journal = {International Journal of Forecasting},
      year = {2023},
      note = {In Press},
      url = {http://arxiv.org/abs/2205.04216}
    }
    
  2. Li Li, Yanfei Kang, Fotios Petropoulos and Feng Li (2023), “Feature-based intermittent demand forecast combinations: accuracy and inventory implications”, International Journal of Production Research. (In Press)
    Abstract: Intermittent demand forecasting is a ubiquitous and challenging problem in production systems and supply chain management. In recent years, there has been a growing focus on developing forecasting approaches for intermittent demand from academic and practical perspectives. However, limited attention has been given to forecast combination methods, which have achieved competitive performance in forecasting fast-moving time series. The current study aims to examine the empirical outcomes of some existing forecast combination methods and propose a generalized feature-based framework for intermittent demand forecasting. The proposed framework has been shown to improve the accuracy of point and quantile forecasts based on two real data sets. Further, some analysis of features, forecasting pools and computational efficiency is also provided. The findings indicate the intelligibility and flexibility of the proposed approach in intermittent demand forecasting and offer insights regarding inventory decisions.
    BibTeX:
    @article{li2022feature_ijpr,
      author = {Li Li and Yanfei Kang and Fotios Petropoulos and Feng Li},
      title = {Feature-based intermittent demand forecast combinations: accuracy and inventory implications},
      journal = {International Journal of Production Research},
      year = {2023},
      note = {In Press},
      url = {https://arxiv.org/abs/2204.08283}
    }
    
  3. Bohan Zhang, Yanfei Kang, Anastasios Panagiotelis and Feng Li (2023), “Optimal reconciliation with immutable forecasts”, European Journal of Operational Research. (In Press)
    Abstract: The practical importance of coherent forecasts in hierarchical forecasting has inspired many studies on forecast reconciliation. Under this approach, base forecasts are produced for every series in the hierarchy and are subsequently adjusted to be coherent in a second reconciliation step. Reconciliation methods have been shown to improve forecast accuracy but will generally adjust the base forecast of every series. However, in an operational context, it is sometimes necessary or beneficial to keep forecasts of some variables unchanged after forecast reconciliation. In this paper, we formulate a reconciliation methodology that keeps forecasts of a pre-specified subset of variables unchanged or “immutable”. In contrast to existing approaches, these immutable forecasts need not all come from the same level of a hierarchy, and our method can also be applied to grouped hierarchies. We prove that our approach preserves unbiasedness in base forecasts. Our method can also account for correlations between base forecasting errors and ensure the non-negativity of forecasts. We also perform empirical experiments, including an application to a large-scale online retailer’s sales, to assess our proposed methodology’s impacts.
    BibTeX:
    @article{zhang2023optimal_ejor,
      author = {Bohan Zhang and Yanfei Kang and Anastasios Panagiotelis and Feng Li},
      title = {Optimal reconciliation with immutable forecasts},
      journal = {European Journal of Operational Research},
      year = {2023},
      note = {In Press},
      url = {http://arxiv.org/abs/2204.09231},
      doi = {10.1016/j.ejor.2022.11.035}
    }
    
  4. Rui Pan, Tunan Ren, Baishan Guo, Feng Li, Guodong Li and Hansheng Wang (2022), “A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating”, Journal of Business and Economic Statistics. Vol. 40(4), pp. 1691-1700.
    Abstract: Quantile regression is a method of fundamental importance. How to efficiently conduct quantile regression for a large dataset on a distributed system is of great importance. We show that the popularly used one-shot estimation is statistically inefficient if data are not randomly distributed across different workers. To fix the problem, a novel one-step estimation method is developed with the following nice properties. First, the algorithm is communication efficient. That is the communication cost demanded is practically acceptable. Second, the resulting estimator is statistically efficient. That is its asymptotic covariance is the same as that of the global estimator. Third, the estimator is robust against data distribution. That is its consistency is guaranteed even if data are not randomly distributed across different workers. Numerical experiments are provided to corroborate our findings. A real example is also presented for illustration.
    BibTeX:
    @article{pan2022note_jbes,
      author = {Rui Pan and Tunan Ren and Baishan Guo and Feng Li and Guodong Li and Hansheng Wang},
      title = {A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating},
      journal = {Journal of Business and Economic Statistics},
      year = {2022},
      volume = {40},
      number = {4},
      pages = {1691--1700},
      doi = {10.1080/07350015.2021.1961789}
    }
    
  5. Xiaoqian Wang, Yanfei Kang, Fotios Petropoulos and Feng Li (2022), “The uncertainty estimation of feature-based forecast combinations”, Journal of the Operational Research Society. Vol. 73(5), pp. 979-993.
    Abstract: Forecasting is an indispensable element of operational research (OR) and an important aid to planning. The accurate estimation of the forecast uncertainty facilitates several operations management activities, predominantly in supporting decisions in inventory and supply chain management and effectively setting safety stocks. In this paper, we introduce a feature-based framework, which links the relationship between time series features and the interval forecasting performance into providing reliable interval forecasts. We propose an optimal threshold ratio searching algorithm and a new weight determination mechanism for selecting an appropriate subset of models and assigning combination weights for each time series tailored to the observed features. We evaluate our approach using a large set of time series from the M4 competition. Our experiments show that our approach significantly outperforms a wide range of benchmark models, both in terms of point forecasts as well as prediction intervals.
    BibTeX:
    @article{wang2022uncertainty_jors,
      author = {Wang, Xiaoqian and Kang, Yanfei and Petropoulos, Fotios and Li, Feng},
      title = {The uncertainty estimation of feature-based forecast combinations},
      journal = {Journal of the Operational Research Society},
      year = {2022},
      volume = {73},
      number = {5},
      pages = {979--993},
      url = {https://arxiv.org/abs/1908.02891},
      doi = {10.1080/01605682.2021.1880297}
    }
    
  6. Zhiru Wang, Yu Pang, Mingxin Gan, Martin Skitmore and Feng Li (2022), “Escalator accident mechanism analysis and injury prediction approaches in heavy capacity metro rail transit stations”, Safety Science. Vol. 154, pp. 105850.
    Abstract: The semi-open character with high passenger flow in Metro Rail Transport Stations (MRTS) makes safety management of human-electromechanical interaction escalator systems more complex. Safety management should not consider only single failures, but also the complex interactions in the system. This study applies task driven behavior theory and system theory to reveal a generic framework of the MRTS escalator accident mechanism and uses Lasso-Logistic Regression (LLR) for escalator injury prediction. Escalator accidents in the Beijing MRTS are used as a case study to estimate the applicability of the methodologies. The main results affirm that the application of System-Theoretical Process Analysis (STPA) and Task Driven Accident Process Analysis (TDAPA) to the generic escalator accident mechanism reveals non-failure state task driven passenger behaviors and constraints on safety that are not addressed in previous studies. The results also confirm that LLR is able to predict escalator accidents where there is a relatively large number of variables with limited observations. Additionally, increasing the amount of data improves the prediction accuracy for all three types of injuries in the case study, suggesting the LLR model has good extrapolation ability. The results can be applied in MRTS as instruments for both escalator accident investigation and accident prevention.
    BibTeX:
    @article{wang2022escalator_safety,
      author = {Wang, Zhiru and Pang, Yu and Gan, Mingxin and Skitmore, Martin and Li, Feng},
      title = {Escalator accident mechanism analysis and injury prediction approaches in heavy capacity metro rail transit stations},
      journal = {Safety Science},
      year = {2022},
      volume = {154},
      pages = {105850},
      doi = {10.1016/j.ssci.2022.105850}
    }
    
  7. Li Li, Yanfei Kang and Feng Li (2022), “Bayesian forecast combination using time-varying features”, International Journal of Forecasting. (In Press)
    Abstract: In this work, we propose a novel framework for density forecast combination by constructing time-varying weights based on time series features, which is called Feature-based Bayesian Forecasting Model Averaging (FEBAMA). Our framework estimates weights in the forecast combination via Bayesian log predictive scores, in which the optimal forecasting combination is determined by time series features from historical information. In particular, we use an automatic Bayesian variable selection method to add weight to the importance of different features. To this end, our approach has better interpretability compared to other black-box forecasting combination schemes. We apply our framework to stock market data and M3 competition data. Based on our structure, a simple maximum-a-posteriori scheme outperforms benchmark methods, and Bayesian variable selection can further enhance the accuracy for both point and density forecasts.
    BibTeX:
    @article{li2022bayesian_ijf,
      author = {Li, Li and Kang, Yanfei and Li, Feng},
      title = {Bayesian forecast combination using time-varying features},
      journal = {International Journal of Forecasting},
      year = {2022},
      note = {In Press},
      url = {https://arxiv.org/abs/2108.02082},
      doi = {10.1016/j.ijforecast.2022.06.002}
    }
    
  8. Xiaoqian Wang, Yanfei Kang, Rob J. Hyndman and Feng Li (2022), “Distributed ARIMA models for ultra-long time series”, International Journal of Forecasting. (In Press)
    Abstract: Providing forecasts for ultra-long time series plays a vital role in various activities, such as investment decisions, industrial production arrangements, and farm management. This paper develops a novel distributed forecasting framework to tackle challenges associated with forecasting ultra-long time series by using the industry-standard MapReduce framework. The proposed model combination approach facilitates distributed time series forecasting by combining the local estimators of time series models delivered from worker nodes and minimizing a global loss function. In this way, instead of unrealistically assuming the data generating process (DGP) of an ultra-long time series stays invariant, we make assumptions only on the DGP of subseries spanning shorter time periods. We investigate the performance of the proposed approach with AutoRegressive Integrated Moving Average (ARIMA) models using the real data application as well as numerical simulations. Compared to directly fitting the whole data with ARIMA models, our approach results in improved forecasting accuracy and computational efficiency both in point forecasts and prediction intervals, especially for longer forecast horizons. Moreover, we explore some potential factors that may affect the forecasting performance of our approach.
    BibTeX:
    @article{wang2022distributed_ijf,
      author = {Wang, Xiaoqian and Kang, Yanfei and Hyndman, Rob J and Li, Feng},
      title = {Distributed ARIMA models for ultra-long time series},
      journal = {International Journal of Forecasting},
      year = {2022},
      note = {In Press},
      url = {https://arxiv.org/abs/2007.09577},
      doi = {10.1016/j.ijforecast.2022.05.001}
    }
    
  9. Matthias Anderer and Feng Li (2022), “Hierarchical forecasting with a top-down alignment of independent level forecasts”, International Journal of Forecasting. Vol. 38(4), pp. 1405-1414.
    Abstract: Hierarchical forecasting with intermittent time series is a challenge in both research and empirical studies. Extensive research focuses on improving the accuracy of each hierarchy, especially the intermittent time series at bottom levels. Then, hierarchical reconciliation can be used to improve the overall performance further. In this paper, we present a hierarchical-forecasting-with-alignment approach that treats the bottom-level forecasts as mutable to ensure higher forecasting accuracy on the upper levels of the hierarchy. We employ a pure deep learning forecasting approach, N-BEATS, for continuous time series at the top levels, and a widely used tree-based algorithm, LightGBM, for intermittent time series at the bottom level. The hierarchical-forecasting-with-alignment approach is a simple yet effective variant of the bottom-up method, accounting for biases that are difficult to observe at the bottom level. It allows suboptimal forecasts at the lower level to retain a higher overall performance. The approach in this empirical study was developed by the first author during the M5 Accuracy competition, ranking second place. The method is also business orientated and can be used to facilitate strategic business planning.
    BibTeX:
    @article{anderer2022forecasting_ijf,
      author = {Matthias Anderer and Feng Li},
      title = {Hierarchical forecasting with a top-down alignment of independent level forecasts},
      journal = {International Journal of Forecasting},
      year = {2022},
      volume = {38},
      number = {4},
      pages = {1405--1414},
      url = {https://arxiv.org/abs/2103.08250},
      doi = {10.1016/j.ijforecast.2021.12.015}
    }
    
  10. Fotios Petropoulos, Daniele Apiletti, Vassilios Assimakopoulos, Mohamed Zied Babai, Devon K. Barrow, Souhaib Ben Taieb, Christoph Bergmeir, Ricardo J. Bessa, Jakub Bijak, John E. Boylan, Jethro Browell, Claudio Carnevale, Jennifer L. Castle, Pasquale Cirillo, Michael P. Clements, Clara Cordeiro, Fernando Luiz Cyrino Oliveira, Shari De Baets, Alexander Dokumentov, Joanne Ellison, Piotr Fiszeder, Philip Hans Franses, David T. Frazier, Michael Gilliland, M. Sinan Gönül, Paul Goodwin, Luigi Grossi, Yael Grushka-Cockayne, Mariangela Guidolin, Massimo Guidolin, Ulrich Gunter, Xiaojia Guo, Renato Guseo, Nigel Harvey, David F. Hendry, Ross Hollyman, Tim Januschowski, Jooyoung Jeon, Victor Richmond R. Jose, Yanfei Kang, Anne B. Koehler, Stephan Kolassa, Nikolaos Kourentzes, Sonia Leva, Feng Li, Konstantia Litsiou, Spyros Makridakis, Gael M. Martin, Andrew B. Martinez, Sheik Meeran, Theodore Modis, Konstantinos Nikolopoulos, Dilek Önkal, Alessia Paccagnini, Anastasios Panagiotelis, Ioannis Panapakidis, Jose M. Pavía, Manuela Pedio, Diego J. Pedregal, Pierre Pinson, Patrícia Ramos, David E. Rapach, J. James Reade, Bahman Rostami-Tabar, Michał Rubaszek, Georgios Sermpinis, Han Lin Shang, Evangelos Spiliotis, Aris A. Syntetos, Priyanga Dilini Talagala, Thiyanga S. Talagala, Len Tashman, Dimitrios Thomakos, Thordis Thorarinsdottir, Ezio Todini, Juan Ramón Trapero Arenas, Xiaoqian Wang, Robert L. Winkler, Alisa Yusupova and Florian Ziel (2022), “Forecasting: theory and practice”, International Journal of Forecasting. Vol. 38(3), pp. 705-871.
    Abstract: Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.
    BibTeX:
    @article{petropoulos2021forecasting_ijf,
      author = {Fotios Petropoulos and Daniele Apiletti and Vassilios Assimakopoulos and Mohamed Zied Babai and Devon K. Barrow and Souhaib Ben Taieb and Christoph Bergmeir and Ricardo J. Bessa and Jakub Bijak and John E. Boylan and Jethro Browell and Claudio Carnevale and Jennifer L. Castle and Pasquale Cirillo and Michael P. Clements and Clara Cordeiro and Fernando Luiz Cyrino Oliveira and Shari De Baets and Alexander Dokumentov and Joanne Ellison and Piotr Fiszeder and Philip Hans Franses and David T. Frazier and Michael Gilliland and M. Sinan Gönül and Paul Goodwin and Luigi Grossi and Yael Grushka-Cockayne and Mariangela Guidolin and Massimo Guidolin and Ulrich Gunter and Xiaojia Guo and Renato Guseo and Nigel Harvey and David F. Hendry and Ross Hollyman and Tim Januschowski and Jooyoung Jeon and Victor Richmond R. Jose and Yanfei Kang and Anne B. Koehler and Stephan Kolassa and Nikolaos Kourentzes and Sonia Leva and Feng Li and Konstantia Litsiou and Spyros Makridakis and Gael M. Martin and Andrew B. Martinez and Sheik Meeran and Theodore Modis and Konstantinos Nikolopoulos and Dilek Önkal and Alessia Paccagnini and Anastasios Panagiotelis and Ioannis Panapakidis and Jose M. Pavía and Manuela Pedio and Diego J. Pedregal and Pierre Pinson and Patrícia Ramos and David E. Rapach and J. James Reade and Bahman Rostami-Tabar and Michał Rubaszek and Georgios Sermpinis and Han Lin Shang and Evangelos Spiliotis and Aris A. Syntetos and Priyanga Dilini Talagala and Thiyanga S. Talagala and Len Tashman and Dimitrios Thomakos and Thordis Thorarinsdottir and Ezio Todini and Juan Ramón Trapero Arenas and Xiaoqian Wang and Robert L. Winkler and Alisa Yusupova and Florian Ziel},
      title = {Forecasting: theory and practice},
      journal = {International Journal of Forecasting},
      year = {2022},
      volume = {38},
      number = {3},
      pages = {705--871},
      url = {https://arxiv.org/abs/2012.03854},
      doi = {10.1016/j.ijforecast.2021.11.001}
    }
    
  11. Thiyanga S. Talagala, Feng Li and Yanfei Kang (2022), “FFORMPP: Feature-based forecast model performance prediction”, International Journal of Forecasting. Vol. 38(3), pp. 920-943.
    Abstract: This paper introduces a novel meta-learning algorithm for time series forecast model performance prediction. We model the forecast error as a function of time series features calculated from historical time series with an efficient Bayesian multivariate surface regression approach. The minimum predicted forecast error is then used to identify an individual model or a combination of models to produce the final forecasts. It is well known that the performance of most meta-learning models depends on the representativeness of the reference dataset used for training. In such circumstances, we augment the reference dataset with a feature-based time series simulation approach, namely GRATIS, to generate a rich and representative time series collection. The proposed framework is tested using the M4 competition data and is compared against commonly used forecasting approaches. Our approach provides comparable performance to other model selection and combination approaches but at a lower computational cost and a higher degree of interpretability, which is important for supporting decisions. We also provide useful insights regarding which forecasting models are expected to work better for particular types of time series, the intrinsic mechanisms of the meta-learners, and how the forecasting performance is affected by various factors.
    BibTeX:
    @article{talagala2022fformpp_ijf,
      author = {Talagala, Thiyanga S and Li, Feng and Kang, Yanfei},
      title = {FFORMPP: Feature-based forecast model performance prediction},
      journal = {International Journal of Forecasting},
      year = {2022},
      volume = {38},
      number = {3},
      pages = {920--943},
      url = {https://arxiv.org/abs/1908.11500},
      doi = {10.1016/j.ijforecast.2021.07.002}
    }
    
  12. Yanfei Kang, Wei Cao, Fotios Petropoulos and Feng Li (2022), “Forecast with Forecasts: Diversity Matters”, European Journal of Operational Research. Vol. 301(1), pp. 180-190.
    Abstract: Forecast combinations have been widely applied in the last few decades to improve forecasting. Estimating optimal weights that can outperform simple averages is not always an easy task. In recent years, the idea of using time series features for forecast combinations has flourished. Although this idea has been proved to be beneficial in several forecasting competitions, it may not be practical in many situations. For example, the task of selecting appropriate features to build forecasting models is often challenging. Even if there was an acceptable way to define the features, existing features are estimated based on the historical patterns, which are likely to change in the future. Other times, the estimation of the features is infeasible due to limited historical data. In this work, we suggest a change of focus from the historical data to the produced forecasts to extract features. We use out-of-sample forecasts to obtain weights for forecast combinations by amplifying the diversity of the pool of methods being combined. A rich set of time series is used to evaluate the performance of the proposed method. Experimental results show that our diversity-based forecast combination framework not only simplifies the modeling process but also achieves superior forecasting performance in terms of both point forecasts and prediction intervals. The value of our proposition lies on its simplicity, transparency, and computational efficiency, elements that are important from both an optimization and a decision analysis perspective.
    BibTeX:
    @article{kang2022forecast_ejor,
      author = {Kang, Yanfei and Cao, Wei and Petropoulos, Fotios and Li, Feng},
      title = {Forecast with Forecasts: Diversity Matters},
      journal = {European Journal of Operational Research},
      year = {2022},
      volume = {301},
      number = {1},
      pages = {180--190},
      url = {https://arxiv.org/abs/2012.01643},
      doi = {10.1016/j.ejor.2021.10.024}
    }
    
  13. Xuening Zhu, Feng Li and Hansheng Wang (2021), “Least-Square Approximation for a Distributed System”, Journal of Computational and Graphical Statistics. Vol. 30(4), pp. 1004-1018.
    Abstract: In this work, we develop a distributed least-square approximation (DLSA) method that is able to solve a large family of regression problems (e.g., linear regression, logistic regression, and Cox’s model) on a distributed system. By approximating the local objective function using a local quadratic form, we are able to obtain a combined estimator by taking a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator. Moreover, it requires only one round of communication. We further conduct a shrinkage estimation based on the DLSA estimation using an adaptive Lasso approach. The solution can be easily obtained by using the LARS algorithm on the master node. It is theoretically shown that the resulting estimator possesses the oracle property and is selection consistent by using a newly designed distributed Bayesian information criterion. The finite sample performance and computational efficiency are further illustrated by an extensive numerical study and an airline dataset. The airline dataset is 52 GB in size. The entire methodology has been implemented in Python for a de-facto standard Spark system. The proposed DLSA algorithm on the Spark system takes 26 min to obtain a logistic regression estimator, which is more efficient and memory friendly than conventional methods. Supplementary materials for this article are available online.
    BibTeX:
    @article{zhu2021least_jcgs,
      author = {Zhu, Xuening and Li, Feng and Wang, Hansheng},
      title = {Least-Square Approximation for a Distributed System},
      journal = {Journal of Computational and Graphical Statistics},
      year = {2021},
      volume = {30},
      number = {4},
      pages = {1004--1018},
      url = {https://arxiv.org/abs/1908.04904},
      doi = {10.1080/10618600.2021.1923517}
    }
    
  14. Yanfei Kang, Evangelos Spiliotis, Fotios Petropoulos, Nikolaos Athiniotis, Feng Li and Vassilios Assimakopoulos (2021), “Déjà vu: A data-centric forecasting approach through time series cross-similarity”, Journal of Business Research. Vol. 132, pp. 719-731.
    Abstract: Accurate forecasts are vital for supporting the decisions of modern companies. Forecasters typically select the most appropriate statistical model for each time series. However, statistical models usually presume some data generation process while making strong assumptions about the errors. In this paper, we present a novel data-centric approach — ‘forecasting with cross-similarity’, which tackles model uncertainty in a model-free manner. Existing similarity-based methods focus on identifying similar patterns within the series, i.e., ‘self-similarity’. In contrast, we propose searching for similar patterns from a reference set, i.e., ‘cross-similarity’. Instead of extrapolating, the future paths of the similar series are aggregated to obtain the forecasts of the target series. Building on the cross-learning concept, our approach allows the application of similarity-based forecasting on series with limited lengths. We evaluate the approach using a rich collection of real data and show that it yields competitive accuracy in both points forecasts and prediction intervals.
    BibTeX:
    @article{kang2021deja_jbr,
      author = {Kang, Yanfei and Spiliotis, Evangelos and Petropoulos, Fotios and Athiniotis, Nikolaos and Li, Feng and Assimakopoulos, Vassilios},
      title = {Déjà vu: A data-centric forecasting approach through time series cross-similarity},
      journal = {Journal of Business Research},
      year = {2021},
      volume = {132},
      pages = {719--731},
      url = {https://arxiv.org/abs/1909.00221},
      doi = {10.1016/j.jbusres.2020.10.051}
    }
    
  15. Megan G. Janeway, Xiang Zhao, Max Rosenthaler, Yi Zuo, Kumar Balasubramaniyan, Michael Poulson, Miriam Neufeld, Jeffrey J. Siracuse, Courtney E. Takahashi, Lisa Allee, Tracey Dechert, Peter A. Burke, Feng Li and Bindu Kalesan (2021), “Clinical diagnostic phenotypes in hospitalizations due to self-inflicted firearm injury”, Journal of Affective Disorders. Vol. 278, pp. 172-180.
    Abstract: Hospitalized self-inflicted firearm injuries have not been extensively studied, particularly regarding clinical diagnoses at the index admission. The objective of this study was to discover the diagnostic phenotypes (DPs) or clusters of hospitalized self-inflicted firearm injuries. Using Nationwide Inpatient Sample data in the US from 1993 to 2014, we used International Classification of Diseases, Ninth Revision codes to identify self-inflicted firearm injuries among those ≥18 years of age. The 25 most frequent diagnostic codes were used to compute a dissimilarity matrix and the optimal number of clusters. We used hierarchical clustering to identify the main DPs. The overall cohort included 14072 hospitalizations, with self-inflicted firearm injuries occurring mainly in those between 16 to 45 years of age, black, with co-occurring tobacco and alcohol use, and mental illness. Out of the three identified DPs, DP1 was the largest (n=10,110), and included most common diagnoses similar to overall cohort, including major depressive disorders (27.7%), hypertension (16.8%), acute post hemorrhagic anemia (16.7%), tobacco (15.7%) and alcohol use (12.6%). DP2 (n=3,725) was not characterized by any of the top 25 ICD-9 diagnoses codes, and included children and peripartum women. DP3, the smallest phenotype (n=237), had high prevalence of depression similar to DP1, and defined by fewer fatal injuries of chest and abdomen. There were three distinct diagnostic phenotypes in hospitalizations due to self-inflicted firearm injuries. Further research is needed to determine how DPs can be used to tailor clinical care and prevention efforts.
    BibTeX:
    @article{janeway2021clinical_jad,
      author = {Megan G Janeway and Xiang Zhao and Max Rosenthaler and Yi Zuo and Kumar Balasubramaniyan and Michael Poulson and Miriam Neufeld and Jeffrey J. Siracuse and Courtney E. Takahashi and Lisa Allee and Tracey Dechert and Peter A Burke and Feng Li and Bindu Kalesan},
      title = {Clinical diagnostic phenotypes in hospitalizations due to self-inflicted firearm injury},
      journal = {Journal of Affective Disorders},
      year = {2021},
      volume = {278},
      pages = {172--180},
      doi = {10.1016/j.jad.2020.09.067}
    }
    
  16. Yanfei Kang and Feng Li (2020), “预测:方法与实践” (Forecasting: Methods and Practice), published online.
    BibTeX:
    @book{li2020fppcn,
      author = {康雁飞 and 李丰},
      title = {预测:方法与实践},
      publisher = {在线出版},
      year = {2020},
      url = {https://otexts.com/fppcn/}
    }
    
  17. Yanfei Kang and Feng Li (2020), “统计计算” (Statistical Computing), published online.
    BibTeX:
    @book{kang2020statscompcn,
      author = {康雁飞 and 李丰},
      title = {统计计算},
      publisher = {在线出版},
      year = {2020},
      url = {https://feng.li/files/statscompbook/}
    }
    
  18. Bindu Kalesan, Siran Zhao, Michael Poulson, Miriam Neufeld, Tracey Dechert, Jeffrey J. Siracuse, Yi Zuo and Feng Li (2020), “Intersections of firearm suicide, drug-related mortality, and economic dependency in rural America”, Journal of Surgical Research. Vol. 256, pp. 96-102.
    Abstract: Rural counties in the United States have higher firearm suicide rates and opioid overdoses than urban counties. We sought to determine whether rural counties can be grouped based on these “diseases of despair.” Age-adjusted firearm suicide death rates per 100,000; drug-related death rates per 100,000; homicide rate per 100,000, opioid prescribing rate, %black, %Native American, and %veteran population, median home price, violent crime rates per 100,000, primary economic dependency (nonspecialized, farming, mining, manufacturing, government, and recreation), and economic variables (low education, low employment, retirement destination, persistent poverty, and persistent child poverty) were obtained for all rural counties and evaluated with hierarchical clustering using complete linkage. We identified five distinct rural county clusters. The firearm suicide rates in the clusters were 5.9, 6.8, 6.4, 8.5, and 3.8 per 100,000, respectively. The counties in cluster 1 were poor, mining dependent, with population loss, cluster 2 were nonspecialized economies, with high opioid prescription rates, cluster 3 were manufacturing and government economies with moderate unemployment, cluster 4 were recreational economies with substantial veterans and Native American populations, high median home price, drug death rates, opioid prescribing, and violent crime, and cluster 5 were farming economies, with high population loss, low median home price, low rates of drug mortality, opioid prescribing, and violent crime. Cluster 4 counties were spatially adjacent to urban counties. More than 300 counties currently face a disproportionate burden of diseases of despair. Interventions to reduce firearm suicides should be community-based and include programs to reduce other diseases of despair.
    BibTeX:
    @article{kalesan2020intersections_jsr,
      author = {Kalesan, Bindu and Zhao, Siran and Poulson, Michael and Neufeld, Miriam and Dechert, Tracey and Siracuse, Jeffrey J and Zuo, Yi and Li, Feng},
      title = {Intersections of firearm suicide, drug-related mortality, and economic dependency in rural America},
      journal = {Journal of Surgical Research},
      year = {2020},
      volume = {256},
      pages = {96--102},
      doi = {10.1016/j.jss.2020.06.011}
    }
    
  19. Xixi Li, Yanfei Kang and Feng Li (2020), “Forecasting with time series imaging”, Expert Systems with Applications. Vol. 160, pp. 113680.
    Abstract: Feature-based time series representations have attracted substantial attention in a wide range of time series analysis methods. Recently, the use of time series features for forecast model averaging has been an emerging research focus in the forecasting community. Nonetheless, most of the existing approaches depend on the manual choice of an appropriate set of features. Exploiting machine learning methods to extract features from time series automatically becomes crucial in state-of-the-art time series analysis. In this paper, we introduce an automated approach to extract time series features based on time series imaging. We first transform time series into recurrence plots, from which local features can be extracted using computer vision algorithms. The extracted features are used for forecast model averaging. Our experiments show that forecasting based on automatically extracted features, with less human intervention and a more comprehensive view of the raw time series data, yields highly comparable performances with the best methods in the largest forecasting competition dataset (M4) and outperforms the top methods in the Tourism forecasting competition dataset.
    BibTeX:
    @article{li2020forecasting_eswa,
      author = {Li, Xixi and Kang, Yanfei and Li, Feng},
      title = {Forecasting with time series imaging},
      journal = {Expert Systems with Applications},
      year = {2020},
      volume = {160},
      pages = {113680},
      url = {https://arxiv.org/abs/1904.08064},
      doi = {10.1016/j.eswa.2020.113680}
    }
    
  20. Chengcheng Hao, Feng Li and Dietrich von Rosen (2020), “A Bilinear Reduced Rank Model”, In Contemporary Experimental Design, Multivariate Analysis and Data Mining. Springer Nature.
    Abstract: This article considers a bilinear model that includes two different latent effects. The first effect has a direct influence on the response variable, whereas the second latent effect is assumed to first influence other latent variables, which in turn affect the response variable. In this article, latent variables are modelled via rank restrictions on unknown mean parameters and the models which are used are often referred to as reduced rank regression models. This article presents a likelihood-based approach that results in explicit estimators. In our model, the latent variables act as covariates that we know exist, but their direct influence is unknown and will therefore not be considered in detail. One example is if we observe hundreds of weather variables, but we cannot say which or how these variables affect plant growth.
    BibTeX:
    @inbook{hao2020bilinear_ced,
      author = {Hao, Chengcheng and Li, Feng and von Rosen, Dietrich},
      editor = {Jianqing Fan and Jianxin Pan},
      title = {A Bilinear Reduced Rank Model},
      booktitle = {Contemporary Experimental Design, Multivariate Analysis and Data Mining},
      publisher = {Springer Nature},
      year = {2020},
      doi = {10.1007/978-3-030-46161-4_21}
    }
    
  21. Yanfei Kang, Rob J. Hyndman and Feng Li (2020), “GRATIS: GeneRAting TIme Series with diverse and controllable characteristics”, Statistical Analysis and Data Mining. Vol. 13(4), pp. 354-376.
    Abstract: The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
    BibTeX:
    @article{kang2020gratis_sam,
      author = {Kang, Yanfei and Hyndman, Rob J and Li, Feng},
      title = {GRATIS: GeneRAting TIme Series with diverse and controllable characteristics},
      journal = {Statistical Analysis and Data Mining},
      year = {2020},
      volume = {13},
      number = {4},
      pages = {354--376},
      url = {https://arxiv.org/abs/1903.02787},
      doi = {10.1002/sam.11461}
    }
    
  22. Hannah M. Bailey, Yi Zuo, Feng Li, Jae Min, Krishna Vaddiparti, Mattia Prosperi, Jeffrey Fagan, Sandro Galea and Bindu Kalesan (2019), “Changes in patterns of mortality rates and years of life lost due to firearms in the United States, 1999 to 2016: A joinpoint analysis”, PLoS One. Vol. 14(11), pp. e0225223.
    Abstract: Firearm-related death rates and years of potential life lost (YPLL) vary widely between population subgroups and states. However, changes or inflections in temporal trends within subgroups and states are not fully documented. We assessed temporal patterns and inflections in the rates of firearm deaths and %YPLL due to firearms for overall and by sex, age, race/ethnicity, intent, and states in the United States between 1999 and 2016. We extracted age-adjusted firearm mortality and YPLL rates per 100,000, and %YPLL from 1999 to 2016 by using the WONDER (Wide-ranging Online Data for Epidemiologic Research) database. We used Joinpoint Regression to assess temporal trends, the inflection points, and annual percentage change (APC) from 1999 to 2016. National firearm mortality rates were 10.3 and 11.8 per 100,000 in 1999 and 2016, with two distinct segments: a plateau until 2014 followed by an increase of APC = 7.2% (95% CI 3.1, 11.4). YPLL rates were 304.7 and 338.2 in 1999 and 2016, with a steady APC increase in %YPLL of 0.65% (95% CI 0.43, 0.87) from 1999 to an inflection point in 2014, followed by a larger APC in %YPLL of 5.1% (95% CI 0.1, 10.4). The upward trend in firearm mortality and YPLL rates starting in 2014 was observed in subgroups of male, non-Hispanic blacks, Hispanic whites and for firearm assaults. The inflection points for firearm mortality and YPLL rates also varied across states. Within the United States, firearm mortality rates and YPLL remained constant between 1999 and 2014 and have been increasing since. There was, however, an increase in firearm mortality rates in several subgroups and individual states earlier than 2014.
    BibTeX:
    @article{bailey2019changes_plosone,
      author = {Bailey, Hannah M and Zuo, Yi and Li, Feng and Min, Jae and Vaddiparti, Krishna and Prosperi, Mattia and Fagan, Jeffrey and Galea, Sandro and Kalesan, Bindu},
      title = {Changes in patterns of mortality rates and years of life lost due to firearms in the United States, 1999 to 2016: A joinpoint analysis},
      journal = {PLoS One},
      year = {2019},
      volume = {14},
      number = {11},
      doi = {10.1371/journal.pone.0225223}
    }
    
  23. Feng Li and Zhuojing He (2019), “Credit risk clustering in a business group: which matters more, systematic or idiosyncratic risk?”, Cogent Economics & Finance. pp. 1632528.
    Abstract: Understanding how defaults correlate across firms is a persistent concern in risk management. In this paper, we apply covariate-dependent copula models to assess the dynamic nature of credit risk dependence, which we define as “credit risk clustering”. We also study the driving forces of the credit risk clustering in CEC business group in China. Our empirical analysis shows that the credit risk clustering varies over time and exhibits different patterns across firm pairs in a business group. We also investigate the impacts of systematic and idiosyncratic factors on credit risk clustering. We find that the impacts of the money supply and the short-term interest rates are positive, whereas the impacts of exchange rates are negative. The roles of the CPI on credit risk clustering are ambiguous. Idiosyncratic factors are vital for predicting credit risk clustering. From a policy perspective, our results not only strengthen the results of previous research but also provide a possible approach to model and predict the extreme co-movement of credit risk in business groups with financial indicators.
    BibTeX:
    @article{li2019credit_cef,
      author = {Li, Feng and He, Zhuojing},
      title = {Credit risk clustering in a business group: which matters more, systematic or idiosyncratic risk?},
      journal = {Cogent Economics & Finance},
      year = {2019},
      pages = {1632528},
      url = {http://dx.doi.org/10.2139/ssrn.3182925},
      doi = {10.1080/23322039.2019.1632528}
    }
    
  24. Elizabeth C. Pino, Yi Zuo, Camila Maciel De Olivera, Shruthi Mahalingaiah, Olivia Keiser, Lynn L. Moore, Feng Li, Ramachandran S. Vasan, Barbara E. Corkey and Bindu Kalesan (2018), “Cohort profile: The MULTI sTUdy Diabetes rEsearch (MULTITUDE) consortium”, BMJ Open. Vol. 8(5), pp. e020640.
    Abstract: Globally, the age-standardised prevalence of type 2 diabetes mellitus (T2DM) has nearly doubled from 1980 to 2014, rising from 4.7% to 8.5% with an estimated 422 million adults living with the chronic disease. The MULTI sTUdy Diabetes rEsearch (MULTITUDE) consortium was recently established to harmonise data from 17 independent cohort studies and clinical trials and to facilitate a better understanding of the determinants, risk factors and outcomes associated with T2DM. Participants range in age from 3 to 88 years at baseline, including both individuals with and without T2DM. MULTITUDE is an individual-level pooled database of demographics, comorbidities, relevant medications, clinical laboratory values, cardiac health measures, and T2DM-associated events and outcomes across 45 US states and the District of Columbia. Among the 135,156 ongoing participants included in the consortium, almost 25% (33,421) were diagnosed with T2DM at baseline. The average age of the participants was 54.3 years, while the average age of participants with diabetes was 64.2 years. Men (55.3%) and women (44.6%) were almost equally represented across the consortium. Non-whites accounted for 31.6% of the total participants and 40% of those diagnosed with T2DM. Fewer individuals with diabetes reported being regular smokers than their non-diabetic counterparts (40.3% vs 47.4%). Over 85% of those with diabetes were reported as either overweight or obese at baseline, compared with 60.7% of those without T2DM. We observed differences in all-cause mortality, overall and by T2DM status, between cohorts. Given the wide variation in demographics and all-cause mortality in the cohorts, the MULTITUDE consortium will be a unique resource for conducting research to determine: differences in the incidence and progression of T2DM; sequence of events or biomarkers prior to T2DM diagnosis; disease progression from T2DM to disease-related outcomes, complications and premature mortality; and to assess race/ethnicity differences in the above associations.
    BibTeX:
    @article{pino2018cohort_bmj,
      author = {Pino, Elizabeth C and Zuo, Yi and De Olivera, Camila Maciel and Mahalingaiah, Shruthi and Keiser, Olivia and Moore, Lynn L and Li, Feng and Vasan, Ramachandran S and Corkey, Barbara E and Kalesan, Bindu},
      title = {Cohort profile: The MULTI sTUdy Diabetes rEsearch (MULTITUDE) consortium},
      journal = {BMJ Open},
      year = {2018},
      volume = {8},
      number = {5},
      pages = {e020640},
      doi = {10.1136/bmjopen-2017-020640}
    }
    
  25. Feng Li and Yanfei Kang (2018), “Improving forecasting performance using covariate-dependent copula models”, International Journal of Forecasting. Vol. 34(3), pp. 456-476.
    Abstract: Copulas provide an attractive approach to the construction of multivariate distributions with flexible marginal distributions and different forms of dependences. Of particular importance in many areas is the possibility of forecasting the tail-dependences explicitly. Most of the available approaches are only able to estimate tail-dependences and correlations via nuisance parameters, and cannot be used for either interpretation or forecasting. We propose a general Bayesian approach for modeling and forecasting tail-dependences and correlations as explicit functions of covariates, with the aim of improving the copula forecasting performance. The proposed covariate-dependent copula model also allows for Bayesian variable selection from among the covariates of the marginal models, as well as the copula density. The copulas that we study include the Joe-Clayton copula, the Clayton copula, the Gumbel copula and the Student’s t-copula. Posterior inference is carried out using an efficient MCMC simulation method. Our approach is applied to both simulated data and the S&P 100 and S&P 600 stock indices. The forecasting performance of the proposed approach is compared with those of other modeling strategies based on log predictive scores. A value-at-risk evaluation is also performed for the model comparisons.
    BibTeX:
    @article{li2018improving_ijf,
      author = {Li, Feng and Kang, Yanfei},
      title = {Improving forecasting performance using covariate-dependent copula models},
      journal = {International Journal of Forecasting},
      year = {2018},
      volume = {34},
      number = {3},
      pages = {456--476},
      url = {https://arxiv.org/abs/1401.0100},
      doi = {10.1016/j.ijforecast.2018.01.007}
    }
    
  26. Feng Li (2016), “大数据分布式计算与案例” (Distributed Computing and Case Studies for Big Data), China Renmin University Press.
    BibTeX:
    @book{li2016distributedcn,
      author = {李丰},
      title = {大数据分布式计算与案例},
      publisher = {中国人民大学出版社},
      year = {2016},
      url = {https://feng.li/files/distcompbook/}
    }
    
  27. Feng Li (2013), “Bayesian Modeling of Conditional Densities”. PhD thesis, Department of Statistics, Stockholm University.
    Abstract: This thesis develops models and associated Bayesian inference methods for flexible univariate and multivariate conditional density estimation. The models are flexible in the sense that they can capture widely differing shapes of the data. The estimation methods are specifically designed to achieve flexibility while still avoiding overfitting. The models are flexible both for a given covariate value, but also across covariate space. A key contribution of this thesis is that it provides general approaches of density estimation with highly efficient Markov chain Monte Carlo methods. The methods are illustrated on several challenging non-linear and non-normal datasets. In the first paper, a general model is proposed for flexibly estimating the density of a continuous response variable conditional on a possibly high-dimensional set of covariates. The model is a finite mixture of asymmetric student-t densities with covariate-dependent mixture weights. The four parameters of the components, the mean, degrees of freedom, scale and skewness, are all modeled as functions of the covariates. The second paper explores how well a smooth mixture of symmetric components can capture skewed data. Simulations and applications on real data show that including covariate-dependent skewness in the components can lead to substantially improved performance on skewed data, often using a much smaller number of components. We also introduce smooth mixtures of gamma and log-normal components to model positively-valued response variables. In the third paper we propose a multivariate Gaussian surface regression model that combines both additive splines and interactive splines, and a highly efficient MCMC algorithm that updates all the multi-dimensional knot locations jointly. We use shrinkage priors to avoid overfitting with different estimated shrinkage factors for the additive and surface part of the model, and also different shrinkage parameters for the different response variables. In the last paper we present a general Bayesian approach for directly modeling dependencies between variables as function of explanatory variables in a flexible copula context. In particular, the Joe-Clayton copula is extended to have covariate-dependent tail dependence and correlations. Posterior inference is carried out using a novel and efficient simulation method. The appendix of the thesis documents the computational implementation details.
    BibTeX:
    @phdthesis{li2013bayesian,
      author = {Feng Li},
      title = {Bayesian Modeling of Conditional Densities},
      school = {Department of Statistics, Stockholm University},
      year = {2013},
      note = {ISBN: 978-91-7447-665-1},
      url = {http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-89426}
    }
    
  28. Feng Li and Mattias Villani (2013), “Efficient Bayesian Multivariate Surface Regression”, Scandinavian Journal of Statistics. Vol. 40(4), pp. 706-723.
    Abstract: Methods for choosing a fixed set of knot locations in additive spline models are fairly well established in the statistical literature. The curse of dimensionality makes it nontrivial to extend these methods to nonadditive surface models, especially when there are more than a couple of covariates. We propose a multivariate Gaussian surface regression model that combines both additive splines and interactive splines, and a highly efficient Markov chain Monte Carlo algorithm that updates all the knot locations jointly. We use shrinkage priors to avoid overfitting with different estimated shrinkage factors for the additive and surface part of the model, and also different shrinkage parameters for the different response variables. Simulated data and an application to firm leverage data show that the approach is computationally efficient, and that allowing for freely estimated knot locations can offer a substantial improvement in out-of-sample predictive performance.
    BibTeX:
    @article{li2013efficient_sjs,
      author = {Li, Feng and Villani, Mattias},
      title = {Efficient Bayesian Multivariate Surface Regression},
      journal = {Scandinavian Journal of Statistics},
      year = {2013},
      volume = {40},
      number = {4},
      pages = {706--723},
      url = {https://arxiv.org/abs/1110.3689},
      doi = {10.1111/sjos.12022}
    }
    
  29. Feng Li, Mattias Villani and Robert Kohn (2011), “Modeling Conditional Densities Using Finite Smooth Mixtures”, In Mixtures: Estimation and Applications, pp. 123-144. John Wiley & Sons Inc, Chichester.
    Abstract: Smooth mixtures, i.e. mixture models with covariate-dependent mixing weights, are very useful flexible models for conditional densities. Previous work shows that using too simple mixture components for modeling heteroscedastic and/or heavy tailed data can give a poor fit, even with a large number of components. This paper explores how well a smooth mixture of symmetric components can capture skewed data. Simulations and applications on real data show that including covariate-dependent skewness in the components can lead to substantially improved performance on skewed data, often using a much smaller number of components. Furthermore, variable selection is effective in removing unnecessary covariates in the skewness, which means that there is little loss in allowing for skewness in the components when the data are actually symmetric. We also introduce smooth mixtures of gamma and log-normal components to model positively-valued response variables.
    BibTeX:
    @inbook{li2011modeling_mixtures,
      author = {Li, Feng and Villani, Mattias and Kohn, Robert},
      editor = {Mengersen, Kerrie and Robert, Christian and Titterington, Mike},
      title = {Modeling Conditional Densities Using Finite Smooth Mixtures},
      booktitle = {Mixtures: estimation and applications},
      publisher = {John Wiley & Sons Inc, Chichester},
      year = {2011},
      pages = {123--144},
      url = {http://dx.doi.org/10.2139/ssrn.1711194},
      doi = {10.1002/9781119995678.ch6}
    }
    
  30. Feng Li, Mattias Villani and Robert Kohn (2010), “Flexible modeling of conditional distributions using smooth mixtures of asymmetric student t densities”, Journal of Statistical Planning and Inference. Vol. 140(12), pp. 3638-3654.
    Abstract: A general model is proposed for flexibly estimating the density of a continuous response variable conditional on a possibly high-dimensional set of covariates. The model is a finite mixture of asymmetric student t densities with covariate-dependent mixture weights. The four parameters of the components, the mean, degrees of freedom, scale and skewness, are all modeled as functions of the covariates. Inference is Bayesian and the computation is carried out using Markov chain Monte Carlo simulation. To enable model parsimony, a variable selection prior is used in each set of covariates and among the covariates in the mixing weights. The model is used to analyze the distribution of daily stock market returns, and shown to more accurately forecast the distribution of returns than other widely used models for financial data.
    BibTeX:
    @article{li2010flexible_jspi,
      author = {Li, Feng and Villani, Mattias and Kohn, Robert},
      title = {Flexible modeling of conditional distributions using smooth mixtures of asymmetric student t densities},
      journal = {Journal of Statistical Planning and Inference},
      year = {2010},
      volume = {140},
      number = {12},
      pages = {3638--3654},
      url = {http://dx.doi.org/10.2139/ssrn.1551195},
      doi = {10.1016/j.jspi.2010.04.031}
    }
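
Li and Kang (2018) model tail dependence as an explicit function of covariates and then map it back to the copula parameter. The sketch below is a minimal illustration of that idea for the bivariate Clayton copula only. For this copula the lower tail dependence is lambda_L = 2^(-1/theta). The logistic link and the names `covariate_dependent_theta` and `beta` are hypothetical. They are not the paper's exact parameterisation, and the paper's MCMC estimation is not reproduced.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_lower_tail(theta):
    """Lower tail dependence of the Clayton copula: lambda_L = 2^(-1/theta)."""
    return 2.0 ** (-1.0 / theta)


def theta_from_lower_tail(lam):
    """Invert lambda_L = 2^(-1/theta), giving theta = -log(2) / log(lambda_L)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda_L must lie strictly between 0 and 1")
    return -math.log(2.0) / math.log(lam)


def covariate_dependent_theta(x, beta):
    """Hypothetical link: lambda_L = logistic(x . beta), then mapped to theta."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    lam = 1.0 / (1.0 + math.exp(-eta))  # keeps lambda_L inside (0, 1)
    return theta_from_lower_tail(lam)
```

Modeling lambda_L rather than theta directly keeps the covariate effect on an interpretable scale. As a check, C(q, q) / q approaches lambda_L as q goes to 0.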
    
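Li, Kang and Li (2020) transform each time series into a recurrence plot before extracting image features. Below is a minimal pure-Python sketch of a standard recurrence plot. The delay-embedding parameters `dim` and `delay` and the threshold `eps` are illustrative assumptions, not settings taken from the paper.

```python
import math


def recurrence_plot(x, eps=None, dim=1, delay=1):
    """Recurrence plot of a univariate time series x.

    States are delay-embedded vectors
        s_i = (x[i], x[i + delay], ..., x[i + (dim - 1) * delay]).
    With eps=None the unthresholded distance matrix D[i][j] = ||s_i - s_j||
    is returned; otherwise the binary matrix R[i][j] = 1 if D[i][j] <= eps
    else 0.
    """
    n = len(x) - (dim - 1) * delay  # number of embedded states
    if dim < 1 or delay < 1 or n < 1:
        raise ValueError("series too short or invalid embedding parameters")
    states = [[x[i + k * delay] for k in range(dim)] for i in range(n)]
    # Pairwise Euclidean distances between states (math.dist needs Python 3.8+).
    dist = [[math.dist(a, b) for b in states] for a in states]
    if eps is None:
        return dist
    return [[1 if d <= eps else 0 for d in row] for row in dist]
```

The returned matrix can be viewed as a grayscale image. The paper's subsequent step, which extracts local features with computer vision algorithms and uses them for forecast model averaging, is not shown here.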

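GRATIS (Kang, Hyndman and Li, 2020) generates series from mixture autoregressive (MAR) models. The following is a minimal simulator for a generic Gaussian MAR process under stated assumptions. At each step a component k is drawn with probability `weights[k]`, and the series then evolves as an AR model with intercept `intercepts[k]`, coefficients `ar_coefs[k]` and noise scale `sigmas[k]`. The function and parameter names are hypothetical. The sketch does not reproduce the parameter tuning that GRATIS uses to target specific time series features.

```python
import random


def simulate_mar(n, weights, intercepts, ar_coefs, sigmas, burn_in=100, seed=None):
    """Simulate n values from a Gaussian mixture autoregressive (MAR) model.

    At each time t a component k is drawn with probability weights[k], then
        x_t = intercepts[k] + sum_j ar_coefs[k][j] * x_{t-1-j} + sigmas[k] * e_t,
    with e_t ~ N(0, 1).
    """
    rng = random.Random(seed)
    p = max(len(c) for c in ar_coefs)  # largest AR order among components
    history = [0.0] * p  # zero start values; the burn-in removes their effect
    out = []
    for t in range(burn_in + n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        mean = intercepts[k] + sum(
            phi * history[-1 - j] for j, phi in enumerate(ar_coefs[k])
        )
        x = mean + sigmas[k] * rng.gauss(0.0, 1.0)
        history.append(x)
        if t >= burn_in:
            out.append(x)
    return out
```

For example, `simulate_mar(50, [0.5, 0.5], [0.0, 1.0], [[0.5], [-0.3]], [1.0, 0.5], seed=1)` returns a reproducible series of length 50 that switches between two AR(1) regimes.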
Talks and Interviews (报告与访谈)

Slides are available from https://github.com/feng-li/talks.

Date (时间) | Venue (地点) | Topic (主题)
2021-11-20 | 第14届中国R会议软件工具专场 (14th China R Conference, Software Tools Session) | Developing Distributed Models with Spark
2021-11-14 | Forecasting Impact Podcast | Computing, forecasting and learning with massive machines
2021-10-24 | 首都师范大学 (Capital Normal University) | 海量数据驱动场景及其数据科学方法 (Massive-data-driven scenarios and their data science methods)
2021-09-13 | Data Skeptic Podcast | Distributed ARIMA Models for Ultra-long Time Series
2021-09-06 | 狗熊会 | 复杂数据的高可延展分布式建模与计算机实现 (Highly scalable distributed modeling of complex data and its computer implementation)
2021-07-02 | The 2021 World Meeting of the International Society for Bayesian Analysis | Distributed Forecasting with Large Bayesian Vector Autoregressions
2021-06-28 | The 41st International Symposium on Forecasting | Highly scalable distributed modelling and forecasting with dependent data
2020-10-26 | The 40th International Symposium on Forecasting | Feature-based Bayesian Forecast Model Averaging
2019-07-08 | The 12th International Conference on Monte Carlo Methods and Applications, Sydney, Australia | Bayesian high-dimensional covariate-dependent copula modeling with application to stocks and text sentiments
2019-06-17 | The 39th International Symposium on Forecasting, Thessaloniki, Greece | Time series forecasting based on automatic feature extraction
2016-11-21 | 统计之都 (Capital of Statistics) | COS 访谈第 22 期: 李丰老师 (COS Interview No. 22: Feng Li)
2014-10-25 | Stockholm University | Interview with Feng Li, PhD
2014-06 | Qvintensen (article) | Complex Model for Complex Data via the Bayesian Approach
2014-03-22 | The Swedish Cramér Society 2014 Annual Meeting | Bayesian Modeling of Conditional Densities

Journal Refereeing (期刊审稿)

Referee for journals including Journal of Business and Economic Statistics, International Journal of Forecasting, Computational Statistics and Data Analysis, Pattern Recognition and Neurocomputing.

Conference Organization (组织会议)

  • The 2017 Beijing Workshop on Forecasting
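  • 2016 Annual Meeting of the Chinese Association of Quantitative Economics (中国数量经济学会2016年年会)
  • 2014 International Workshop on Financial Engineering and Risk Management (2014年金融工程与风险管理国际研讨会)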

Awards (学术奖励)

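  • Second Prize, 2nd National Experimental Teaching Case Competition for Economics and Management Programs in Chinese Universities, December 2017.
  • Cramér Prize (best PhD thesis), Swedish Cramér Society, March 2014.
  • International Society for Bayesian Analysis young researcher award, June 2012.
  • Knut & Alice Wallenberg Foundation award (Sweden), August 2011.
  • Outstanding Graduate of Beijing (municipal level), July 2007.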