Award Abstract # 1916239
Offline and Online Change-point Analysis for Large-scale Time Series Data

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: KENT STATE UNIVERSITY
Initial Amendment Date: August 9, 2019
Latest Amendment Date: August 18, 2021
Award Number: 1916239
Award Instrument: Continuing Grant
Program Manager: Yong Zeng
yzeng@nsf.gov
(703)292-7299
DMS Division Of Mathematical Sciences
MPS Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2019
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $100,000.00
Total Awarded Amount to Date: $100,000.00
Funds Obligated to Date: FY 2019 = $34,967.00
FY 2020 = $32,066.00
FY 2021 = $32,967.00
History of Investigator:
  • Jun Li (Principal Investigator)
    jli49@kent.edu
Recipient Sponsored Research Office: Kent State University
1500 HORNING RD
KENT
OH  US  44242-0001
(330)672-2070
Sponsor Congressional District: 14
Primary Place of Performance: Kent State University
Kent
OH  US  44242-0001
Primary Place of Performance Congressional District: 14
Unique Entity Identifier (UEI): KXNVA7JCC5K6
Parent UEI:
NSF Program(s): STATISTICS
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

Offline and online time series data often contain change points due to the dynamic behavior of the monitored systems. Identifying change points in offline time series data makes parameter estimation and statistical inference efficient by pooling homogeneous observations. Detecting change points in online time series data provides timely snapshots of the monitored system and allows for real-time anomaly detection. Despite their importance, methods for detecting change points in large-scale offline and online time series data are scarce. This is because a large number of parameters cannot be estimated accurately with a limited number of observations, and parametric models do not fully capture the multifarious aspects of data dependence. This project will develop new non-parametric change-point detection methods that incorporate both spatial and temporal dependence without imposing restrictive structural assumptions on large-scale time series data. The proposed methods apply to a wide range of applications, including identifying significant genes associated with certain diseases, studying dynamic functional connectivity in resting-state functional magnetic resonance imaging data, and detecting abrupt events such as the dissociation of communities or the formation of new communities on social networking platforms. This project will integrate research and education by involving students at different levels, including those from underrepresented groups, and by training pre-college and high school teachers to improve their knowledge of statistics through newly developed courses. The developed methods will be disseminated to biomedical and social scientists through interdisciplinary collaborations and the analysis of first-hand datasets.

This project will develop a general factor model framework for spatial and temporal dependence of large-scale time series data. Building on this framework, the project will provide hypothesis testing and offline change-point estimation for specific parameters, including the population mean and covariance matrix. The proposed methods can be readily modified to incorporate the advantages of both sum-of-squares-norm and max-norm statistics for hypothesis testing. They can be extended from regular binary segmentation to other popular change-point estimation methods, such as circular binary segmentation and wild binary segmentation. This project will also provide new stopping rules for online change-point detection in large-scale time series data. An explicit expression for the average run length (ARL) will be derived, so that the threshold in the stopping rules can be set without running time-consuming Monte Carlo simulations. The proposed research will derive an upper bound for the expected detection delay (EDD), the expression of which clearly demonstrates the impact of data dimensionality and dependence. This project will extend the current knowledge about change-point detection. For offline change-point detection, the PI will study the possibility of estimating a change point near the boundary in high-dimensional settings. For online change-point detection, the stopping rule based on the sum-of-squares-norm statistic will be compared with the one based on the max-norm statistic through the derived ARLs and EDDs.
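As a rough illustration of the two kinds of statistics mentioned above, the following Python sketch scans all candidate split points of a high-dimensional series and aggregates a standardized CUSUM mean difference across coordinates either by a sum of squares or by a maximum. It is only a textbook-style sketch under simplifying assumptions (independent observations with unit variance, a single change in the mean); the function name `cusum_change_point`, the `trim` parameter, and the simulated data are hypothetical and are not the project's actual procedures, which account for spatial and temporal dependence.

```python
import numpy as np

def cusum_change_point(X, trim=10):
    """Estimate a single mean change point in an (n, p) array X.

    Returns (t_l2, t_max): the split points that maximize the
    sum-of-squares and the max-norm CUSUM statistics, respectively.
    A split t means rows 0..t-1 come before the change.
    Assumes independent rows with unit variance (illustrative only).
    """
    n, p = X.shape
    ts = np.arange(trim, n - trim + 1)            # candidate split points
    prefix = np.cumsum(X, axis=0)                 # prefix[t-1] = sum of first t rows
    left_sum = prefix[ts - 1]
    right_sum = prefix[-1] - left_sum
    left_mean = left_sum / ts[:, None]
    right_mean = right_sum / (n - ts)[:, None]
    scale = np.sqrt(ts * (n - ts) / n)[:, None]   # standardizes each difference
    D = scale * (left_mean - right_mean)          # shape (len(ts), p)
    l2_stat = (D ** 2).sum(axis=1)                # sum-of-squares aggregation
    max_stat = np.abs(D).max(axis=1)              # max-norm aggregation
    return int(ts[l2_stat.argmax()]), int(ts[max_stat.argmax()])

rng = np.random.default_rng(0)
n, p, tau = 200, 50, 120

# Dense change: every coordinate shifts slightly after time tau.
dense = rng.standard_normal((n, p))
dense[tau:] += 1.0

# Sparse change: only two coordinates shift, but by a larger amount.
sparse = rng.standard_normal((n, p))
sparse[tau:, :2] += 3.0

print("dense  (l2, max):", cusum_change_point(dense))
print("sparse (l2, max):", cusum_change_point(sparse))
```

The sum-of-squares aggregation pools weak signals spread over many coordinates, while the max-norm aggregation targets a few strong ones, which is the usual motivation for combining the two.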

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Li, Jun "Finite sample t-tests for high-dimensional means" Journal of Multivariate Analysis, v.196, 2023 https://doi.org/10.1016/j.jmva.2023.105183
Li, Lingjun and Li, Jun "Online Change-Point Detection in High-Dimensional Covariance Structure with Application to Dynamic Networks" Journal of Machine Learning Research, 2023
Zhong, PingShou and Li, Jun and Kokoszka, Piotr "Multivariate analysis of variance and change points estimation for high-dimensional longitudinal data" Scandinavian Journal of Statistics, 2020 https://doi.org/10.1111/sjos.12460

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

During the grant support period, the PI has made significant progress in the field of change-point analysis for large-scale time series data. The developed methodology covers both offline and online scenarios, accommodating complex spatial and temporal dependencies, as well as addressing challenges presented by the "large p, small n" situation.

Key Contributions: 

1.    Non-parametric Factor Model: The PI has successfully formulated a non-parametric factor model capable of incorporating spatial and temporal dependencies in large-scale offline and online time series data.

2.    Offline Change-point Testing and Estimation: The PI has developed novel techniques for hypothesis testing and change-point estimation, specifically targeting changes in parameters, including the population mean and covariance matrix in large-scale offline data.

3.    Online Change-point Detection: The PI has introduced distribution-free stopping rules for online change-point detection in the mean and covariance matrix of high-dimensional data. These procedures consider temporospatial dependence, apply to non-Gaussian data, and yield explicit expressions for average run length (ARL) and expected detection delay (EDD).
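To make the notions of a stopping rule, run length, and detection delay concrete, here is a minimal Python sketch of a window-based online detector for a mean shift. It is an assumption-laden toy, not the distribution-free procedures described above: it assumes independent standard normal observations, a known pre-change mean `mu0`, and an arbitrary `threshold` chosen by hand rather than from an ARL formula. The function name `first_alarm` and all parameter values are hypothetical.

```python
import numpy as np

def first_alarm(stream, mu0, window=20, threshold=6.0):
    """Return the 1-based time of the first alarm, or None if none occurs.

    At each time t >= window, the mean of the last `window` observations
    is compared with mu0 using a standardized sum-of-squares statistic.
    Under independent N(mu0, I) data, window * ||mean - mu0||^2 is
    chi-square with p degrees of freedom (mean p, variance 2p).
    """
    n, p = stream.shape
    # csum[t] holds the sum of the first t centered observations.
    csum = np.vstack([np.zeros(p), np.cumsum(stream - mu0, axis=0)])
    for t in range(window, n + 1):
        win_mean = (csum[t] - csum[t - window]) / window
        stat = (window * np.sum(win_mean ** 2) - p) / np.sqrt(2 * p)
        if stat > threshold:
            return t          # stopping time
    return None

rng = np.random.default_rng(1)
n, p, change, window = 300, 50, 200, 20
mu0 = np.zeros(p)

# Stream whose mean shifts by 0.5 in every coordinate after time `change`.
stream = rng.standard_normal((n, p))
stream[change:] += 0.5

# Stream with no change, used to illustrate the absence of false alarms.
null_stream = rng.standard_normal((n, p))

alarm = first_alarm(stream, mu0, window=window)
print("alarm time:", alarm)
if alarm is not None:
    print("detection delay:", alarm - change)
print("alarm on null stream:", first_alarm(null_stream, mu0, window=window))
```

In this sketch the threshold trades off the two quantities discussed above: raising it lengthens the expected time to a false alarm (the ARL) but also lengthens the delay after a real change (the EDD).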

Published Work and Software: The research work has resulted in several published and under-review works. It also resulted in the development of the R package "onlineCOV," facilitating online change-point detection in high-dimensional covariance structures.

Education and Training Impact: During the grant support period, the PI provided funding and mentorship for a Ph.D. student, who successfully defended a thesis titled "Statistical Inference for Change Points in High-dimensional Offline and Online Data" in 2020.

Additionally, the PI provided training opportunities for female undergraduate students through the Kent State University REU program, supporting the development of their research and presentation skills.

The PI also contributed a lecture to the newly developed course "Intelligent Image Analysis and Management" in summer 2021. This course equipped undergraduate students with modern skills for analyzing high-dimensional time series data.

Collaborations and Future Initiatives: The PI actively collaborated with biologists and computer scientists to create automated systems for analyzing big multimodal biomedical data. Furthermore, a collaboration with researchers from the Cognitive Robotics and AI Lab at Kent State University focused on implementing and applying newly developed online change-point detection algorithms to a robotic system.

Impact on Curriculum Development: The PI's home department is currently developing a Master’s Degree program in data science. The project is likely to provide opportunities for developing new courses and curriculum content to meet the current demands from both non-math and math major students.

In summary, the PI's project has significantly advanced the field of change-point analysis, providing novel methodologies, publishing impactful research, contributing to education and training, promoting collaborations, and laying the groundwork for future developments in the field.

 


Last Modified: 12/23/2023
Modified by: Jun Li

