Award Abstract # 2015379
Collaborative Research: Extremes in High Dimensions: Causality, Sparsity, Classification, Clustering, Learning

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
Initial Amendment Date: June 18, 2020
Latest Amendment Date: June 18, 2020
Award Number: 2015379
Award Instrument: Standard Grant
Program Manager: Yong Zeng
yzeng@nsf.gov
(703)292-7299
DMS Division Of Mathematical Sciences
MPS Directorate for Mathematical and Physical Sciences
Start Date: July 1, 2020
End Date: June 30, 2023 (Estimated)
Total Intended Award Amount: $299,999.00
Total Awarded Amount to Date: $299,999.00
Funds Obligated to Date: FY 2020 = $299,999.00
History of Investigator:
  • Richard Davis (Principal Investigator)
    rdavis@stat.columbia.edu
  • Marco Avella Medina (Co-Principal Investigator)
Recipient Sponsored Research Office: Columbia University
615 W 131ST ST
NEW YORK
NY  US  10027-7922
(212)854-6851
Sponsor Congressional District: 13
Primary Place of Performance: Columbia University
1255 Amsterdam Ave
New York
NY  US  10027-6902
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): F4N1QNPB95M4
Parent UEI:
NSF Program(s): STATISTICS
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

In recent years, through news reports and first-hand experience, the general public has become keenly aware of extreme events, in particular, of extreme weather conditions such as extended heat waves, periods of extreme cold, an increase in the number and intensity of tornadoes and hurricanes, or periods of record precipitation resulting in unprecedented floods. Just in the past few years, the insurance claims from extreme climatic events have been staggering; they include the Missouri River flood in April 2019 ($10.8B), Hurricane Michael in October 2018 ($25B), the California wildfires in December 2017 ($18.7B), the US drought/heatwave in 2012 ($33.9B), and Hurricane Sandy in October 2012 ($73.4B). This list does not include non-climatic extreme events such as the financial crisis of 2008 or the current COVID-19 pandemic. Many of the extreme events experienced today, whether weather, environmental, industrial, epidemiological, economic, or social media related, are occurring at a more frequent rate and often result in huge losses to our society, from financial losses to loss of human life to disruption of our way of life. While the occurrence of extreme events is reasonably well understood in steady-state situations, it has become clear that the preponderance of extreme events suggests that the steady-state assumption is no longer valid. The key objective of this research is to understand the causal impacts of various factors, drawn from a potentially large array of variables including changing environmental conditions, demographic movements within the US, changing landscapes, and changing economic conditions, on the frequency and magnitude of extreme events. From many variables, we hope to produce methodology to extract the important features in the data that have a direct impact on describing and predicting extremes.
This research is potentially of use to policymakers who need to anticipate and plan for extreme events and to develop sensible strategies for mitigating their impact on society. The graduate student support will be used for interdisciplinary research.

The principal goal of this research project is to design new tools for analyzing and modeling extremes in a myriad of situations that go well beyond the boundaries of classical extreme value theory. These include detection of often nonlinear sets of much smaller dimension that can provide an adequate description of extremes in high dimensions, for which we hope to apply powerful modern learning techniques (such as graph-based learning methods) that allow us to determine this extremal support from the data. In general, detecting sparsity in the exponent measure describing high-dimensional extremes, i.e., locating the (often numerous) low-dimensional regions that carry most of the support of the exponent measure, will be a key focus of this research. A second main thrust of this research centers on the issue of causality in both small and large dimensional problems. In the most basic form, a set of variables X is said to be tail causal to a dependent vector Y if certain changes in X (sometimes themselves extreme but not always so) impact the tail behavior of Y. An important setting of this type is the potential outcomes framework for causality of extreme events, which will be a major focus in this project's research agenda.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Cohen, Joel E. and Davis, Richard A. and Samorodnitsky, Gennady "COVID-19 cases and deaths in the United States follow Taylor's law for heavy-tailed distributions with infinite variance" Proceedings of the National Academy of Sciences, v.119, 2022 https://doi.org/10.1073/pnas.2209234119
Cohen, Joel E. and Davis, Richard A. and Samorodnitsky, Gennady "Heavy-tailed distributions, correlations, kurtosis and Taylor's Law of fluctuation scaling" Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, v.476, 2020 https://doi.org/10.1098/rspa.2020.0610
Davis, Richard A. and do Rêgo Sousa, Thiago and Klüppelberg, Claudia "Indirect inference for time series using the empirical characteristic function and control variates" Journal of Time Series Analysis, v.42, 2021 https://doi.org/10.1111/jtsa.12582
Davis, Richard A. and Fernandes, Leon and Fokianos, Konstantinos "Clustering multivariate time series using energy distance" Journal of Time Series Analysis, 2023 https://doi.org/10.1111/jtsa.12688
Davis, Richard A. and Nielsen, Mikkel S. "Modeling of time series using random forests: Theoretical developments" Electronic Journal of Statistics, v.14, 2020 https://doi.org/10.1214/20-EJS1758
Davis, Richard and Ng, Serena "Time series estimation of the dynamic effects of disaster-type shocks" Journal of Econometrics, 2022 https://doi.org/10.1016/j.jeconom.2022.02.009
Xu, Hui and Cohen, Joel E. and Davis, Richard A. and Samorodnitsky, Gennady "Cauchy, normal and correlations versus heavy tails" Statistics & Probability Letters, v.186, 2022 https://doi.org/10.1016/j.spl.2022.109489
Xu, Hui and Davis, Richard and Samorodnitsky, Gennady "Handling missing extremes in tail estimation" Extremes, 2021 https://doi.org/10.1007/s10687-021-00429-z

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

One of the overarching goals of this research project was to develop new tools for analyzing and modeling extremes in a myriad of situations that go well beyond the boundaries of classical extreme value theory. For example, if the data are of large dimension, such as temperature or rainfall recorded at a large number of monitoring stations, one might want to relate extreme temperature or rainfall at one location to its impact on the other locations. In other words, the extremes of the entire data set can often be reduced to a much smaller and more manageable subset of the data without much loss of information. To this end, the research produced in this project centered on two specific methodologies: the first was to discover clustering of extreme observations, and the second was to find condensed regions in the data that can provide an adequate description of the extremes.

For finding clusters, we used an approach from machine learning based on spectral clustering. The idea is to consider the angular component of each data value when that particular data value is large. These angular components tend to cluster in certain directions. For each pair of angular components we define a distance, from which a graph is formed. That is, two angular components are linked by an edge in the graph if the distance between them is sufficiently small. After this graph is formed, a graph Laplacian is defined, which allows one to optimally locate clusters of highly concentrated angular components. One big advantage of this method is that it was able to effectively separate signal from noise.
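
The steps described above can be sketched roughly as follows. This is only an illustrative sketch under assumptions that are not taken from the report: the Euclidean norm and distance, the 90% norm quantile used to decide which observations are "large", the edge threshold eps, the unnormalized Laplacian, and the small k-means step on the eigenvector embedding are all hypothetical choices, and the function names are invented for this example. The simulated data place heavy-tailed observations near two coordinate axes.

```python
import numpy as np

def extremal_angles(X, quantile=0.9):
    """Keep observations with large norm and return their directions (angular components)."""
    norms = np.linalg.norm(X, axis=1)
    keep = norms > np.quantile(norms, quantile)   # assumed threshold: a high norm quantile
    return X[keep] / norms[keep, None]

def graph_laplacian(angles, eps):
    """Link two angular components when their distance is below eps; return L = D - W."""
    dist = np.linalg.norm(angles[:, None, :] - angles[None, :, :], axis=2)
    W = (dist < eps).astype(float)
    np.fill_diagonal(W, 0.0)                      # no self-loops
    return np.diag(W.sum(axis=1)) - W

def spectral_clusters(angles, eps=0.5, k=2, n_iter=20):
    """Embed with the k eigenvectors of smallest eigenvalue, then run a small k-means."""
    L = graph_laplacian(angles, eps)
    _, vecs = np.linalg.eigh(L)                   # eigenvalues come back in ascending order
    emb = vecs[:, :k]
    # Deterministic farthest-point initialization of the k centers.
    centers = [emb[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # leave a center unchanged if its cluster is empty
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Simulated example: heavy-tailed radii, directions concentrated near two axes.
rng = np.random.default_rng(0)
n = 1000
radius = rng.pareto(2.0, size=n) + 1.0
axis = rng.integers(0, 2, size=n)
direction = np.eye(2)[axis] + 0.05 * rng.normal(size=(n, 2))
X = radius[:, None] * direction

angles = extremal_angles(X, quantile=0.9)
labels = spectral_clusters(angles, eps=0.5, k=2)
```

In this toy setting the graph splits into two groups of angular components, one near each axis, and the labels recover that split. With real data the threshold, the distance, and the number of clusters would all need to be chosen with care.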

We used kernel principal component analysis (PCA) to find nonlinear regions of high concentration of extremes. Here the main idea is that the data are mapped into a high-dimensional space, called a reproducing kernel Hilbert space (RKHS), via a kernel function map. On this larger space, one performs PCA in the typical manner; i.e., the significant eigenvalues and their corresponding eigenfunctions are identified. At this stage the dimension of the problem has been substantially reduced. The pre-images under this mapping are then calculated from the smaller subspace found in the RKHS. It was discovered that the procedure was particularly effective when the data are contaminated with noise.
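
A minimal sketch of the projection step of kernel PCA is given below. It is an illustration only and not the project's implementation: the Gaussian (RBF) kernel, the bandwidth gamma, the number of retained components, and the simulated noisy arc of directions are all assumptions made for this example, and the function names are hypothetical. The pre-image step mentioned above is omitted from the sketch.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negatives from rounding

def kernel_pca(X, n_components=2, gamma=1.0):
    """Return the component scores and all eigenvalues of the centered kernel matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ K @ H                                # kernel matrix centered in feature space
    vals, vecs = np.linalg.eigh(Kc)               # ascending order
    order = np.argsort(vals)[::-1]                # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    top = np.maximum(vals[:n_components], 0.0)
    scores = vecs[:, :n_components] * np.sqrt(top)  # projections onto the leading components
    return scores, vals

# Simulated example: directions on a quarter circle, perturbed by small noise.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, np.pi / 2, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.02 * rng.normal(size=(200, 2))

scores, eigenvalues = kernel_pca(X, n_components=2, gamma=2.0)
```

Inspecting how quickly the sorted eigenvalues decay is one common way to decide how many components are "significant"; that choice is left to the user here.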

On the broader impacts side, Professor Davis advised one PhD student, Leon Fernandes, during this grant. Fernandes has been working on problems that are tangentially related to this grant and expects to defend in December 2023. Davis taught a topics in probability course to PhD students in the Statistics Department at Columbia in spring 2022. A main component of this course covered heavy-tailed time series modeling and extremal dependence, which are directly related to this grant. Davis also delivered a PhD summer course (May 2023) for students at Bocconi University, which included topics in extreme value theory. He has also been a member of Columbia’s STEM DEI committee for the past 3 years.

Last Modified: 07/30/2023
Modified by: Richard A Davis

Please report errors in award information by writing to: awardsearch@nsf.gov.
