
NSF Org: | DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | May 11, 2018 |
Latest Amendment Date: | May 11, 2018 |
Award Number: | 1811779 |
Award Instrument: | Standard Grant |
Program Manager: | Pena Edsel, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | July 1, 2018 |
End Date: | June 30, 2022 (Estimated) |
Total Intended Award Amount: | $250,000.00 |
Total Awarded Amount to Date: | $250,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 6100 MAIN ST Houston TX US 77005-1827 (713)348-4820 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 6100 Main St Houston TX US 77005-1827 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | STATISTICS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
This project will further the development of exact and asymptotic distribution theory for Gaussian processes and their quadratic forms. While modern advances in data science owe much progress to computational methods and the rapid growth in computer technology, statistics and applied probability are rife with examples where a careful mathematical analysis allows discoveries that no amount of computational power can uncover. This project is one such example and will use the PI's work on Yule's so-called "nonsense" correlation, a 90-year-old open problem that was solved last year via mathematical analysis tools. This explicit calculation showed the precise scale of the apparent correlation between two independent continuous series of data, such as those encountered in economics, climate science, finance, and many other fields. This mathematical explanation of an apparent statistical paradox will enable the investigation of other important questions in mathematical statistics. The project will investigate a possible connection between some important open questions and a set of tools in probability theory whose power mathematical statisticians have only begun to investigate. The project will provide fertile ground for statistics graduate student training at Rice and Michigan State Universities; students will benefit from a wide scope of opportunities, from rigorous study of mathematical tools, to their use in statistics, to applications in fields of great societal value.
This project will investigate the probability law of the Pearson correlation between two independent or dependent Gaussian processes. Analyses of distributions in the second Wiener chaos (quadratic forms of normals) are a new set of tools that will be brought to bear. Those tools are flexible enough to handle any Gaussian process via its so-called Karhunen-Loève expansion. In terms of applications, what is most striking is that any statistical estimation or test based on these projected studies would require only a single observation or a pair of observations; this is particularly useful for situations, such as in environmental statistics or in economics, where experiments cannot be designed, and one has to work with the available observable data collected dynamically in time. The second emphasis in this study, on Polya frequency functions and related densities, uses some of the same mathematical tools, thanks to a realization that the densities can be represented and expanded explicitly in the second Wiener chaos. The project seeks to prove when a density is strongly log-concave (i.e., its logarithm has a second derivative that is bounded away from zero). This question, which in mathematical statistics is phrased more broadly in terms of Polya frequency functions and arises for the distribution of sums of independent and non-identically distributed exponentials, extends to the case of general second-chaos distributions. The project could have important consequences in the practice of statistics, especially in areas where comparing non-trivial time series is a challenge, and in many scientific fields informed by properties of log-concavity and strong log-concavity.
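For readers who want the objects above in symbols, the following is a sketch in standard notation; the symbols X, Y, T, f, and c are assumptions introduced here for illustration and do not appear in the abstract. The first display is the usual empirical (Pearson) correlation of two continuous paths observed on [0, T]; the second is the usual condition for a density f to be strongly log-concave.

```latex
% Time averages of the two observed paths on [0, T] (notation assumed here)
\bar{X} = \frac{1}{T}\int_0^T X_t \, dt ,
\qquad
\bar{Y} = \frac{1}{T}\int_0^T Y_t \, dt .

% Empirical (Pearson) correlation of the pair of paths
\rho =
\frac{\int_0^T X_t Y_t \, dt \;-\; T \bar{X}\bar{Y}}
     {\sqrt{\left(\int_0^T X_t^2 \, dt - T \bar{X}^2\right)
            \left(\int_0^T Y_t^2 \, dt - T \bar{Y}^2\right)}} .

% Strong log-concavity of a density f: for some constant c > 0,
\frac{d^2}{dx^2} \log f(x) \;\le\; -c \quad \text{for all } x \text{ in the support of } f .
```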
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has built the mathematical foundations for designing the first demonstrably correct statistical tests of independence for pairs of paths of Gaussian processes: Wiener processes, Ornstein-Uhlenbeck (OU) processes, fractional Ornstein-Uhlenbeck (fOU) processes, and fractional Brownian motion (fBm). The importance of constructing such tests is motivated by our 2017 paper on Yule's so-called "nonsense" correlation, in which we provide the precise scale of the apparent correlation between two independent continuous series of data, such as those encountered in economics, climate science, finance, and many other fields.
We began this project by developing theory and methodology for calculating the moments, up to order 16, of Yule's "nonsense" correlation for two independent Wiener processes. This allows us to provide the first density approximation to Yule's "nonsense" correlation. The methodology we develop is broad in spirit, allowing us to work conclusively in all settings where the Gaussian process arises as the solution of a linear stochastic differential equation (SDE). We then employ these methods to explicitly calculate the moments of the empirical correlation for two correlated Brownian motions (with a given correlation coefficient), two independent Ornstein-Uhlenbeck processes, and two independent Brownian bridges. Establishing unequivocal mathematical facts that are simple to explain, such as calculations of second moments of the empirical correlation for the processes considered above, should have great potential for raising awareness of the risks associated with widespread misinterpretation of Pearson correlation coefficients.
We then explore the following question: what is the distribution of the empirical correlation for two independent Gaussian random walks? This question is of interest not least because discrete stochastic process data (for example, time series data) are the kind of data that occur most frequently in the real world. A test statistic for discrete processes is thus easier for practitioners to apply than one for continuous stochastic processes. Studying the discrete-data test statistic directly is also a means of minimizing the risk of misusing the continuous statistic when the discrete-data situation is not sufficiently well approximated by a continuous-data one. In this vein, we succeed in providing an exact formula for the second moment of the empirical correlation of two independent Gaussian random walks (as well as implicit formulas for higher moments). We also provide rates of convergence of the empirical correlation of two independent Gaussian random walks to the empirical correlation of two independent Wiener processes, with explicit upper bounds in terms of the Wasserstein distance.
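The phenomenon described above can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the exact formula obtained in the project: it estimates the second moment of the empirical correlation of two independent Gaussian random walks and, for contrast, of two independent i.i.d. Gaussian samples. The function names, the default sample sizes, and the `walk` flag are assumptions made for this illustration.

```python
import math
import random


def gaussian_series(n, rng, walk=True):
    """Return n values: a Gaussian random walk if walk=True, else i.i.d. N(0, 1) noise."""
    values, total = [], 0.0
    for _ in range(n):
        step = rng.gauss(0.0, 1.0)
        total += step
        values.append(total if walk else step)
    return values


def empirical_correlation(x, y):
    """Pearson (empirical) correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def second_moment(n_steps=200, n_trials=2000, seed=0, walk=True):
    """Monte Carlo estimate of E[r^2] for two independent series of length n_steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        x = gaussian_series(n_steps, rng, walk)
        y = gaussian_series(n_steps, rng, walk)
        total += empirical_correlation(x, y) ** 2
    return total / n_trials


if __name__ == "__main__":
    print("random walks:", second_moment(walk=True))
    print("i.i.d. noise:", second_moment(walk=False))
```

For i.i.d. Gaussian samples of length n, the second moment of the empirical correlation is 1/(n - 1), so it shrinks as more data are collected. For independent random walks the estimate stays far from zero however long the series, which is the "nonsense" correlation effect the exact formulas quantify.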
We proceed to work on statistical inference for discrete-time second-chaos processes, as well as for Gaussian (discrete-time) processes. We compute the quadratic variations of all AR(1) stationary time series in the second chaos, and estimate their speeds of convergence to normality in total variation. In addition to working with discrete-time second-chaos processes, we consider basic objects in the fourth and second Wiener chaos which are directly relevant to their statistical properties; moreover, these objects are statistics of the processes' entire paths. Understanding how these objects behave asymptotically will be a necessary first step in achieving quantitative estimates for asymptotic normality of the Pearson correlation for stationary processes like the Ornstein-Uhlenbeck process, as well as in determining when the graininess of the time scale affects the correlation's distribution, and when it does not. In fact, the second-chaos AR(1) processes will all converge, under proper scaling, to the Gaussian Ornstein-Uhlenbeck process, but we are interested in mesoscopic scales where the fluctuations are still too far from normal for normality to be relied upon.
The relevant notions of tests of independence of pairs of paths of stochastic processes we have considered, and which we will continue to consider, are manifold: from short-range to long-range correlation for individual paths, to whether single pairs or finite sets of paths are statistically related, and to applied consequences when considering questions of attribution of factors for real-world phenomena, particularly relating to weather and climate. Important applied aims include an investigation of climate-related risks, such as sea-level rise and extreme weather events, particularly in the North Atlantic Ocean, how they correlate dynamically over medium and long terms, and how heavy-tailed they are. We would be remiss if we did not highlight the recent misuse of Pearson correlation in the area of late-Holocene paleoclimatology, an area whose importance for projecting our planet's climate in the next 200 or 1000 years cannot be overstated.
Finally, our project has also contributed significantly to both graduate and undergraduate training. The support of the NSF has enabled us to train four Ph.D. students, all of whom are now active contributors to the STEM research community. We have also engaged with undergraduate and master's students on aspects of this project relevant to climate science, to agricultural economics, and to development economics in least-developed countries.
Last Modified: 12/14/2022
Modified by: Philip Ernst