
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | June 27, 2013 |
Latest Amendment Date: | June 27, 2013 |
Award Number: | 1319745 |
Award Instrument: | Standard Grant |
Program Manager: |
Tracy Kimbrel
CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2013 |
End Date: | February 28, 2017 (Estimated) |
Total Intended Award Amount: | $473,942.00 |
Total Awarded Amount to Date: | $473,942.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1200 E CALIFORNIA BLVD PASADENA CA US 91125-0001 (626)395-6219 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
CA US 91125-0001 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ALGORITHMS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The first focus of this award is the problem of inferring causal relationships. This problem is essential to many statistical applications. Generally speaking, causation can be inferred only by active intervention, through controlled experiment. However such experiments may be impossible or else practically or morally infeasible: for instance, in predicting potential effects regulations or laws on medical, educational and economic outcomes, or, in predicting whole-ecosystem effects of human activity. The starting point for Dr. Schulman's research in this area is Pearl's theory of Structured Causal Models (SCM) which, allows, in special circumstances, "identification" of a causal relationship (as opposed to a statistical correlation) from passive observation. The proposed research aims, in the first place, to expand the above special class of circumstances---and thereby make the theory more widely applicable---by using a relaxed but still useful notion of "weak identification" of causal relationships. The relaxed notion is more robust, and enables valid inference even if the posited SCM is slightly inaccurate. In the second place, the research aims to provide efficient and numerically stable algorithms for weak identification from empirical data.
The second focus of this award, again in computational statistics, is the representation of a large data set (considered as an empirical measure) by a much smaller data set, in such a way that for a specific family of integrals, all integrals of the measure are approximately preserved. This work encompasses two separate application areas. The first concerns clustering and related high dimensional data analysis problems. Here the compressed data set is known as an epsilon-approximation or core-set of the input measure. A particular focus is on "underclustering", namely, preparation of core-sets for clustering in a normed space, before the norm has been specified. The technical tools needed in this application have to do with recently developed ideas about the "total sensitivity" of the family of integrals, as well as with, on the algorithmic side, bicriteria approximations. The second application area concerns signal processing (or approximation theory) on compact groups. Here the methods draw on representation theory, the classical theory of the moment problem, and convex geometry.
This award will be used to train graduate students and postdoctoral fellows in research in algorithms, statistics, and underlying mathematical topics in algebra and geometry.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
We summarize here key outcomes of the Award in two topics. The Final Report on the Award details a wide variety of other outcomes of the work.
The first topic is causal inference in Structured Causal Models (SCM). Causal inference is essential to many statistical applications. Generally speaking, causation be inferred only by active intervention, i.e., through a controlled experiment. However such experiments may be impossible or else practically or morally infeasible: for instance, in predicting potential effects of regulations or laws on medical, educational and economic outcomes, or, in predicting whole-ecosystem effects of human activity.
The starting point for this work was Pearl's remarkable theory of SCM's which allows, in special circumstances, an algorithm "ID" which identifies, purely from passive observation of, the causal relationship (as opposed to a statistical correlation) between an "intervention" variable and an "effect" variable. The existing state of the theory, however, left many significant challenges. One of those concerned the numerical stability, model stability and statistical stability of ID.
A significant outcome of the research was our discovery that, when applied to certain SCM's involving N variables, ID can be extremely unstable. Small deviations in the empirical statistics, or small errors in the model specification, are amplified in ID by a factor that is exponential in the square root of N. To all intents and purposes, causal inference is actually not possible in such a model, despite earlier theoretical results.
On the other hand we also discovered positive news. There is a more restricted, but still rich, class of SCMs in which we can show that the instability amplification is proportional only to N. For these models our result shows that application of previous methods can be statistically justified.
We believe that both these results will have impact on the growth and success of the field of causal inference and its applications to many questions in science and public policy.
The second topic is that of analysis of numerical algorithms. A key outcome of the Award concerns the analysis of Osborne's half-century-old iterative method for matrix pre-conditioning. This method is widely implemented in the major numerical linear algebra packages, and in standard practice is applied in the first step of a large eigenvalue computation (one of the most common kinds of computations in scientific computing). Despite abundant practical experience with this simple method, runtime analysis eluded all prior work, and the method was applied purely heuristically. We were able for the first time to provide, for one version of the method, a worst-case runtime guarantee. Our bound essentially matches the known lower bound. Our bound was originally obtained only for a special class of matrices, but in subsequent work we extended it to all matrices.
This work has already led to follow-up papers by other authors, analyzing other variants of the iterative method; the work also has the potential to influence numerical linear algebra packages, because the analysis points to some possible advantage of a new variant of the method.
Several junior researchers have been mentored by the PI as part of this work (two postdocs, two graduate students and two undergraduates).
Last Modified: 05/16/2017
Modified by: Leonard J Schulman
Please report errors in award information by writing to: awardsearch@nsf.gov.