NSF Award Search: Award # 1319745

Award Abstract # 1319745

AF: Small: Algorithms for Inference

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	CALIFORNIA INSTITUTE OF TECHNOLOGY
Initial Amendment Date:	June 27, 2013
Latest Amendment Date:	June 27, 2013
Award Number:	1319745
Award Instrument:	Standard Grant
Program Manager:	Tracy Kimbrel CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2013
End Date:	February 28, 2017 (Estimated)
Total Intended Award Amount:	$473,942.00
Total Awarded Amount to Date:	$473,942.00
Funds Obligated to Date:	FY 2013 = $473,942.00
History of Investigator:	Leonard Schulman (Principal Investigator) schulman@caltech.edu
Recipient Sponsored Research Office:	California Institute of Technology 1200 E CALIFORNIA BLVD PASADENA CA US 91125-0001 (626)395-6219
Sponsor Congressional District:	28
Primary Place of Performance:	California Institute of Technology CA US 91125-0001
Primary Place of Performance Congressional District:	28
Unique Entity Identifier (UEI):	U2JMKHNS5TG4
Parent UEI:
NSF Program(s):	ALGORITHMS
Primary Program Source:	01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7923, 7926
Program Element Code(s):	792600
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

The first focus of this award is the problem of inferring causal relationships. This problem is essential to many statistical applications. Generally speaking, causation can be inferred only by active intervention, through controlled experiment. However such experiments may be impossible or else practically or morally infeasible: for instance, in predicting potential effects regulations or laws on medical, educational and economic outcomes, or, in predicting whole-ecosystem effects of human activity. The starting point for Dr. Schulman's research in this area is Pearl's theory of Structured Causal Models (SCM) which, allows, in special circumstances, "identification" of a causal relationship (as opposed to a statistical correlation) from passive observation. The proposed research aims, in the first place, to expand the above special class of circumstances---and thereby make the theory more widely applicable---by using a relaxed but still useful notion of "weak identification" of causal relationships. The relaxed notion is more robust, and enables valid inference even if the posited SCM is slightly inaccurate. In the second place, the research aims to provide efficient and numerically stable algorithms for weak identification from empirical data.

The second focus of this award, again in computational statistics, is the representation of a large data set (considered as an empirical measure) by a much smaller data set, in such a way that for a specific family of integrals, all integrals of the measure are approximately preserved. This work encompasses two separate application areas. The first concerns clustering and related high dimensional data analysis problems. Here the compressed data set is known as an epsilon-approximation or core-set of the input measure. A particular focus is on "underclustering", namely, preparation of core-sets for clustering in a normed space, before the norm has been specified. The technical tools needed in this application have to do with recently developed ideas about the "total sensitivity" of the family of integrals, as well as with, on the algorithmic side, bicriteria approximations. The second application area concerns signal processing (or approximation theory) on compact groups. Here the methods draw on representation theory, the classical theory of the moment problem, and convex geometry.

This award will be used to train graduate students and postdoctoral fellows in research in algorithms, statistics, and underlying mathematical topics in algebra and geometry.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 19)

Show All

Alistair Sinclair, Piyush Srivastava, Daniel Stefankovic and Yitong Yin "Spatial mixing and the connective constant: optimal bounds" Probability Theory and Related Fields , 2016 10.1007/s00440-016-0708-2

Aram W. Harrow, Alexandra Kolla and Leonard J. Schulman "Dimension-free L2 maximal inequality for spherical means in the hypercube" Theory of Computing , v.10 , 2014 , p.55

A. W. Harrow, A. Kolla and L. J. Schulman "Dimension-free L2 maximal inequality for spherical means in the hypercube" Theory of Computing , v.10 , 2014 , p.55

Cris Moore and Leonard J Schulman "Tree Codes and a Conjecture on Exponential Sums" ITCS , 2014

Gil Cohen and Leonard J Schulman "Extractors for Near Logarithmic Min-Entropy" FOCS , 2016

Ioannis Panageas, Piyush Srivastava, and Nisheeth K. Vishnoi "Evolutionary dynamics in finite populations mix rapidly" SODA , 2016

Jian Li, Yuval Rabani, Leonard J. Schulman and Chaitanya Swamy "Learning Arbitrary Statistical Mixtures of Discrete Distributions" STOC , 2015

Leonard J. Schulman, Alistair Sinclair and Piyush Srivastava "Symbolic integration and the complexity of computing averages" FOCS , 2015

Leonard J Schulman and Vijay V Vazirani "Allocation of Divisible Goods under Lexicographic Preferences" FSTTCS , 2015

Matthew Franklin, Ran Gelles, Rafail Ostrovsky and Leonard J. Schulman "Optimal coding for streaming authentication and interactive communication" IEEE Trans. Information Theory , v.61 , 2015 , p.133

(Showing: 1 - 10 of 19)

Show All

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

We summarize here key outcomes of the Award in two topics. The Final Report on the Award details a wide variety of other outcomes of the work.

The first topic is causal inference in Structured Causal Models (SCM). Causal inference is essential to many statistical applications. Generally speaking, causation be inferred only by active intervention, i.e., through a controlled experiment. However such experiments may be impossible or else practically or morally infeasible: for instance, in predicting potential effects of regulations or laws on medical, educational and economic outcomes, or, in predicting whole-ecosystem effects of human activity.

The starting point for this work was Pearl's remarkable theory of SCM's which allows, in special circumstances, an algorithm "ID" which identifies, purely from passive observation of, the causal relationship (as opposed to a statistical correlation) between an "intervention" variable and an "effect" variable. The existing state of the theory, however, left many significant challenges. One of those concerned the numerical stability, model stability and statistical stability of ID.

A significant outcome of the research was our discovery that, when applied to certain SCM's involving N variables, ID can be extremely unstable. Small deviations in the empirical statistics, or small errors in the model specification, are amplified in ID by a factor that is exponential in the square root of N. To all intents and purposes, causal inference is actually not possible in such a model, despite earlier theoretical results.

On the other hand we also discovered positive news. There is a more restricted, but still rich, class of SCMs in which we can show that the instability amplification is proportional only to N. For these models our result shows that application of previous methods can be statistically justified.

We believe that both these results will have impact on the growth and success of the field of causal inference and its applications to many questions in science and public policy.

The second topic is that of analysis of numerical algorithms. A key outcome of the Award concerns the analysis of Osborne's half-century-old iterative method for matrix pre-conditioning. This method is widely implemented in the major numerical linear algebra packages, and in standard practice is applied in the first step of a large eigenvalue computation (one of the most common kinds of computations in scientific computing). Despite abundant practical experience with this simple method, runtime analysis eluded all prior work, and the method was applied purely heuristically. We were able for the first time to provide, for one version of the method, a worst-case runtime guarantee. Our bound essentially matches the known lower bound. Our bound was originally obtained only for a special class of matrices, but in subsequent work we extended it to all matrices.

This work has already led to follow-up papers by other authors, analyzing other variants of the iterative method; the work also has the potential to influence numerical linear algebra packages, because the analysis points to some possible advantage of a new variant of the method.

Several junior researchers have been mentored by the PI as part of this work (two postdocs, two graduate students and two undergraduates).

Last Modified: 05/16/2017
Modified by: Leonard J Schulman

Please report errors in award information by writing to: awardsearch@nsf.gov.

Success

Error