
NSF Org: | DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | September 8, 2016 |
Latest Amendment Date: | September 8, 2016 |
Award Number: | 1622449 |
Award Instrument: | Standard Grant |
Program Manager: | Christopher Stark, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | September 1, 2016 |
End Date: | August 31, 2019 (Estimated) |
Total Intended Award Amount: | $63,897.00 |
Total Awarded Amount to Date: | $63,897.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
700 S UNIVERSITY PARKS DR WACO TX US 76706-1003 (254)710-3817 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
One Bear Place Waco TX US 76798-7360 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
CDS&E-MSS, CDS&E |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
The interface of applied algebraic geometry and statistics known as algebraic statistics abounds with fresh insight into old and new problems in practical data analysis. The fundamental connection stems from the realization that many statistical models are or can be identified with geometric structures amenable to algebraic investigation, enabling statisticians to draw from the great wealth of algebraic tools when solving statistical problems. Since this recognition, algebraic tools have found applications all over statistics, especially in contexts involving cross-classified data. Despite these advances, the use of algebraic methods in traditionally statistical areas of data analysis is still not mainstream, mostly because the methods involve kinds of mathematical computations previously unnecessary for data analyses and, consequently, not available in standard software. This work confronts this problem head-on by 1) fortifying connections between a free statistical computing environment popular among data analysts (R) and various software in the mathematics community through add-on packages created by the PIs and 2) implementing user-friendly interfaces to cutting-edge algebraic statistical methods enabled by the external software.
The R package algstat and supporting packages will be further developed, strengthening connections to software used in algebraic statistics and providing functions and data structures for algebraic statistical methods that leverage that software. In year one of the project, the PIs and their teams will work on LattE and 4ti2, and Markov basis techniques for exact inference in loglinear, logistic, and Poisson regression models will be created and improved. In year two, the PIs and their teams will work on Bertini. Functions and data structures related to the numerical solution of systems of polynomial equations will be improved and expanded, and applications to phylogenetics will be considered. In year three, the PIs and their teams will work on Macaulay2, fortifying its connection to R and using it to enhance the mpoly package and adaptively inform the MCMC routines for exact inference in exponential family models enabled by the LattE and 4ti2 connections.
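As a rough illustration of what Markov basis techniques for exact inference involve, the following minimal sketch runs a Metropolis walk over two-way contingency tables with fixed row and column sums, using the well-known basic +1/-1 moves for the independence model. It is written in plain Python for illustration only; it is not the algstat interface or the PIs' implementation, and the function names (`pearson_chi2`, `markov_basis_step`, `estimate_p_value`) are hypothetical. For larger models a Markov basis would typically be computed by external software such as 4ti2 rather than written by hand.

```python
import math
import random


def pearson_chi2(table):
    """Pearson chi-square statistic for independence in a two-way table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def markov_basis_step(table, rng):
    """One Metropolis step with a basic move; mutates `table` in place."""
    n_rows, n_cols = len(table), len(table[0])
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    eps = rng.choice((1, -1))
    # Two cells gain eps and two lose eps, so every margin is unchanged.
    plus = [(i1, j1), (i2, j2)]
    minus = [(i1, j2), (i2, j1)]
    if any(table[i][j] + eps < 0 for i, j in plus):
        return  # proposed table has a negative cell: stay put
    if any(table[i][j] - eps < 0 for i, j in minus):
        return
    # Target is the conditional (hypergeometric) distribution given the
    # margins, proportional to 1 / prod(n_ij!). Work on the log scale.
    log_ratio = 0.0
    for i, j in plus:
        n = table[i][j]
        log_ratio += math.lgamma(n + 1) - math.lgamma(n + eps + 1)
    for i, j in minus:
        n = table[i][j]
        log_ratio += math.lgamma(n + 1) - math.lgamma(n - eps + 1)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        for i, j in plus:
            table[i][j] += eps
        for i, j in minus:
            table[i][j] -= eps


def estimate_p_value(observed, n_steps=20000, burn_in=1000, seed=0):
    """Monte Carlo estimate of the exact conditional p-value."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    threshold = pearson_chi2(observed)
    hits = 0
    for step in range(burn_in + n_steps):
        markov_basis_step(table, rng)
        # Small tolerance guards against floating-point ties.
        if step >= burn_in and pearson_chi2(table) >= threshold - 1e-9:
            hits += 1
    return hits / n_steps


if __name__ == "__main__":
    print(estimate_p_value([[3, 1], [1, 3]]))
```

The walk never leaves the set of tables sharing the observed margins, so the proportion of visited tables at least as extreme as the observed one approximates the exact conditional p-value without enumerating every table.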
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The field of algebraic statistics sits at the intersection of statistics and applied algebraic geometry and related fields. Only a few decades old, algebraic statistics is an interdisciplinary endeavor taking place within the mathematical sciences themselves. At a basic level, algebraic statistics attempts to use results from algebraic geometry and related fields (e.g., combinatorics and polyhedral geometry) to inform problems in statistical science, at both theoretical and applied levels. While young, algebraic statistics has already made substantial contributions to the "core" of statistics by solving problems previously inaccessible, such as exact conditional inference in discrete exponential family models and the (algebraic) complexity of maximum likelihood estimators. However, algebraic statistics has much more to give, both in theory and applications.
In practice, a major challenge to the widespread use of algebraic statistical methods is a lack of software enabling and facilitating their computations. The fundamental contribution of this project has been to address this problem by creating software that bridges the gap between software used in the statistics and data science communities, R, and software used in the applied algebraic geometry and related mathematics communities, including LattE, 4ti2, Bertini, and Macaulay2. This software now exists in the form of the R packages latte, bertini, m2r, algstat, and mpoly, distributed on GitHub and (to varying extents) CRAN, in a foundational form that can be built upon in the future. It can also be used immediately for a broad array of applications. In addition to the software, the project also contributes novel insight into how algebraic and geometric methods can be used in statistical science, for example through its investigations into algebraic pattern recognition.
This work therefore serves many communities: to the statistics and broader data science communities, the project provides both basic algorithms and cutting-edge statistical methods based on those algorithms; to the various mathematics communities, it provides an applied outlet for otherwise theoretical work and a major platform on which to demonstrate and disseminate those applications. Moreover, it serves to strengthen the bond between the two communities by providing tools to both and fostering further collaboration. It has also prioritized the education of young scholars: four undergraduate students and four graduate students, two of whom have since obtained their Ph.D.s, were trained in algebraic statistics and statistical software development. Three of the four undergraduates are now pursuing advanced degrees in statistics or computer science.
Last Modified: 12/30/2019
Modified by: David Kahle