Award Abstract # 1622449
Collaborative Research: CDS&E: Applied Algebraic Statistics through R

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: BAYLOR UNIVERSITY
Initial Amendment Date: September 8, 2016
Latest Amendment Date: September 8, 2016
Award Number: 1622449
Award Instrument: Standard Grant
Program Manager: Christopher Stark
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2016
End Date: August 31, 2019 (Estimated)
Total Intended Award Amount: $63,897.00
Total Awarded Amount to Date: $63,897.00
Funds Obligated to Date: FY 2016 = $63,897.00
History of Investigator:
  • David Kahle (Principal Investigator)
    David_Kahle@baylor.edu
Recipient Sponsored Research Office: Baylor University
700 S UNIVERSITY PARKS DR
WACO
TX  US  76706-1003
(254)710-3817
Sponsor Congressional District: 17
Primary Place of Performance: Baylor University
One Bear Place
Waco
TX  US  76798-7360
Primary Place of Performance Congressional District: 17
Unique Entity Identifier (UEI): C6T9BYG5EYX5
Parent UEI:
NSF Program(s): CDS&E-MSS, CDS&E
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8084, 9263
Program Element Code(s): 806900, 808400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

The interface of applied algebraic geometry and statistics known as algebraic statistics abounds with fresh insight into old and new problems in practical data analysis. The fundamental connection stems from the realization that many statistical models are or can be identified with geometric structures amenable to algebraic investigation, enabling statisticians to draw from the great wealth of algebraic tools when solving statistical problems. Since this recognition, algebraic tools have found applications all over statistics, especially in contexts involving cross-classified data. Despite these advances, the use of algebraic methods in traditionally statistical areas of data analysis is still not mainstream, mostly because the methods involve kinds of mathematical computations previously unnecessary for data analyses and, consequently, not available in standard software. This work confronts this problem head-on by 1) fortifying connections between a free statistical computing environment popular among data analysts (R) and various software in the mathematics community through add-on packages created by the PIs and 2) implementing user-friendly interfaces to cutting-edge algebraic statistical methods enabled by the external software.

The R package algstat and supporting packages will be further developed, strengthening connections to software used in algebraic statistics and providing functions and data structures for algebraic statistical methods that leverage that software. In year one of the project, the PIs and their teams will work on LattE and 4ti2, and Markov basis techniques for exact inference in loglinear, logistic, and Poisson regression models will be created and improved. In year two, the PIs and their teams will work on Bertini. Functions and data structures related to the numerical solution of systems of polynomial equations will be improved and expanded, and applications to phylogenetics will be considered. In year three, the PIs and their teams will work on Macaulay2, fortifying its connection to R and using it to enhance the mpoly package and adaptively inform the MCMC routines for exact inference in exponential family models enabled by the LattE and 4ti2 connections.
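
As a rough illustration of the Markov basis idea behind exact inference in loglinear models, the sketch below runs a Metropolis walk over two-way contingency tables that share the observed row and column sums. It is not the algstat, LattE, or 4ti2 interface: it is written in plain Python, the function names are hypothetical, and it hard-codes the basic +1/-1 moves on 2x2 subtables, which are known to form a Markov basis for the independence model. It assumes every row and column sum of the input table is positive.

```python
import random
from math import exp, lgamma


def chi_square(table):
    """Pearson chi-square statistic for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = rows[i] * cols[j] / n  # assumes positive margins
            stat += (x - expected) ** 2 / expected
    return stat


def log_weight(table):
    """Log of the unnormalized conditional probability 1 / prod(n_ij!)."""
    return -sum(lgamma(x + 1) for row in table for x in row)


def markov_basis_walk(table, steps, seed=0):
    """Estimate the exact conditional p-value of the chi-square statistic.

    Returns (p_value_estimate, last_table_visited). Hypothetical helper
    for illustration only.
    """
    rng = random.Random(seed)
    current = [list(r) for r in table]
    n_rows, n_cols = len(current), len(current[0])
    observed = chi_square(table)
    hits = 0
    for _ in range(steps):
        # Basic move: +1 on (i1, j1) and (i2, j2), -1 on (i1, j2) and (i2, j1).
        # Ordered sampling covers both signs of each move, so the proposal
        # is symmetric and the margins never change.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        proposal = [list(r) for r in current]
        proposal[i1][j1] += 1
        proposal[i2][j2] += 1
        proposal[i1][j2] -= 1
        proposal[i2][j1] -= 1
        # Reject moves that leave the set of nonnegative tables.
        if min(proposal[i1][j2], proposal[i2][j1]) >= 0:
            log_ratio = log_weight(proposal) - log_weight(current)
            if log_ratio >= 0 or rng.random() < exp(log_ratio):
                current = proposal
        # Count tables at least as extreme as the observed one.
        if chi_square(current) >= observed - 1e-9:
            hits += 1
    return hits / steps, current
```

For example, calling markov_basis_walk([[3, 0], [0, 3]], steps=20000) walks over the four tables with all margins equal to 3; the exact conditional p-value for that table is 0.1, so the estimate should settle near that value as the number of steps grows. For models beyond two-way independence the moves are no longer this simple, which is where a Markov basis computed by external software such as 4ti2 would come in.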

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Kahle, D., R. Yoshida, and L. Garcia-Puente. "Hybrid Schemes for Exact Conditional Inference in Discrete Exponential Families." Annals of the Institute of Statistical Mathematics, v.70, 2018, p.983.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The field of algebraic statistics sits at the intersection of statistics and applied algebraic geometry and related fields. Only a few decades old, algebraic statistics is an interdisciplinary endeavor taking place within the mathematical sciences themselves. At a basic level, algebraic statistics attempts to use results from algebraic geometry and related fields (e.g., combinatorics and polyhedral geometry) to inform problems in statistical science, at both theoretical and applied levels. While young, algebraic statistics has already made substantial contributions to the "core" of statistics by solving problems previously inaccessible, such as exact conditional inference in discrete exponential family models and the (algebraic) complexity of maximum likelihood estimators. However, algebraic statistics has much more to give, both in theory and applications.

In practice, a major challenge to the widespread use of algebraic statistical methods is a lack of software enabling and facilitating their computations. The fundamental contribution of this project has been to address this problem by creating software that bridges the gap between R, the software used in the statistics and data science communities, and the software used in the applied algebraic geometry and related mathematics communities, including LattE, 4ti2, Bertini, and Macaulay2. This software now exists in the form of the R packages latte, bertini, m2r, algstat, and mpoly, distributed on GitHub and (to varying extents) CRAN, in a foundational form that can be built upon in the future. It can also be used immediately for a broad array of applications. In addition to the software, the project contributes novel insight into how algebraic and geometric methods can be used in statistical science, for example through its investigations into algebraic pattern recognition.

This work therefore serves many communities: to the statistics and broader data science communities, the project provides both basic algorithms and cutting-edge statistical methods based on those algorithms; to the various mathematics communities, it provides an applied outlet for otherwise theoretical work and a major platform on which to demonstrate and disseminate those applications. Moreover, it serves to strengthen the bond between the two communities by providing tools to both and fostering further collaboration. It has also prioritized the education of young scholars: four undergraduate students and four graduate students, two of whom have since obtained their Ph.D.s, were trained in algebraic statistics and statistical software development. Three of the four undergraduates are now pursuing advanced degrees in statistics or computer science.

Last Modified: 12/30/2019
Modified by: David Kahle

Please report errors in award information by writing to: awardsearch@nsf.gov.
