NSF Award Search: Award # 2132247

Award Abstract # 2132247

A U-statistic approach to population genetics

NSF Org:	DBI Division of Biological Infrastructure
Recipient:	AMERICAN UNIVERSITY
Initial Amendment Date:	July 16, 2021
Latest Amendment Date:	July 16, 2021
Award Number:	2132247
Award Instrument:	Standard Grant
Program Manager:	Jennifer Weller, jweller@nsf.gov, (703)292-2224, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences
Start Date:	September 1, 2021
End Date:	December 31, 2023 (Estimated)
Total Intended Award Amount:	$188,694.00
Total Awarded Amount to Date:	$188,694.00
Funds Obligated to Date:	FY 2021 = $188,694.00
History of Investigator:	David Gerard (Principal Investigator), dgerard@american.edu
Recipient Sponsored Research Office:	American University, 4400 MASSACHUSETTS AVE NW, WASHINGTON DC US 20016-8003, (202)885-3440
Sponsor Congressional District:	00
Primary Place of Performance:	American University, 4400 Massachusetts Avenue, NW, Washington DC US 20016-8003
Primary Place of Performance Congressional District:	00
Unique Entity Identifier (UEI):	H4VNDUN2VWU5
Parent UEI:
NSF Program(s):	Innovation: Bioinformatics
Primary Program Source:	01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	1165
Program Element Code(s):	164Y00
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.074

ABSTRACT

Polyploids, organisms with more than two complete sets of chromosomes, are ubiquitous in the plant kingdom, predominant in agriculture, and important drivers of evolution. Many plant species exhibit ancestral polyploidy, and so understanding the evolutionary behavior of polyploids today gives researchers a greater understanding of the mechanisms of evolution in general. However, polyploids manifest greater complexities that make modeling their genomes much more difficult. This project will address these added statistical and computational complexities by developing novel methods for the population genetics of polyploid genomes. These methods will allow researchers to better determine the structural relationships within and between polyploid populations, which could better reveal aspects of the underlying evolutionary processes within these species. All methods will be implemented in open-source software, which will make these approaches accessible to applied researchers. This project will provide advanced training in Statistics and Computational Biology to undergraduate and graduate researchers in preparation for the next steps in their careers. This project will also result in publicly available educational materials for advanced statistical computation using the R statistical language, making such topics more accessible to the greater academic community.

This project reformulates key tasks from population genetics in terms of U-statistic minimization, a statistical technique for estimation and testing. This approach will lend itself to greater generality and complexity, such as for polyploid populations. These methods will also account for deviations from classical Mendelian segregation caused, for example, by double reduction, the co-migration of sister chromatids into the same gamete during meiosis, a common event in some types of polyploids. The first aim is to develop novel testing strategies for equilibrium in polyploid and mixed-ploidy populations. The second aim is to develop new approaches to estimate population structure while accounting for common issues that result from polyploid data. The third aim is to explore other possible applications of the U-statistic approach, such as for inbreeding estimation or linkage disequilibrium estimation. This project emphasizes developing usable software for the research community, and extreme reproducibility in all results. This project will deliver usable R packages for each innovation, which will be accessible to the greater biological community. Student researchers will be trained in the fundamentals of software development using R, and so will be an integral part of building the R packages of this project. The results of the project can be found at https://github.com/dcgerard/NSF-U-Statistics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH



Gerard, David. "Bayesian tests for random mating in polyploids." Molecular Ecology Resources, v.23, 2023. https://doi.org/10.1111/1755-0998.13856

Gerard, David. "Comment on three papers about Hardy-Weinberg equilibrium tests in autopolyploids." Frontiers in Genetics, v.13, 2022. https://doi.org/10.3389/fgene.2022.1027209

Gerard, David. "Double Reduction Estimation and Equilibrium Tests in Natural Autopolyploid Populations." Biometrics, v.79, 2022. https://doi.org/10.1111/biom.13722

Gerard, David and Thakkar, Mira and Ferrão, Luis Felipe V. "Tests for segregation distortion in tetraploid F1 populations." Theoretical and Applied Genetics, v.138, 2025. https://doi.org/10.1007/s00122-025-04816-z

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project developed new models to estimate and assess the genotype frequencies of polyploid populations. Polyploids are organisms that have more than two sets of their genomes. Unlike humans, who are diploids (having two sets), many organisms are polyploids, including important agricultural crops like blueberries, potatoes, and strawberries. A crucial component in many agricultural and evolutionary studies involving polyploids is the variation of DNA from one organism to another. For instance, at a locus on the genome of a tetraploid (which has four sets), some individuals might have four adenine (A) copies and zero cytosine (C) copies, some might have three A's and one C, others two A's and two C's, and so forth. The proportions of individuals with each DNA composition are known as genotype frequencies.
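As a minimal sketch of the definition above, genotype frequencies at a single locus can be tabulated from each individual's dosage (the number of A copies it carries). The function name `genotype_frequencies` and the sample values are hypothetical, chosen only for illustration; real data would usually require accounting for genotyping uncertainty, which this sketch ignores.

```python
from collections import Counter


def genotype_frequencies(dosages, ploidy=4):
    """Return the proportion of individuals at each dosage 0..ploidy.

    `dosages` holds one integer per individual: the number of copies
    of the reference allele (e.g. A) at a single locus.
    """
    if not dosages:
        raise ValueError("need at least one individual")
    counts = Counter(dosages)
    if any(d < 0 or d > ploidy for d in counts):
        raise ValueError("dosage outside 0..ploidy")
    n = len(dosages)
    # Index k of the result is the frequency of genotypes with k A copies.
    return [counts.get(k, 0) / n for k in range(ploidy + 1)]


# Hypothetical sample: number of A copies in each of 10 tetraploids.
sample = [4, 3, 3, 2, 2, 2, 2, 1, 0, 4]
freqs = genotype_frequencies(sample)
```

Here `freqs[2]` is the proportion of individuals with two A's and two C's.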

This project yielded three publications that developed methods to test various assumptions about these genotype frequencies. This project devised tests to determine whether a population is in Hardy-Weinberg equilibrium, a common assumption in many evolutionary studies. This project created tests to check whether a population is panmictic (randomly mating), as many natural populations are. This project also developed tests to assess whether a population adheres to the law of Mendelian segregation (tests for segregation distortion), a frequent question in agricultural breeding experiments. These are all tests of assumptions about the genotype frequencies. The publications also presented strategies to estimate genotype frequencies and related population genetic parameters based on different assumptions about their forms. A fourth publication clarified points of confusion in the literature regarding genotype frequencies.
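The two population-level assumptions can be illustrated with a short sketch. Under random mating, an offspring's dosage is the sum of two independently drawn gametes, so the genotype frequencies are the gamete dosage frequencies convolved with themselves. For the equilibrium case the sketch assumes an autotetraploid with no double reduction, where genotype frequencies are binomial in the allele frequency. All function names are hypothetical, and the Pearson chi-square statistic is a textbook stand-in rather than the tests developed in the project.

```python
from math import comb


def binomial_pmf(n, r):
    """Binomial(n, r) probabilities for 0..n successes."""
    return [comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(n + 1)]


def random_mating_genotype_freqs(gamete_freqs):
    """Genotype frequencies under random mating.

    gamete_freqs[j] is the frequency of gametes carrying j A copies.
    The result is the discrete convolution of gamete_freqs with itself.
    """
    out = [0.0] * (2 * len(gamete_freqs) - 1)
    for i, p_i in enumerate(gamete_freqs):
        for j, p_j in enumerate(gamete_freqs):
            out[i + j] += p_i * p_j
    return out


def pearson_chi_square(observed_counts, expected_freqs):
    """Textbook Pearson statistic comparing counts to expected frequencies."""
    n = sum(observed_counts)
    return sum(
        (o - n * e) ** 2 / (n * e)
        for o, e in zip(observed_counts, expected_freqs)
        if e > 0
    )


# Assumed allele frequency, for illustration only.
r = 0.3
equilibrium = binomial_pmf(4, r)          # tetraploid genotypes at equilibrium
gametes = binomial_pmf(2, r)              # diploid gametes at equilibrium
from_gametes = random_mating_genotype_freqs(gametes)
```

In this sketch `from_gametes` and `equilibrium` agree, reflecting that equilibrium is a special case of random mating: every equilibrium population is randomly mating, but arbitrary gamete frequencies still give valid random-mating genotype frequencies that need not be binomial.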

This project developed novel strategies to implement our testing procedures. Our tests employ a novel approach based on constructing a statistic, called a U-statistic, which is zero on average. This statistic is a function of the genotype frequencies under various assumptions, such as the polyploids' meiosis form. Since polyploids, depending on their evolutionary history, have different meiosis models, this project developed a novel meiosis model in tetraploids, linking different polyploid types. Our U-statistic approach is distinctive in population genetics, where many methods adopt a likelihood approach based on a generative model for observed genomes. This project also explored using this U-statistic method to study population structure, which examines how some genomes in a population are more similar than others.
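For readers unfamiliar with the term, a U-statistic averages a symmetric function (a kernel) over all subsets of the sample of a fixed size. The sketch below shows the generic construction with a classical kernel whose U-statistic is the unbiased sample variance. It is not the statistic used in the project, whose kernel depends on genotype frequencies and the assumed meiosis model; the names `u_statistic` and `half_squared_difference` are hypothetical.

```python
from itertools import combinations


def u_statistic(sample, kernel, order=2):
    """Average a symmetric kernel over all size-`order` subsets of the sample."""
    subsets = list(combinations(sample, order))
    if not subsets:
        raise ValueError("sample is smaller than the kernel order")
    return sum(kernel(*s) for s in subsets) / len(subsets)


def half_squared_difference(x, y):
    """Kernel whose expectation is the population variance."""
    return 0.5 * (x - y) ** 2


# Illustrative data only.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(data, half_squared_difference)
```

With this kernel, `u_var` equals the usual sample variance with the n - 1 denominator. A testing procedure in the spirit of the paragraph above would instead pick a kernel whose expectation is zero when the assumed model holds, so that values far from zero are evidence against the model.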

Our methods have broader applications to agriculture and evolution. Many plant breeding programs aimed at the agricultural improvement of polyploid crops use genomic approaches to identify locations on the genome associated with desired traits, such as yield and hardiness. Numerous evolutionary genomic studies involve polyploid organisms, as all modern angiosperms (flowering plants) evolved from ancient polyploids. However, genomic data can be complex and prone to corruption by aspects of the biological assays designed to measure it. Our tests can serve as a quality control measure to identify problematic genome locations resulting from data corruption. The Hardy-Weinberg equilibrium tests can verify whether the assumptions of many evolutionary methods hold. In natural populations, which are often used in evolutionary studies and where random mating frequently occurs, the random mating tests can highlight issues at particular genome locations. The segregation distortion tests can assist breeding programs in identifying locations that do not follow Mendelian segregation. Therefore, our tests can be integrated into quality control procedures to enhance evolutionary and agricultural studies, potentially advancing our understanding of plant evolution and improving crop quality. The easy-to-use software developed during this project, the hwep and menbayes R packages, facilitates access to our tests for scientists and plant breeders.
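The Mendelian expectation that a segregation distortion test checks against can be sketched for a tetraploid F1 cross. The sketch assumes the simplest model: each parent passes on two of its four chromosome copies chosen at random without replacement, with no double reduction and no preferential pairing, so gamete dosages are hypergeometric. The function names are hypothetical, and this is not the interface of the hwep or menbayes packages.

```python
from math import comb


def gamete_freqs(parent_dosage, ploidy=4):
    """Gamete dosage frequencies for one parent.

    Assumes random sampling of ploidy/2 chromosome copies without
    replacement (hypergeometric), i.e. no double reduction and no
    preferential pairing.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [
        comb(parent_dosage, j) * comb(ploidy - parent_dosage, half - j) / total
        for j in range(half + 1)
    ]


def offspring_freqs(dosage1, dosage2, ploidy=4):
    """Expected F1 genotype frequencies from two parents' dosages."""
    g1 = gamete_freqs(dosage1, ploidy)
    g2 = gamete_freqs(dosage2, ploidy)
    out = [0.0] * (ploidy + 1)
    # Offspring dosage is the sum of one gamete from each parent.
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            out[i + j] += a * b
    return out


# Cross of two parents that each carry two A copies (AACC x AACC).
f1 = offspring_freqs(2, 2)
```

For this cross the expected offspring frequencies for dosages 0 through 4 are 1/36, 8/36, 18/36, 8/36, and 1/36. Observed F1 counts at a locus that depart strongly from such expectations are candidates for segregation distortion or for genotyping problems.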

This project also funded educational initiatives, producing publicly available lecture materials for two courses. One course focused on advanced programming using the R statistical language, emphasizing genomic analysis methods in R. The other course covered introductory statistical genetic analysis at the graduate and upper undergraduate levels. Twelve student researchers funded by this project learned the fundamentals of statistical genetics and computational biology, with two writing master's theses on polyploid genetics.

Last Modified: 03/28/2024
Modified by: David Gerard

