Award Abstract # 1947257
CRII: III: RUI: Association Testing and Inversion Detection without Reference Genomes

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: MILWAUKEE SCHOOL OF ENGINEERING
Initial Amendment Date: June 11, 2020
Latest Amendment Date: April 24, 2024
Award Number: 1947257
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2020
End Date: August 31, 2024 (Estimated)
Total Intended Award Amount: $174,232.00
Total Awarded Amount to Date: $235,432.00
Funds Obligated to Date: FY 2020 = $174,232.00
FY 2021 = $12,600.00
FY 2022 = $12,600.00
FY 2023 = $24,000.00
FY 2024 = $12,000.00
History of Investigator:
  • Ronald Nowling (Principal Investigator)
    nowling@msoe.edu
Recipient Sponsored Research Office: Milwaukee School of Engineering
1025 N BROADWAY
MILWAUKEE
WI  US  53202-3109
(414)277-7300
Sponsor Congressional District: 04
Primary Place of Performance: Milwaukee School of Engineering
1025 N. Broadway
Milwaukee
WI  US  53202-3109
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): M6RCJVHKTHJ5
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 8228, 9229, 9251
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The cost of DNA sequencing has plummeted over the last decade. Consequently, the potential to sequence the genome of every living organism is within our grasp. Genome assemblies, however, often require significant "polishing," a largely manual and labor-intensive process. New methods are needed to directly analyze fragmented or unassembled genomic data. Analysis of these genomes includes the identification of physical rearrangements such as inversions. Large inversions have significant impacts on the biology of organisms and their evolution. Existing computational methods for identifying inversions have been primarily developed for and tested on well-studied "reference" genomes. This project seeks to develop new inversion detection and association testing methods suitable for the large and growing number of fragmented and/or unassembled genomes that are becoming available. Undergraduate research assistants will be funded as active collaborators on the project.

So-called "k-mer" methods have become popular in the last decade for the analysis of unassembled genomics or metagenomics data. This project seeks to utilize k-mers, unsupervised learning, and association testing to identify inversions in fragmented or poorly assembled population genomics data. Since millions of association tests will be run per data set, the methods will be accelerated using GPUs. The resulting method and software will be developed in conjunction with undergraduate research assistants and released under an open-source license.
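As a rough illustration of what a "k-mer" representation looks like, the following minimal sketch counts canonical k-mers directly from raw reads, with no assembly or reference genome involved. It is not the project's software: the function names (`reverse_complement`, `canonical_kmers`, `count_kmers`) are hypothetical, and the use of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumption made here so that counts do not depend on which strand a read came from.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int):
    """Yield the canonical form of every k-mer in seq.

    K-mers containing characters other than A, C, G, T (for example N)
    are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            # Canonical form: the smaller of the k-mer and its reverse complement.
            yield min(kmer, reverse_complement(kmer))


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    return counts


if __name__ == "__main__":
    # Toy reads (illustrative only, not real data).
    print(count_kmers(["ACGTAC", "GTACGT"], k=3))
```

Per-sample count tables of this kind could then serve as the features for unsupervised learning and association testing across a population.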

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Holm, Inge and Nardini, Luisa and Pain, Adrien and Bischoff, Emmanuel and Anderson, Cameron E. and Zongo, Soumanaba and Guelbeogo, Wamdaogo M. and Sagnon, NFale and Gohl, Daryl M. and Nowling, Ronald J. and Vernick, Kenneth D. and Riehle, Michelle M. "Comprehensive Genomic Discovery of Non-Coding Transcriptional Enhancers in the African Malaria Vector Anopheles coluzzii" Frontiers in Genetics, v.12, 2022. https://doi.org/10.3389/fgene.2021.785934
Nowling, Ronald J. and Beal, Christopher R. and Emrich, Scott and Behura, Susanta K. and Halfon, Marc S. and Duman-Scheel, Molly "PeakMatcher: Matching Peaks Across Genome Assemblies" 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020. https://doi.org/10.1145/3388440.3414907
Nowling, Ronald J. and Behura, Susanta K. and Halfon, Marc S. and Emrich, Scott J. and Duman-Scheel, Molly "PeakMatcher facilitates updated Aedes aegypti embryonic cis-regulatory element map" Hereditas, v.158, 2021. https://doi.org/10.1186/s41065-021-00172-2
Nowling, Ronald J. and Fallas-Moya, Fabian and Sadovnik, Amir and Emrich, Scott and Aleck, Matthew and Leskiewicz, Daniel and Peters, John G. "Fast, low-memory detection and localization of large, polymorphic inversions from SNPs" PeerJ, v.10, 2022. https://doi.org/10.7717/peerj.12831
Nowling, Ronald J. and Geromel, Rafael Reple and Halligan, Benjamin "Filtering STARR-Seq Peaks for Enhancers with Sequence Models" 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020. https://doi.org/10.1145/3388440.3414905
Nowling, Ronald J. and Keyser, Samuel H. and Moran, Alex R. and Peters, John G. and Leskiewicz, Daniel "Segmenting and Genotyping Large, Polymorphic Inversions" 2023 IEEE International Conference on Electro Information Technology (eIT), 2023. https://doi.org/10.1109/eIT57321.2023.10187331
Nowling, Ronald J. and Manke, Krystal R. and Emrich, Scott J. "Detecting inversions with PCA in the presence of population structure" PLOS ONE, v.15, 2020. https://doi.org/10.1371/journal.pone.0240429
Nowling, Ronald J. and Njoya, Kimani and Peters, John G. and Riehle, Michelle M. "Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique" Frontiers in Cellular and Infection Microbiology, v.13, 2023. https://doi.org/10.3389/fcimb.2023.1182567

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The main goals of the project were to (1) develop methods to identify large inversions using k-mers in unassembled sequencing data, (2) develop algorithms for machine learning with k-mers that are efficient in computation time and memory usage, and (3) involve undergraduate students in research.
We developed and evaluated several methods for unsupervised detection of inversions from variant data. Particular improvements include using statistical methods to find variants associated with principal components (PCs), plotting their associations along the chromosome as volcano plots, and characterizing the patterns associated with inversions versus other factors, which allows more confident detection of inversions. Window / boundary detection methods were developed to localize inversions. Clustering methods, and scoring methods for assessing the quality of clusterings, were evaluated to determine which were most accurate for genotyping samples by inversion karyotype. Feature hashing and sketching methods were used to substantially reduce memory usage, enabling large data sets to be analyzed on common desktop computers.
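To make the memory argument concrete, here is a minimal sketch of feature hashing (the "hashing trick"), assuming the features are named counts such as k-mers or variant identifiers. It is an illustration under stated assumptions, not the project's implementation: the function name `hash_features`, the choice of BLAKE2b as the hash, the signed-bucket scheme, and the default of 1024 buckets are all choices made here for the example.

```python
import hashlib


def hash_features(feature_counts: dict, n_buckets: int = 1024) -> list:
    """Project named feature counts into a fixed-length vector.

    Each feature name is hashed to a bucket index and a sign, so memory
    depends on n_buckets rather than on the number of distinct features.
    Distinct features may collide in one bucket; the random sign keeps
    collisions from systematically inflating a bucket.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    vector = [0.0] * n_buckets
    for name, count in feature_counts.items():
        # Stable 64-bit hash of the feature name (illustrative choice).
        digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "little")
        sign = 1.0 if (value & 1) else -1.0   # lowest bit picks the sign
        index = (value >> 1) % n_buckets      # remaining bits pick the bucket
        vector[index] += sign * count
    return vector


if __name__ == "__main__":
    # Toy per-sample counts (illustrative only, not real data).
    sample = {"ACG": 3, "CGT": 1, "TTA": 2}
    hashed = hash_features(sample, n_buckets=16)
    print(len(hashed), hashed)
```

The design trade-off is that a smaller `n_buckets` saves memory but increases collisions, so the bucket count would need to be tuned against accuracy for a given data set.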
In addition, grant funds directly provided research opportunities to 9 undergraduate students and indirectly supported an additional 4. Two of these students went on to participate in external REU programs at UW Madison and the Medical College of Wisconsin. One of those two students is now a PhD student in a biomedical data science program, and the other is currently applying to PhD programs. A majority of the other undergraduate students have obtained data science positions in industry, where they apply the data science and machine learning skills they developed through the project.


Last Modified: 02/11/2025
Modified by: Ronald James Nowling

Please report errors in award information by writing to: awardsearch@nsf.gov.