Award Abstract # 1750532
CAREER: Using indel rate variation to understand evolutionary constraints on distances between functional elements in the genome

NSF Org: DBI
Division of Biological Infrastructure
Recipient: BOARD OF REGENTS OF NEVADA SYSTEM OF HIGHER EDUCATION
Initial Amendment Date: March 12, 2018
Latest Amendment Date: August 16, 2022
Award Number: 1750532
Award Instrument: Continuing Grant
Program Manager: Reed Beaman
rsbeaman@nsf.gov
(703)292-7163
DBI Division of Biological Infrastructure
BIO Directorate for Biological Sciences
Start Date: July 1, 2018
End Date: June 30, 2024 (Estimated)
Total Intended Award Amount: $574,068.00
Total Awarded Amount to Date: $574,068.00
Funds Obligated to Date: FY 2018 = $127,320.00
FY 2019 = $108,934.00
FY 2020 = $111,495.00
FY 2021 = $112,594.00
FY 2022 = $113,725.00
History of Investigator:
  • Mira Han (Principal Investigator)
    mira.han@unlv.edu
Recipient Sponsored Research Office: University of Nevada Las Vegas
4505 S MARYLAND PKWY
LAS VEGAS
NV  US  89154-9900
(702)895-1357
Sponsor Congressional District: 01
Primary Place of Performance: University of Nevada Las Vegas
NV  US  89154-1055
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): DLUTVJJ15U66
Parent UEI: F995DBS4SRN3
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 1165, 9150
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Adjacent protein domains interact to fold into a functional protein structure. Adjacent binding sites in the DNA interact with the multiprotein complex for efficient binding. Yet both protein domains and clusters of binding sites are encoded as one-dimensional arrays in the genome. This project tests the hypothesis that specific and correct interactions require optimal distances between these functional units, and that these distances are maintained across evolutionary time. Since insertions and deletions of DNA (indels) change the distance between these functional units, the genome will be under evolutionary constraint against indel mutations that affect the distance. To test this hypothesis, the investigator will develop software to systematically estimate and compare the changes in distance between functional elements, along with statistical models to test the likelihood of the observed events. Upon project completion, the scientific community will have tools that identify conserved distances and will be able to predict the effect and importance of indel mutations occurring in these genomic regions. The investigator will hold workshops for girls in grades 6-12 to develop software games that model the concept of evolutionary constraint. She will also develop undergraduate and graduate classes with hands-on activities on molecular evolution and computational sequence analysis.

The goal of this project is to use the variation in indel rates to infer the evolutionary constraint on the distance between functional elements in the genome. Experimental evidence has been accumulating on selection against indels in the loops and linkers within proteins, and in the space between binding sites of regulatory elements. However, studies on the evolution of these sequences are almost nonexistent because the sequences are difficult to align. This project addresses that challenge by applying methods that model variable indel rates across sites, or methods that model length instead of relying on alignments. Using these methods, the investigator will produce phylogenetic and quantitative estimates of indel rates for a significant proportion of the genome that has been neglected so far. In Objective 1, using new software the investigator has developed, variable site-specific indel rates will be estimated across loops between protein motifs to identify structural motifs with strong constraints on their distance. In Objective 2, the investigator will develop new software based on birth-death processes to estimate indel rates without relying on alignments. Using this software, she will test the hypothesis that the constraint on the distance between tandem homologous domains is stronger than that between non-homologous domains. In Objective 3, the investigator will use the software described above to test the hypothesis that the constraint on the distance between binding sites of homodimers is stronger than that between binding sites of heterodimers. This study will integrate the knowledge gained in the fields of structural biology and developmental biology into a phylogenomic context, and provide tools for the community to test specific evolutionary hypotheses on the distance between functional elements of interest. The results of the project will be presented at https://github.com/HanLabUNLV.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Barth, Dylan and Van, Richard and Cardwell, Jonathan and Han, Mira V. and Mathelier, Anthony (ed.) "Supervised learning of enhancer-promoter specificity based on genome-wide perturbation studies highlights areas for improvement in learning" Bioinformatics, v.40, 2024 https://doi.org/10.1093/bioinformatics/btae367
Chung, Nicky M. and Jonaid, G. E. and Quinton, Sophia V. and Ross, Austin and Sexton, Corinne and Alberto, Adrian and Clymer, Cody and Churchill, Daphnie and Navarro Leija, Omar and Han, Mira "Transcriptome analyses of tumor-adjacent somatic tissues reveal genes co-expressed with transposable elements" Mobile DNA, v.10, 2019 https://doi.org/10.1186/s13100-019-0180-5
Sexton, Corinne E. and Han, Mira V. "Paired-end mappability of transposable elements in the human genome" Mobile DNA, v.10, 2019 https://doi.org/10.1186/s13100-019-0172-5
Sexton, Corinne E. and Tillett, Richard L. and Han, Mira V. "The essential but enigmatic regulatory role of HERVH in pluripotency" Trends in Genetics, v.38, 2022 https://doi.org/10.1016/j.tig.2021.07.007
Sexton, Corinne E. and Victor Paul, Sylvia and Barth, Dylan and Han, Mira V. "Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures" NAR Genomics and Bioinformatics, v.6, 2024 https://doi.org/10.1093/nargab/lqae136

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The project aimed to understand the evolutionary constraint on distances between elements in the genome. We especially focused on understanding the distance between regulatory elements in the genome.

Based on the accumulating evidence that 3D contact between promoters and enhancers is important in gene regulation, we built a supervised machine learning model to predict functional enhancer-promoter relationships from genomic features. We found that 3D chromatin contact and 1D genomic distance are the two most important features in predicting functional enhancer-promoter pairs. In addition, we found many other genomic features that distinguish promoters regulated by distal enhancers from self-sufficient promoters.

To understand how the chromatin landscape and chromatin contact change across cell types, we built an unsupervised method to integrate 1D chromatin annotation with 3D chromatin contact. Using this approach, we were able to summarize the types of chromatin that a genomic region is in contact with and find clusters of common contact patterns. This allowed us to observe that a genomic region can dynamically change the types of chromatin it is in contact with while maintaining its own chromatin mark.

In studying the regulatory regions, we found that a significant proportion of them originate from transposable elements, so we studied how transposable elements can be a source of regulatory elements. First, we showed that the majority of old transposable elements in the human genome have high locus-level mappability. Then we studied the tissue-specific expression of individual transposable elements and found that host (human) immune genes tend to show repressed expression in samples that have higher transposable element expression. We also reviewed the literature on HERV-H and its potential as a regulatory element in pluripotent stem cells.

For broader impact, we ran a summer coding camp every year for four years, with the exception of 2020. The camp was designed to introduce the visual programming language Scratch to middle school girls. About 15 to 20 students participated each year and developed various art projects and games using Scratch, including a game that simulated natural selection.

Last Modified: 12/11/2024
Modified by: Mira Han
