Award Abstract # 1458359
Bilateral BBSRC-NSF/BIO Collaborative Research: ABI Development: A Critical Assessment of Protein Function Annotation

NSF Org: DBI
Division of Biological Infrastructure
Recipient: IOWA STATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
Initial Amendment Date: August 19, 2015
Latest Amendment Date: August 4, 2020
Award Number: 1458359
Award Instrument: Standard Grant
Program Manager: Peter McCartney
DBI
 Division of Biological Infrastructure
BIO
 Directorate for Biological Sciences
Start Date: September 1, 2015
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $506,490.00
Total Awarded Amount to Date: $506,490.00
Funds Obligated to Date: FY 2015 = $506,490.00
History of Investigator:
  • Iddo Friedberg (Principal Investigator)
    idoerg@gmail.com
Recipient Sponsored Research Office: Iowa State University
1350 BEARDSHEAR HALL
AMES
IA  US  50011-2103
(515)294-5225
Sponsor Congressional District: 04
Primary Place of Performance: Iowa State University of Science and Technology
1138 Pearson Hall
Ames
IA  US  50011-2207
Primary Place of Performance
Congressional District:
Unique Entity Identifier (UEI): DQDBM7FGJPC5
Parent UEI: DQDBM7FGJPC5
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Biologists are deluged with sequence data yet have derived comparatively little biological information from it. The accurate annotation of protein function is key to understanding life, but experimentally determining what each protein does is costly and difficult, and cannot scale up to accommodate the vast amount of sequence data already available. Therefore, discovering protein function by computational, rather than experimental, means is of primary importance. Genomic sequence data are available from thousands of species, and those are coupled with massive high-throughput experimental data. Together, these data have created new opportunities as well as challenges for computational function prediction. As a result, many computational annotation methods have been developed by research groups worldwide, but their accuracy and applicability need to be improved upon. The mission of the Automated Function Prediction Special Interest Group (AFP-SIG) is to bring together computational biologists, experimental biologists and biocurators who are dealing with the important problem of predicting protein function, to share ideas, and create collaborations. To improve computational function prediction methods, the Critical Assessment of protein Function Annotation algorithms (CAFA) was established as an ongoing experiment. CAFA was designed to provide a large-scale assessment of computational methods dedicated to predicting protein function. By challenging dozens of research groups worldwide to develop and provide their best software for function prediction, the researchers involved in the AFP-SIG will improve the ability of biologists to understand life at the molecular level. The AFP-SIG researchers will also generate experimental data from fruit flies, fungi and bacteria to be used as benchmarks to test the software participating in CAFA, and to gain a deeper understanding of these model organisms.

It is now possible to collect data that comprehensively profile many different states of complex biological systems. Using these data it should be possible to understand and explain the underlying systems, but significant challenges remain. One of the primary challenges is that, as researchers collect more data from many different organisms in many different systems, they discover more and different genes. Assigning functions to these newly discovered genes represents a key step towards interpretation of high-throughput data. This leads to a critical need to assess the quality of the function prediction methods that researchers have developed in recent years. The mission of the Automated Function Prediction Special Interest Group (AFP-SIG), founded in 2005, is to bring together bioinformaticians and biologists who are addressing this key challenge of gene function prediction. In addition to sharing ideas and creating collaboration, AFP-SIG has created CAFA: the Critical Assessment of (protein) Function Annotation. CAFA is a community-driven challenge to assess the performance of protein function prediction software, and it has been carried out twice since 2010. The investigators will provide the following outcomes: (1) robust open-source software to be used in function prediction and assessment of function prediction methods, incorporated into the high-profile annotation pipelines of UniProt-GOA; (2) expansion of the AFP community by engaging bioinformaticians, biocurators and experimentalists, thereby improving the quality and relevance of function prediction methods; (3) large-scale experimental screens in Drosophila, Candida and Pseudomonas for novel associations of targeted functional terms with genes; (4) an expanded CAFA event, incorporating both the curated annotations from the literature and our own experimental screens, in the last two years of the project. The progress of the AFP-SIG and CAFA will be available from http://BioFunctionPrediction.org

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 12)
Xiao Hu, Iddo Friedberg "SwiftOrtho: a Fast, Memory-Efficient, Multiple Genome Orthology Classifier" GigaScience , 2019
Naihui Zhou, Yuxiang Jiang, Timothy R Bergquist, Alexandra J Lee, Balint Z Kacsoh, Alex W Crocker, Kimberley A Lewis, George Georghiou, Huy N Nguyen, Md Nafiz Hamid, Larry Davis, Tunca Dogan, Volkan Atalay, Ahmet S Rifaioglu, Alperen Dalkıran, et al. (169 authors in total) "The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens" Genome Biology , 2019
Balint Z Kacsoh, Stephen Barton, Yuxiang Jiang, Naihui Zhou, Sean D Mooney, Iddo Friedberg, Predrag Radivojac, Casey S Greene, Giovanni Bosco "New Drosophila long-term memory genes revealed by assessing computational function prediction methods." G3 , v.9 , 2019 https://doi.org/10.1534/g3.118.200867
Francisco R Fields, Stephan D Freed, Katelyn E Carothers, Md Nafiz Hamid, Daniel E Hammers, Jessica N Ross, Veronica R Kalwajtys, Alejandro J Gonzalez, Andrew D Hildreth, Iddo Friedberg, Shaun W Lee "Novel antimicrobial peptide discovery using machine learning and biophysical selection of minimal bacteriocin domains" Drug Development Research , 2019 https://doi.org/10.1002/ddr.21601
Hosein Mohimani, Alexey Gurevich, Kelsey L Alexander, C Benjamin Naman, Tiago Leao, Evgenia Glukhov, Nathan A Moss, Tal Luzzatto Knaan, Fernando Vargas, Louis-Felix Nothias, Nitin K Singh, John G Sanders, Rodolfo AS Benitez, Luke R Thompson, Md Nafiz Hami "MetaMiner: A Peptidogenomics Approach for the Discovery of Ribosomally Synthesized and Post-translationally Modified Peptides from Microbial Communities" Cell Systems , 2019
Kokulapalan Wimalanathan, Iddo Friedberg, Carson M Andorf, Carolyn J Lawrence-Dill "Maize GO Annotation - Methods, Evaluation, and Review (maize-GAMER)" Plant Direct , v.2 , 2018 , p.e00052 https://doi.org/10.1002/pld3.52
Kyoung Tak Cho, Jack M Gardiner, Lisa C Harper, Carolyn J Lawrence-Dill, Iddo Friedberg, Carson M Andorf "MaizeDIG: Maize Database of Images and Genomes" Frontiers in Plant Science , v.10 , 2019 , p.1050 https://doi.org/10.3389/fpls.2019.01050
Md Nafiz Hamid, Iddo Friedberg "Identifying Antimicrobial Peptides using Word Embedding with Deep Recurrent Neural Networks" Bioinformatics , v.35 , 2019 https://doi.org/10.1093/bioinformatics/bty937
Naihui Zhou, Zachary D Siegel, Scott Zarecor, Nigel Lee, Darwin A Campbell, Carson M Andorf, Dan Nettleton, Carolyn J Lawrence-Dill, Baskar Ganapathysubramanian, Jonathan W Kelly, Iddo Friedberg "Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning" PLoS computational biology , v.14 , 2018 , p.e1006337 https://doi.org/10.1371/journal.pcbi.1006337
Moon J, Friedberg I and Eulenstein O "Highly Bi-Connected Subgraphs for computational protein function annotation" The 22nd International Computing and Combinatorics Conference (COCOON) , 2016 , p.573 https://doi.org/10.1007/978-3-319-42634-1_46
Ryan S Nett, Huy Nguyen, Raimund Nagel, Ariana Marcassa, Trevor C Charles, Iddo Friedberg, Reuben J Peters "Unraveling a tangled skein: Evolutionary analysis of the bacterial gibberellin biosynthetic operon" mSphere , 2020 https://doi.org/10.1128/mSphere.00292-20

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Cheap and fast genome sequencing has given us a plethora of genomic data. But while we have the ability to sequence full genomes on a massive scale, our understanding of the functions of genes is still woefully lagging behind. This phenomenon is akin to a library acquiring dozens of new books each day while its patrons are only able to comprehend 0-70% of the words in each book.
Understanding the meaning of the words, or the function of the genes, is one of the main challenges we are facing in the genomic era.

Experiments are the best way of understanding the function of a gene or gene product. However, experiments are laborious, consume time and resources, and in many cases are unnecessary. Given the landscape of genes whose function has already been determined, computational methods can ostensibly be used to fill in the blanks. Our overarching goal was to incentivize the creation of an ecosystem of many different function prediction methods that would help advance the field via a periodic competition, which we called the Critical Assessment of Function Annotation, or CAFA. CAFA competitions have been a huge success, with a community of 68 groups worldwide developing dozens of methods to compete in CAFA. From 59 methods in CAFA1, we have advanced to 144 methods competing in CAFA3, and an estimated 160 methods in CAFA4 (recently closed).

INTELLECTUAL MERIT: The chief scientific advance this project provided is a dramatic improvement in our ability, as a scientific community, to computationally predict protein function, and the provision of these tools to experimentalists for quickly creating falsifiable hypotheses that can be experimentally verified. One of the metrics used to gauge method performance is Fmax, which is normalized on a scale of 0-1. Fmax for prediction of certain aspects of protein function has improved from 0.4 to 0.7. Furthermore, new machine learning based methods have been developed within the framework of CAFA for predicting protein function. It is unlikely that such a broad effort to develop so many new methods would have taken place without the incentive of the CAFA competition. Leveraging the framework of CAFA, we have also applied predictions to plant phenotypes, and explored the utility of classification employing citizen scientists and Amazon MTurkers to create training data for prediction algorithms. Another project involved predicting genes to be involved in long-term memory in fruit flies, and then verifying that they indeed play such a role using an mRNA knock-down assay. We have also developed a novel method for classifying antibiotic resistance genes with excellent accuracy, using an adaptation of machine learning methods commonly used for document classification (Word2vec), and intend to continue this line of research with experimental verifications. Additionally, 10 trainees have been involved in biocuration and in creating prediction targets from whole-genome experiments.
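
For readers unfamiliar with Fmax, the following is a minimal sketch of how a protein-centric Fmax can be computed: precision and recall are averaged over proteins at each score threshold, and the best harmonic mean across thresholds is reported. It is an illustration only, not the CAFA assessment software. The function name `fmax`, the dictionary layout, the 100-step threshold grid and the toy term labels are all assumptions made for this example, and the sketch ignores propagation of terms through the ontology.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Illustrative protein-centric Fmax (hypothetical helper, not CAFA code).

    predictions: protein -> {term: confidence score in [0, 1]}
    truth:       protein -> non-empty set of experimentally supported terms
    """
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps  # score threshold for this pass
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins that have at least
            # one prediction at or above the threshold.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with made-up term labels, for illustration only.
toy_truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
toy_predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:D": 0.2},
    "P2": {"GO:C": 0.6, "GO:A": 0.5},
}
print(round(fmax(toy_predictions, toy_truth), 3))  # 0.857
```

A value of 1 means that, at some threshold, every predicted term is correct and no true term is missed. Lower values reflect wrong predictions, missed terms, or both.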

BROADER IMPACTS: The main impact from this award was the initiation and growth of a large international community of computational function predictors, biocurators, and experimental biologists, all working together under the umbrella of CAFA and the Function COSI (Community of Special Interest) to improve computational function prediction methods. We estimate this community to consist of some 70 groups, with the mailing list containing 200 active members. Members of this community have been meeting annually at the Function COSI meeting to share and debate the merits of methods, assessment metrics, and the best ways to improve and expand community activities in the field.

The creation and growth of this community, and the resulting progress via the CAFA challenge, have transformed the way the broader life-science community addresses computational function prediction. One illustrative example is that the journal Nucleic Acids Research now requires CAFA-like assessment results to publish function prediction methods. The three papers reporting the results of the three CAFA challenges have been collectively cited over 1,300 times, attesting to the high interest in the broader biological community. A literature survey provided us with an estimate of at least 70 trainees, mostly graduate students along with some postdocs, who, in part or in whole, have been working on developing CAFA-competitive methods over the period of the funding of this grant. This is a huge impact not only on training, but also on the ecosystem and availability of different software methods for computational function prediction.

In the PI's lab, funding from this award was used to train (in full or in part) three PhD students, as well as two programmers, who were trained in bioinformatics, and one postdoc. The PI also taught two undergraduate courses in computational genomics, and one graduate course in the same discipline, where he also introduced Gene Ontology and computational function prediction to the curriculum.


Last Modified: 12/14/2021
Modified by: Iddo Friedberg

Please report errors in award information by writing to: awardsearch@nsf.gov.
