Award Abstract # 1940233

Biology-guided Neural Networks for Discovering Phenotypic Traits

NSF Org:	OAC Office of Advanced Cyberinfrastructure (OAC)
Recipient:	DREXEL UNIVERSITY
Initial Amendment Date:	September 17, 2019
Latest Amendment Date:	October 15, 2020
Award Number:	1940233
Award Instrument:	Continuing Grant
Program Manager:	Reed Beaman OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2019
End Date:	December 31, 2022 (Estimated)
Total Intended Award Amount:	$245,818.00
Total Awarded Amount to Date:	$245,818.00
Funds Obligated to Date:	FY 2019 = $123,854.00; FY 2020 = $121,964.00
History of Investigator:	Jane Greenberg (Principal Investigator) janeg@drexel.edu
Recipient Sponsored Research Office:	Drexel University 3141 CHESTNUT ST PHILADELPHIA PA US 19104-2875 (215)895-6342
Sponsor Congressional District:	03
Primary Place of Performance:	Drexel University 1505 Race St, 10th Floor Philadelphia PA US 19102-1119
Primary Place of Performance Congressional District:	03
Unique Entity Identifier (UEI):	XF3XM9642N96
Parent UEI:
NSF Program(s):	HDR-Harnessing the Data Revolu, CYBERINFRASTRUCTURE
Primary Program Source:	01002021DB NSF RESEARCH & RELATED ACTIVIT 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7231, 1165
Program Element Code(s):	099y00, 723100
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Unlike genetic data, the traits of organisms, such as their visible features, are not available in databases for analysis. The lack of machine-readable trait data has slowed progress on four grand challenge problems in biology: predicting the genes that generate traits, understanding the patterns of evolution, predicting the effects of ecological change, and species identification. This project will use advances in machine learning and machine-readable biological knowledge to create a new method to automatically identify traits from images of organisms. Images of organisms are widely available, and this new method could be used to rapidly harvest traits that could be used to solve the grand challenges in biology. Large image collections and corresponding digital data from fishes will be used in this study because of the extensive resources available for these organisms. The new machine learning model can be generalized to other disciplines that have similar machine-readable knowledge, and it will help in explaining the results of artificial intelligence, thus advancing the field of computer science. The new method stands to benefit society in application to areas such as agriculture or medicine, where trait discovery from images is critical in disease diagnosis. The project will support the education of students and postdocs in biology, computer science, and information science. It will disseminate its findings through workshops, presentations, publications, and open access to data and code that it produces.

This project will leverage advances in state-of-the-art machine learning to develop a novel class of artificial neural networks that can exploit the machine-readable and predictive knowledge about biology that is available in the form of phylogenies and anatomy ontologies. These biology-guided neural networks are expected to automatically detect and predict traits from specimen images, with little training data. Image-based trait data derived from this work will enable progress in gene-phenotype mapping to novel traits and understanding patterns of evolution. The resulting machine learning model can be generalized to other disciplines that have formally structured knowledge, and will contribute to advances in computer science by going beyond black-box learning and making important advances toward Explainable Artificial Intelligence. It may be extended to applied areas, such as agriculture or the biomedical domain. The research will be piloted using teleost fishes because of the many high-quality data resources available for them (digital images, evolutionary trees, anatomy ontology). Methods for automated metadata quality assessment and provenance tracking will be developed in the course of this project to ensure the results and processes are verifiable, replicable, and reusable. These will broadly impact the many domains that will adopt machine learning as a way to make discoveries from images. This convergent research will accelerate scientific discovery across the biological sciences and computer science by harnessing the data revolution in conjunction with biological knowledge.

This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity, and is jointly supported by the HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Jebbia, D. "Toward a Flexible Metadata Pipeline for Fish Specimen Images" Proceedings for the 16th Metadata and Semantic Research (MTSR), Springer in Communications in Computer and Information Science, 2022 (to be published)

Karnani, Kevin and Pepper, Joel and Bakis, Yasin and Wang, Xiaojun and Bart, Henry and Breen, David E. and Greenberg, Jane "Computational metadata generation methods for biological specimen image collections" International Journal on Digital Libraries, v.23, 2022 https://doi.org/10.1007/s00799-022-00342-1

Leipzig, J. and Bakis, Y. and Wang, X. and Elhamod, M. and Diamond, K. and Dahdul, W. and Karpante, A. and Maga, M. and Mabee, P. and Bart, H. and Greenberg, J. "Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species" Metadata and Semantic Research. MTSR 2020. Communications in Computer and Information Science, 2021

Pepper, Joel and Greenberg, Jane and Bakis, Yasin and Wang, Xiaojun and Bart, Henry and Breen, David "Automatic Metadata Generation for Fish Specimen Image Collections" The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings/2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021 https://doi.org/10.1109/JCDL52503.2021.00015

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The NSF-HDR Biology Guided Neural Networks (BGNN) project pursued the development of a novel class of artificial neural networks (ANNs) for classifying fish species by extracting morphological feature (trait) data from digital images of specimens. Structured biological knowledge systems, such as the trait anatomy ontology and the phylogeny of the fish species, were also used. The BGNN project advanced methods for manually and automatically assessing image quality. Drexel University’s Metadata Research Center (MRC) team worked specifically on this aim by advancing machine-driven approaches for generating metadata about image quality. The MRC team also led the development of computational approaches for validating and improving existing specimen metadata, generating new metadata, and designing efficient metadata workflows.

BGNN’s metadata innovations, including the machine-driven metadata workflows, have contributed to BGNN’s creation of a large, open-access repository of fish specimen images with rich metadata, hosted by Tulane University. Team members also collaborated on methods for semantically integrating discovered traits into existing knowledgebases and automatically tracking object history (provenance). The BGNN model’s application to vast volumes of unlabeled fish specimen images has led to biological knowledge discovery, the identification of new ontological relationships among known traits of the fish specimens, and the completion of accurate trait distributions across multiple fish species.

The BGNN project collaboration involved biologists, computer scientists, and information/data scientists. The project’s methods and findings contribute to the foundation of the new HDR Imageomics Institute. Moreover, many of the approaches and workflows are applicable to a broader range of scientific disciplines that seek to apply algorithms and AI approaches to phenotypic trait data to study, classify, and identify novel aspects of biological specimens. Additional Drexel MRC contributions and impacts include: 9 scientific publications, 2 of which were award-winning research papers; several sets of published data and code; and 19 presentations delivered at international and national conferences, academic/educational seminars, and professional committee meetings. Drexel MRC’s BGNN engagement provided cross-disciplinary research education experience for 3 doctoral students (1 in Computer Science (CS) and 2 in Information Science), 1 Master’s in Information Systems (MIS) student, 2 CS undergraduate students, and an early-career information professional as a LEADING fellow. Additionally, although the project has officially concluded, a CS undergraduate continues to contribute image informatics and metadata expertise as part of his senior design project. Overall, the cross-domain research training supported by this grant, together with the Drexel Metadata Research Center’s expertise, allowed computational and informatics approaches to address biological research questions and needs.

Last Modified: 05/22/2023
Modified by: Jane Greenberg
