
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: | |
Initial Amendment Date: | September 17, 2019 |
Latest Amendment Date: | October 15, 2020 |
Award Number: | 1940233 |
Award Instrument: | Continuing Grant |
Program Manager: | Reed Beaman, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2019 |
End Date: | December 31, 2022 (Estimated) |
Total Intended Award Amount: | $245,818.00 |
Total Awarded Amount to Date: | $245,818.00 |
Funds Obligated to Date: | FY 2020 = $121,964.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 3141 CHESTNUT ST PHILADELPHIA PA US 19104-2875 (215)895-6342 |
Sponsor Congressional District: | |
Primary Place of Performance: | 1505 Race St, 10th Floor Philadelphia PA US 19102-1119 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | HDR-Harnessing the Data Revolu, CYBERINFRASTRUCTURE |
Primary Program Source: | 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Unlike genetic data, the traits of organisms, such as their visible features, are not available in databases for analysis. The lack of machine-readable trait data has slowed progress on four grand challenge problems in biology: predicting the genes that generate traits, understanding the patterns of evolution, predicting the effects of ecological change, and identifying species. This project will use advances in machine learning and machine-readable biological knowledge to create a new method to automatically identify traits from images of organisms. Images of organisms are widely available, and this new method could be used to rapidly harvest traits that could help solve the grand challenges in biology. Large image collections and corresponding digital data from fishes will be used in this study because of the extensive resources available for these organisms. The new machine learning model can be generalized to other disciplines that have similar machine-readable knowledge, and it will help in explaining the results of artificial intelligence, thus advancing the field of computer science. The new method stands to benefit society in application to areas such as agriculture or medicine, where trait discovery from images is critical in disease diagnosis. The project will support the education of students and postdocs in biology, computer science, and information science. It will disseminate its findings through workshops, presentations, publications, and open access to the data and code that it produces.
This project will leverage advances in state-of-the-art machine learning to develop a novel class of artificial neural networks that can exploit the machine-readable and predictive knowledge about biology that is available in the form of phylogenies and anatomy ontologies. These biology-guided neural networks are expected to automatically detect and predict traits from specimen images, with little training data. Image-based trait data derived from this work will enable progress in gene-phenotype mapping to novel traits and in understanding patterns of evolution. The resulting machine learning model can be generalized to other disciplines that have formally structured knowledge, and will contribute to advances in computer science by going beyond black-box learning and making important advances toward Explainable Artificial Intelligence. It may be extended to applied areas, such as agriculture or the biomedical domain. The research will be piloted using teleost fishes because of the many high-quality data resources available for them (digital images, evolutionary trees, anatomy ontology). Methods for automated metadata quality assessment and provenance tracking will be developed in the course of this project to ensure the results and processes are verifiable, replicable, and reusable. These will broadly impact the many domains that will adopt machine learning as a way to make discoveries from images. This convergent research will accelerate scientific discovery across the biological sciences and computer science by harnessing the data revolution in conjunction with biological knowledge.
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity, and is jointly supported by the HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The NSF-HDR Biology Guided Neural Networks (BGNN) project pursued the development of a novel class of artificial neural networks (ANNs) for classifying fish species by extracting morphological feature (trait) data from digital images of specimens. Structured biological knowledge systems, such as the trait anatomy ontology and the phylogeny of the fish species, were also used. The BGNN project advanced methods for manually and automatically assessing image quality. Drexel University’s Metadata Research Center (MRC) team worked specifically on this aim by advancing machine-driven approaches for generating metadata about image quality. The MRC team also led the development of computational approaches for validating and improving existing specimen metadata, generating new metadata, and designing efficient metadata workflows.
BGNN’s metadata innovations, including the machine-driven metadata workflows, contributed to the project’s creation of a large, open-access repository of fish specimen images with rich metadata, hosted by Tulane University. Team members also collaborated on methods for semantically integrating discovered traits into existing knowledge bases and automatically tracking object history (provenance). Applying the BGNN model to vast volumes of unlabeled fish specimen images has led to biological knowledge discovery, the identification of new ontological relationships among known traits of the fish specimens, and the completion of accurate trait distributions across multiple fish species.
The BGNN project collaboration involved biologists, computer scientists, and information/data scientists. The project’s methods and findings contribute to the foundation of the new HDR Imageomics Institute. Moreover, many of the approaches and workflows are applicable to a broader range of scientific disciplines that seek to apply algorithms and AI approaches to phenotypic trait data to study, classify, and identify novel aspects of biological specimens. Additional contributions and impacts from Drexel MRC include 9 scientific publications, 2 of which were award-winning research papers; several sets of published data and code; and 19 presentations delivered at international and national conferences, academic and educational seminars, and professional committee meetings. Drexel MRC’s BGNN engagement provided cross-disciplinary research education experience for 3 doctoral students (1 in Computer Science (CS) and 2 in Information Science), 1 Master’s in Information Systems (MIS) student, 2 CS undergraduate students, and an early-career information professional as a LEADING fellow. Additionally, although the project has officially concluded, a CS undergraduate continues to contribute image informatics and metadata expertise as part of his senior design project. Overall, the cross-domain research training supported by this grant, together with the Drexel Metadata Research Center’s expertise, allowed computational and informatics approaches to address biological research questions and needs.
Last Modified: 05/22/2023
Modified by: Jane Greenberg