
NSF Org: |
RISE Integrative and Collaborative Education and Research (ICER) |
Recipient: |
|
Initial Amendment Date: | August 31, 2016 |
Latest Amendment Date: | August 31, 2016 |
Award Number: | 1639714 |
Award Instrument: | Standard Grant |
Program Manager: |
Eva Zanzerkia
RISE Integrative and Collaborative Education and Research (ICER) GEO Directorate for Geosciences |
Start Date: | September 1, 2016 |
End Date: | August 31, 2019 (Estimated) |
Total Intended Award Amount: | $499,595.00 |
Total Awarded Amount to Date: | $499,595.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
266 WOODS HOLE RD WOODS HOLE MA US 02543-1535 (508)289-3542 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
266 Woods Hole Road Woods Hole MA US 02543-0151 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | EarthCube |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.050 |
ABSTRACT
The measurement of the biological inventory of proteins within an organism, known as proteomics, has emerged as an important new biological methodology within the past decade. When this technology is applied to environmental communities of microbes, where it is typically called metaproteomics because it covers a whole biological community, it has shown potential to significantly improve the understanding of ocean ecology and biogeochemistry by allowing a broad diagnosis of ecosystems across space and time. In this manner, the measurement of proteins, including the enzymes that catalyze biogeochemical reactions, throughout the ocean basins has great potential as a tool for ocean scientists interested in the chemistry and biology of the oceans. Yet as a relatively new data type, based on mass spectra rather than DNA sequences, proteomic datasets have their own specific informatics complexities. Currently these datasets are not easily accessed by the broader biological and chemical oceanographic communities. The researchers will develop an Ocean Protein Portal that will enable non-expert users to interrogate these large and complex ocean protein datasets.
The ability to connect protein distributions in the oceans, and their implied chemical functionality, with chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The proposed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and by enabling connections between biogeochemical data and the enzymes that catalyze the underlying reactions. This foundational infrastructure would be accessible to life scientists from other (non-ocean) domains as well. The portal will also contribute to the interpretation of GEOTRACES full-depth ocean trace metal and isotope datasets via the incorporation of future protein datasets and linkages to the chemical datasets in BCO-DMO. This project will provide BCO-DMO with an opportunity to research and employ new techniques for efficient data mining, combined with semantic technologies, that will enable better data discovery and access for its community. In addition, working closely with the proteomics community will allow BCO-DMO data managers to gain working experience with this emerging data type.
Specific goals include: 1) creating text-based and sequence-based search capability for processed protein datasets, 2) creating a geospatial global map visualization as well as table output of the queried protein's occurrence in the oceans, 3) providing an Ocean Data View compatible table export of geospatial and temporal distributions of the queried protein, 4) providing an analytic capability to answer the question of the taxonomic origin of the protein components, building on our METATRYP Python software and including a lowest common ancestor analysis for each peptide component of the queried protein sequence, 5) creating linkages between protein datasets and relevant environmental datasets within BCO-DMO, and 6) creating a repository for processed and raw ocean protein data.
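Goal 4 can be illustrated with a minimal Python sketch of a per-peptide lowest common ancestor (LCA) analysis. This is not METATRYP's code or API. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), and the taxon names, lineages, and sequences are hypothetical toy values invented for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy lineages (root -> leaf); not real taxonomy records.
TOY_LINEAGES: Dict[str, List[str]] = {
    "strain_A1": ["Bacteria", "phylum_X", "genus_A", "strain_A1"],
    "strain_A2": ["Bacteria", "phylum_X", "genus_A", "strain_A2"],
    "strain_B1": ["Bacteria", "phylum_Y", "genus_B", "strain_B1"],
}

# Hypothetical toy reference proteomes; the sequences are invented.
TOY_PROTEOMES: Dict[str, List[str]] = {
    "strain_A1": ["MSTLEVAKGGHWDQLR"],
    "strain_A2": ["MSTLEVAKGGHWDELR"],
    "strain_B1": ["MSTLEVAKTTPPNNLR"],
}


def tryptic_peptides(protein: str, min_len: int = 6) -> List[str]:
    """In silico digest: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_len]


def build_peptide_index(proteomes: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each tryptic peptide to the set of taxa whose proteins contain it."""
    index: Dict[str, Set[str]] = {}
    for taxon, proteins in proteomes.items():
        for protein in proteins:
            for peptide in tryptic_peptides(protein):
                index.setdefault(peptide, set()).add(taxon)
    return index


def lowest_common_ancestor(lineages: List[List[str]]) -> Optional[str]:
    """Deepest rank shared by all lineages, or None if nothing is shared."""
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def peptide_lca(query: str, index: Dict[str, Set[str]],
                lineages: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """LCA for each tryptic peptide of the query protein (None if unmatched)."""
    result = {}
    for peptide in tryptic_peptides(query):
        taxa = index.get(peptide, set())
        result[peptide] = lowest_common_ancestor([lineages[t] for t in taxa])
    return result


index = build_peptide_index(TOY_PROTEOMES)
result = peptide_lca("MSTLEVAKGGHWDQLR", index, TOY_LINEAGES)
```

In this toy case the peptide shared by all three strains resolves only to the root rank, while the peptide unique to one strain resolves to that strain. This shows why an LCA is reported per peptide rather than per protein: different peptides of the same protein can carry different taxonomic specificity.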
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This EarthCube project successfully created an Ocean Protein Portal and conducted community building efforts. Proteins are the machines that life uses to conduct essential chemical reactions (enzymes), form cellular structural components, and control cellular processes (regulatory controls). In the ocean environment, microbial proteins, measured by the new 'omics approach of metaproteomics, have been demonstrated to allow assessment of controls on primary productivity and direct examination of biogeochemical reactions. From this domain science perspective, the ability to connect protein distributions in the oceans, and their implied chemical functionality, with chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The developed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and by enabling connections between biogeochemical data and the enzymes that catalyze the underlying reactions.
The Ocean Protein Portal environment (Version 1.1) was assembled using a variety of software and informatics development approaches, and it allows real-time interrogation of the fundamental questions of "where" a protein of interest is in the oceans and "who" is responsible for its synthesis. In addition, a variety of links to other information sources are embedded within each protein result. These include the ability to search the protein sequence against the BLASTP server at NIH's NCBI resource with a single click, to search the Protein Data Bank (PDB) for available three-dimensional structures of the protein, to connect to the European Bioinformatics Institute's (EBI) UniProt database, to connect to BCO-DMO for additional environmental datasets and expedition metadata, and to connect to METATRYP's standalone instance for further taxonomic analysis. The various aspects of this informatics development have been described in a number of conferences and publications.
As a result of these activities, the OPP prototype is now operational, having ingested and now serving 8 metaproteomic datasets from the Atlantic, Pacific, Antarctic and Arctic Oceans, for a total of 219 samples containing 106,995 proteins and 1,577,368 peptides altogether (Figure 1; oceanproteinportal.org). Several more metaproteomic datasets are in the ingestion queue. Similarly, the lowest common ancestor analysis software METATRYP (Version 2) is operational as a standalone tool and is also connected to the OPP via an API. Its database contains a total of 71,574,958 peptides from 145 genomes, 3 metagenomes, and 956 MAGs/SAGs to date (contributing 58, 8 and 41 million peptides, respectively).
Last Modified: 12/29/2019
Modified by: Mak A Saito