Award Abstract # 1639714
EarthCube Data Infrastructure: Laying the Groundwork for an Ocean Protein Portal

NSF Org: RISE, Integrative and Collaborative Education and Research (ICER)
Recipient: WOODS HOLE OCEANOGRAPHIC INSTITUTION
Initial Amendment Date: August 31, 2016
Latest Amendment Date: August 31, 2016
Award Number: 1639714
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia
  RISE: Integrative and Collaborative Education and Research (ICER)
  GEO: Directorate for Geosciences
Start Date: September 1, 2016
End Date: August 31, 2019 (Estimated)
Total Intended Award Amount: $499,595.00
Total Awarded Amount to Date: $499,595.00
Funds Obligated to Date: FY 2016 = $499,595.00
History of Investigator:
  • Mak Saito (Principal Investigator)
    msaito@whoi.edu
  • Danie Kinkade (Co-Principal Investigator)
Recipient Sponsored Research Office: Woods Hole Oceanographic Institution
266 WOODS HOLE RD
WOODS HOLE
MA  US  02543-1535
(508)289-3542
Sponsor Congressional District: 09
Primary Place of Performance: Woods Hole Oceanographic Institution
266 Woods Hole Road
Woods Hole
MA  US  02543-0151
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): GFKFBWG2TV98
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

The measurement of the biological inventory of proteins within an organism, known as proteomics, has emerged as an important new biological methodology within the past decade. When this technology is applied to environmental communities of microbes, an approach typically called metaproteomics because it covers a whole biological community, it has shown potential to significantly improve the understanding of ocean ecology and biogeochemistry by allowing a broad diagnosis of ecosystems across space and time. In this manner, the measurement of proteins, including the enzymes that catalyze reactions, throughout the ocean basins has great potential as a tool for ocean scientists interested in the chemistry and biology of the oceans. Yet as a relatively new data type, based on mass spectra rather than DNA sequence, proteomic datasets have their own specific informatics complexities. Currently these datasets are not easily accessed by the broader biological and chemical oceanographic communities. The researchers will develop an Ocean Protein Portal that will enable non-expert users to interrogate these large and complex ocean protein datasets.

The ability to connect protein distributions in the oceans, and their implied chemical functionality, with chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The proposed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and enabling connections between biogeochemical data and the enzymes that catalyze their reactions. This foundational infrastructure would be accessible to life scientists from other (non-ocean) domains as well. The portal will also contribute to the interpretation of GEOTRACES full-depth ocean trace metal and isotope data via the incorporation of future protein datasets and linkages to the chemical datasets in BCO-DMO. This project will provide BCO-DMO with an opportunity to research and employ new techniques for efficient data mining combined with semantic technologies that will enable better data discovery and access for its community. In addition, working closely with the proteomics community will allow BCO-DMO data managers to gain working experience with this emerging data type.

Specific goals include:
1) creating text-based and sequence-based search capability for processed protein datasets,
2) creating a geospatial global map visualization, as well as table output, of the queried protein's occurrence in the oceans,
3) providing an Ocean Data View compatible table export of geospatial and temporal distributions of the queried protein,
4) providing an analytic capability to answer the question of the taxonomic origin of the protein components, building on our METATRYP Python software and including a lowest common ancestor analysis for each peptide component of the queried protein sequence,
5) creating linkages between protein datasets and relevant environmental datasets within BCO-DMO, and
6) creating a repository for processed and raw ocean protein data.
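To illustrate the lowest common ancestor analysis named in goal 4, the following is a minimal Python sketch, not METATRYP's actual implementation. It assumes a simple trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages) and a tiny in-memory index. The genome identifiers, peptide assignments, and simplified lineages are all hypothetical and exist only to make the example runnable.

```python
import re

# Hypothetical toy index: which genomes contain each tryptic peptide.
PEPTIDE_TO_GENOMES = {
    "MTEYLAK": {"genome_1", "genome_2"},
    "GDPVTFR": {"genome_1"},
    "SWQNAIEK": {"genome_1", "genome_3"},
}

# Hypothetical, simplified lineages ordered from root to leaf.
LINEAGES = {
    "genome_1": ("Bacteria", "Cyanobacteria", "Prochlorococcus"),
    "genome_2": ("Bacteria", "Cyanobacteria", "Synechococcus"),
    "genome_3": ("Bacteria", "Proteobacteria", "Pelagibacter"),
}


def tryptic_peptides(sequence, min_length=6):
    """Split after K or R unless followed by P; drop short fragments."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None."""
    shared = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


def peptide_lca(sequence, peptide_index=PEPTIDE_TO_GENOMES, lineages=LINEAGES):
    """Map each tryptic peptide of the query protein to its LCA taxon."""
    result = {}
    for peptide in tryptic_peptides(sequence):
        genomes = peptide_index.get(peptide, set())
        result[peptide] = lowest_common_ancestor([lineages[g] for g in genomes])
    return result
```

In this toy setup, a peptide found in only one genome resolves to that genome's leaf taxon, while a peptide shared across distant genomes resolves only to a high-level taxon. This is why a per-peptide analysis can show which parts of a protein are taxonomically informative.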

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Held, Noelle A. and Webb, Eric A. and McIlvin, Matthew M. and Hutchins, David A. and Cohen, Natalie R. and Moran, Dawn M. and Kunde, Korinna and Lohan, Maeve C. and Mahaffey, Claire and Woodward, E. Malcolm and Saito, Mak A. "Co-occurrence of Fe and P stress in natural populations of the marine diazotroph Trichodesmium" Biogeosciences, v.17, 2020 https://doi.org/10.5194/bg-17-2537-2020
Held, Noelle and Saunders, Jaclyn and Futrelle, Joe and Saito, Mak "Harnessing the Power of Scientific Python to Investigate Biogeochemistry and Metaproteomes of the Central Pacific Ocean" Proceedings of the Python in Science Conference, 2018 https://doi.org/10.25080/Majora-4af1f417-010
Saito, Mak A. and Bertrand, Erin M. and Duffy, Megan E. and Gaylord, David A. and Held, Noelle A. and Hervey, William Judson and Hettich, Robert L. and Jagtap, Pratik and Janech, Michael G. and Kinkade, Danie B. and Leary, Dasha and McIlvin, Matthew and M "Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing" Journal of Proteome Research, 2019 https://doi.org/10.1021/acs.jproteome.8b00761
Saito, Mak A. and Saunders, Jaclyn K. and Chagnon, Michael and Gaylord, David A. and Shepherd, Adam and Held, Noelle A. and Dupont, Christopher and Symmonds, Nicholas and York, Amber and Charron, Matthew and Kinkade, Danie B. "Development of an Ocean Protein Portal for Interactive Discovery and Education" Journal of Proteome Research, v.20, 2021 https://doi.org/10.1021/acs.jproteome.0c00382
Saunders, Jaclyn K. and Gaylord, David A. and Held, Noelle A. and Symmonds, Nicholas and Dupont, Christopher L. and Shepherd, Adam and Kinkade, Danie B. and Saito, Mak A. "METATRYP v 2.0: Metaproteomic Least Common Ancestor Analysis for Taxonomic Inference Using Specialized Sequence Assemblies - Standalone Software and Web Servers for Marine Microorganisms and Coronaviruses" Journal of Proteome Research, v.19, 2020 https://doi.org/10.1021/acs.jproteome.0c00385

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This EarthCube project successfully created an Ocean Protein Portal and conducted community building efforts. Proteins are the machines that life uses to conduct essential chemical reactions (enzymes), form cellular structural components, and control cellular processes (regulatory controls). In the ocean environment, microbial proteins, measured by the new 'omics approach of metaproteomics, have been demonstrated to allow assessment of controls on primary productivity and direct examination of biogeochemical reactions. From this domain science perspective, the ability to connect protein distributions in the oceans, and their implied chemical functionality, with chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The developed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and enabling connections between biogeochemical data and the enzymes that catalyze their reactions.

The Ocean Protein Portal environment (Version 1.1) was assembled using a variety of software and informatics development approaches. It allows real-time interrogation of the fundamental questions of "where" a protein of interest is found in the oceans and "who" is responsible for its synthesis. In addition, a variety of links to other information sources are embedded within each protein result. These include the ability to search the protein sequence on the BLASTP server at NIH's NCBI resource with a single click, search the Protein Data Bank (PDB) for available three-dimensional structures of the protein, connect to the UniProt database at the European Bioinformatics Institute (EBI), connect to BCO-DMO for additional environmental datasets and expedition metadata, and connect to METATRYP's standalone instance for further taxonomic analysis. The various aspects of this informatics development have been described at a number of conferences and in publications.
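The "where" question can be pictured as a filter-and-aggregate step over per-sample observations, followed by a tab-delimited table export. The Python sketch below is illustrative only and does not reflect the portal's actual code or schema. The record fields, the cruise and protein identifiers, and the use of spectral counts as the abundance measure are all assumptions. The column labels are loosely modeled on Ocean Data View spreadsheet conventions and should be checked against the ODV documentation before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical observation records; every value here is made up.
OBSERVATIONS = [
    {"cruise": "CRUISE01", "station": "St1", "lon": -150.0, "lat": 10.0,
     "depth_m": 50, "protein": "protein_X", "spectral_count": 12},
    {"cruise": "CRUISE01", "station": "St1", "lon": -150.0, "lat": 10.0,
     "depth_m": 50, "protein": "protein_X", "spectral_count": 3},
    {"cruise": "CRUISE01", "station": "St1", "lon": -150.0, "lat": 10.0,
     "depth_m": 200, "protein": "protein_X", "spectral_count": 4},
    {"cruise": "CRUISE01", "station": "St2", "lon": -150.0, "lat": 0.0,
     "depth_m": 50, "protein": "protein_Y", "spectral_count": 9},
]

# Assumed column labels, loosely following ODV spreadsheet style.
HEADER = ["Cruise", "Station", "Longitude [degrees_east]",
          "Latitude [degrees_north]", "Depth [m]", "Spectral count"]


def where_is(protein_id, observations):
    """Sum spectral counts for one protein at each sampled location and depth."""
    totals = defaultdict(int)
    for obs in observations:
        if obs["protein"] == protein_id:
            key = (obs["cruise"], obs["station"], obs["lon"],
                   obs["lat"], obs["depth_m"])
            totals[key] += obs["spectral_count"]
    return sorted(key + (count,) for key, count in totals.items())


def to_table(rows):
    """Render rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_table(where_is("protein_X", OBSERVATIONS))` yields one line per station and depth at which the protein was observed, which is the shape of table a map or depth-profile view would consume.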

As a result of these activities, the OPP prototype is now operational, having ingested and now serving 8 metaproteomic datasets from the Atlantic, Pacific, Antarctic and Arctic Oceans, for a total of 219 samples containing 106,995 proteins and 1,577,368 peptides altogether (Figure 1; oceanproteinportal.org). Several more metaproteomic datasets are in the ingestion queue. Similarly, the least common ancestor analysis software METATRYP (Version 2) is operational as a standalone tool and is also connected to the OPP via an API. Its database contains a total of 71,574,958 peptides from 145 genomes, 3 metagenomes, and 956 MAGs/SAGs to date (contributing 58, 8 and 41 million peptides, respectively).


Last Modified: 12/29/2019
Modified by: Mak A Saito
