Title  : Draft on Ocean Sciences HPCC Report
Type   : Report
NSF Org: GEO / OCE
Date   : May 07, 1993
File   : geo9301

******************************************************************************
This file has been updated 10/31/96 to reflect the proper address of the:

    National Science Foundation
    4201 Wilson Boulevard
    Arlington, VA 22230

For more information call: (703) 306-1234
******************************************************************************

Here is a copy of the Ocean Sciences HPCC report. It was designed to
summarize the goals and priorities of the community at large and to provide
guidance for future high performance computing applications. Your comments
are appreciated.

Sergio Signorini
(Email: S.SIGNORINI/ssignori@nsf.gov)
(Voice: (202) 357-9641)


Report of Recommendations from the First Ocean Sciences Planning Workshop
on High Performance Computing and Communications (HPCC)

(DRAFT)

Chairman: Dr. Andrew Bennett, Oregon State University

March 17, 1993
Hotel Lombardy, 2019 I Street, NW, Washington, D.C.

Convened by
Joint Oceanographic Institutions, Inc.
Suite 800, 1755 Massachusetts Ave., NW
Washington, DC 20036-2102

With support from the
National Science Foundation
1800 G Street, NW, Washington, D.C.


EXECUTIVE SUMMARY

The HPCC program was conceived by the Federal Coordinating Council for
Science, Engineering, and Technology (FCCSET) as an initiative to address
fundamental problems in science and engineering, with broad economic and
scientific impact, whose solution could be advanced by applying high
performance computing techniques and resources. The purpose of this
NSF-Ocean Sciences (OCE) meeting was to define goals, priorities, and
future interdisciplinary problems responsive to the HPCC initiative. It is
hoped that the guidelines originating from this meeting will (1) accelerate
the pace of model development; (2) expand the breadth and depth of modeling
studies; (3) foster a higher degree of organization between agencies and
among researchers; (4) encourage studies which integrate large,
high-quality data sets with state-of-the-art models; and (5) stimulate
development and use of new computational tools.

Ocean circulation, global models of carbon flux into the ocean, the effect
of turbulence and organized small-scale flows on predator-prey interactions
in the upper ocean, and the understanding of magma bodies using ocean
seismic data are good examples of interdisciplinary problems that
capitalize on high performance computing techniques and resources. These
and other applications require significant computational and communications
efforts to coordinate modeling activities with data acquisition,
processing, and assimilation.

One of the major challenges of oceanographic science lies in the problem of
integrating diverse, massive, and spatially and temporally inhomogeneous
data sets with coupled models on a global scale. The study of the global
climate system is one of the foremost examples requiring this type of
coupled system. The goal is the development of projections of Earth system
change, especially of changes in system aspects interactive with human
activity.

The following recommendations were considered vital to HPCC-related
progress in ocean sciences research.

(1) There is an urgent need for a coherent program in the management of
    ocean sciences data. Progress in ocean sciences research depends
    increasingly upon assembling very large data sets on regional and
    global scales.
(2) There is a need to modernize and extend the existing communications
    network to allow dissemination of geophysical and oceanographic data on
    a near-real-time basis.

(3) There is a need for an accelerated evolution of faster computers with
    larger memory to accommodate the increasing computational requirements.

(4) Educational programs for training in, and even familiarization with,
    HPCC are essential at all levels.

(5) Interdisciplinary coordination with other branches of the geosciences,
    all of which have major requirements for HPCC, is fundamental for a
    broad impact on the advancement of ocean sciences research. The urgent
    need for readily accessible databases of physical, biological,
    chemical, and geological variables is widely recognized.


1. Introduction

The dynamics of the planet Earth, as manifest in its solid, liquid, and
gaseous domains, are difficult to observe, describe, and predict. High
Performance Computing and Communications will provide essential tools for
data collection, data management, data interpretation, and numerical
simulation in all of the earth sciences. The inter-relations between the
three domains will only be understood once HPCC has been exploited. Perhaps
the inter-relation at the forefront of present attention is the global
climate system. Ocean processes play a major role in the prediction of the
effects of global warming, global CO2 flux, the distribution of the world's
biotic production, and of global weather patterns (e.g. El Nino). In the
ocean's coastal areas, the prediction of fish spatial distributions and the
fate and effect of pollutants are examples of challenging problems
involving intensive computation and data handling.

Ocean circulation and other oceanic processes are unique in the small
spatial scales and long time scales needed to model them adequately. This
makes modeling of ocean circulation a formidable problem with greater
computational needs than, for example, atmospheric models. As one component
of coupled climate system models, the ocean plays a significant, even
controlling, role in climate variability on both short (approximately 1
year) and long (decadal or longer) time scales. Thus, it is absolutely
necessary to improve, and to make efficient, high resolution models of the
global scale ocean circulation. Furthermore, data in support of modeling
and for uncovering physical, chemical, and biological patterns in the ocean
are difficult and expensive to acquire, which often leaves us with
incomplete spatial and temporal coverage. Consequently, the intensive
computational needs of ocean circulation models, the need to assimilate,
visualize, transmit, and analyze oceanic data collected by a variety of
means and platforms, and the need to provide interfaces between the models
and the data all argue for advances in high performance computing and its
incorporation into our research programs.

2. The Need for High Performance Computing in Oceanography

2.1 Ocean Modeling and Estimation

The problem of understanding the role of the ocean in climate is one of the
most forbidding of computational challenges: the ocean contains
energetically active scales ranging from the global down to order 1-10 km
horizontally. In the vertical, on the order of 30 degrees of freedom at
each lateral position may be required. As with all turbulent fluid
problems, one cannot a priori rule out important scale interaction over the
entire spectrum.
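As a rough illustration of what these scales imply, the short calculation
below estimates the grid size and memory footprint of a global model with
10 km horizontal resolution and 30 vertical degrees of freedom. This is a
sketch of ours, not part of the workshop discussion; the ocean area, the
count of prognostic variables, and the word size are illustrative
assumptions.

    # Back-of-envelope estimate of the grid implied by ~10 km horizontal
    # resolution and ~30 vertical degrees of freedom.  The ocean area,
    # variable count, and word size below are illustrative assumptions.

    OCEAN_AREA_M2 = 3.6e14   # approximate area of the world ocean
    DX_M = 10e3              # horizontal resolution (10 km)
    LEVELS = 30              # vertical degrees of freedom per column
    VARIABLES = 5            # e.g. u, v, T, S, and a surface height field
    BYTES_PER_VALUE = 8      # double precision

    columns = OCEAN_AREA_M2 / DX_M**2    # ~3.6e6 horizontal columns
    points = columns * LEVELS            # ~1.1e8 grid points
    state_bytes = points * VARIABLES * BYTES_PER_VALUE

    print(f"columns:     {columns:.2e}")
    print(f"grid points: {points:.2e}")
    print(f"state size:  {state_bytes / 1e9:.1f} GB per time level")

Even one copy of such a state runs to several gigabytes, before any
provision is made for multiple time levels, diagnostics, or work arrays.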
The climate modeling problem is especially difficult because integration
times are necessarily long, and small systematic errors which could be
tolerated over short periods can come to dominate the computed climate. Any
climate forecast involving the ocean must be properly initialized -
essentially a statement that one must be able to describe (and understand)
the ocean as it exists today. This issue demands involvement in the data
"assimilation" (or, as we prefer to say, "estimation") problem. We believe
there is a specific need to improve predictive global ocean circulation
models by assimilating ocean temperature measurements, satellite altimetry,
and other data into them, a task which has hitherto not been possible owing
to the sparsity of observations and inadequate computer resources.

Quantitative estimation with present-day and immediately anticipated
oceanographic data streams raises major issues of computational complexity
and load. One requires the combination of eddy-resolving dynamical ocean
models on a global scale with equivalent data streams. The end result must
be a quantitative estimate of the oceanic state. By "quantitative" we mean
that useful estimates of the frequency/wavenumber spectrum of the
uncertainty of the state are provided - otherwise long term forecasts may
be integrating nothing but error.

2.1.1 Ocean Models

Ocean models specify the state of the ocean at the nodes of a
three-dimensional grid, stepping the state variables forward in time in
accord with the laws of classical mechanics and thermodynamics. But to be
useful tools in the understanding and prediction of the state of the ocean,
and of its role in climate, these models must resolve the key processes
that control our climate while parameterizing those (hopefully less
crucial) processes that cannot be resolved. Because of computer
limitations, our present general circulation models have inadequate
resolution to represent many key aspects of the ocean circulation such as,
for example, the width, transport, and point of separation of the Gulf
Stream, and the ubiquitous energy-containing eddy scale (typically 20 km).
There is no operating oceanic model of sufficient realism to provide the
basis of a climate-forecasting tool. The central oceanographic and climate
challenge is to produce oceanic models of sufficient realism that one could
ultimately claim real forecast skill.

A recent example of a state-of-the-art ocean simulation, with sufficient
resolution to represent jets and eddies, is the Community Modeling Effort
(CME) of F. Bryan and W. Holland at the National Center for Atmospheric
Research (NCAR). It serves to illustrate the constraints imposed by the
capacity of presently available computers. The underlying numerical model,
developed at GFDL by Bryan and Cox, remains the most studied and widely
used ocean model. The resolution of the CME model is 1/3 degree latitude by
0.4 degree longitude with 27 levels in the vertical; it extends from 15
degrees S to 65 degrees N. As such it represents perhaps 10% of the area of
the global oceans. Yet it requires 60 hours of CRAY X-MP time per year of
simulation. It takes perhaps 20 years for a parcel of fluid to circulate
around the gyre, suggesting that at least 1200 hours are required to study
the response of the gyre to changes in surface boundary conditions. Such
computation times are prohibitive and do not allow the researcher to
explore parameter space.
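The arithmetic behind these figures is simple but sobering. The sketch
below uses only the numbers quoted above; the counts of parameter-space
experiments are assumptions of ours for illustration.

    # Cost of gyre-equilibration experiments with the CME-class model,
    # using the figures quoted in the text (60 CRAY X-MP hours per
    # simulated year; ~20 years for a parcel to circle the gyre).

    HOURS_PER_SIM_YEAR = 60
    GYRE_TURNOVER_YEARS = 20

    one_run = HOURS_PER_SIM_YEAR * GYRE_TURNOVER_YEARS
    print(f"one gyre-equilibration run: {one_run} CRAY hours")  # 1200

    # Exploring parameter space means repeating such runs many times
    # (the experiment counts here are purely illustrative):
    for n_runs in (5, 10, 20):
        total = one_run * n_runs
        print(f"{n_runs:2d} runs: {total:6d} hours "
              f"(~{total / 8760:.1f} machine-years)")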
Moreover, because the computational load grows like the cube of the linear
resolution, a model with 100 km (one degree) resolution, which presently
demands gigaFLOPS, will demand teraFLOPS as the linear resolution is
increased to 10 km (one tenth of a degree). It is generally acknowledged
that resolution of at least 20 km is required to capture most scales of
interest for long term integration.

Computational techniques learned from gas dynamics have made possible the
use of a thermodynamic coordinate in place of depth (the so-called "layer
models"). These will complement the "level models" such as the Bryan and
Cox model, but the demand on computing resources is thus immediately
doubled! It is encouraging that both types have been recoded in a
data-parallel language (Connection Machine Fortran, which is close to the
standard Fortran-90) and have been tested on massively-parallel processors
(the Connection Machines at Los Alamos National Laboratory and elsewhere).
The contribution of R. Bleck and co-workers at the U. of Miami may be
noted; other participants are located at Los Alamos National Laboratory
(LANL) and Oregon State University. Exceptional levels of performance are
already being realized (equivalent to 10 Gflops on the 1024-node CM-5 at
LANL). An important feature of these massively-parallel systems is their
memory size (8 Gbytes and more). A global model, similar to that of Bryan
and Cox, has been developed by B. Semtner and co-workers at the US Naval
Postgraduate School for execution on a multi-headed Cray Research computer.
This code has also been ported to the Connection Machine by LANL
scientists.

Fundamentally, the demand for high performance computing for global ocean
modeling is insatiable. It is clear that such models would make immediate
and profitable use of teraFLOPS, and even petaFLOPS, levels of sustained
performance.

2.1.2 Ocean Data

The problem of understanding the ocean circulation well enough to determine
its role in climate change differs from most areas of fluid dynamics owing
to the great difficulty of making the necessary observations: the ocean is
opaque to electromagnetic radiation. But a number of clever new
technologies are now available that are finally giving oceanographers the
global coverage that has hitherto been beyond reach. These technologies
include space-borne radars (satellite altimetry for the surface pressure
field, satellite scatterometry for the stress boundary conditions),
neutrally buoyant floats, global scale tomographic systems, and much
enhanced versions of more conventional oceanic measurements (e.g. transient
tracers). A major frontier of oceanographic science lies in the problem of
integrating these diverse, often inhomogeneous (in space, time, and
frequency/wavenumber character) data sets on a global scale.

2.1.3 Ocean Models as an Integrating Tool

Much of what we know of the ocean is embodied in the equations of motion
and the relevant thermodynamics. The challenge to the oceanographer is to
combine the diverse data sets with statements about how they relate to each
other, and to quantities of interest which are not measured. Dynamical
models, constrained to be consistent within error estimates with these
diverse data types, represent the best estimates we can make of the oceanic
state. In the oceanographic context, these methods are known as "inverse",
"data assimilation", "estimation", "control", "optimization", etc.
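As a concrete, if toy-sized, illustration of what such estimation methods
do, the sketch below performs a single analysis step of a Kalman filter,
blending a model forecast with an observation according to their error
covariances. Everything here is invented for illustration; operational
ocean state vectors have on the order of 1e5 or more elements, and the
covariance update is precisely where the computational burden lies.

    import numpy as np

    # One Kalman analysis step on a two-variable toy "ocean state".
    # All values are invented for illustration.
    x_f = np.array([1.0, 0.5])          # model forecast of the state
    P = np.array([[0.5, 0.1],           # forecast error covariance
                  [0.1, 0.3]])
    H = np.array([[1.0, 0.0]])          # only the first variable is observed
    R = np.array([[0.2]])               # observation error covariance
    y = np.array([1.4])                 # the observation itself

    # Kalman gain: how much to trust the data relative to the model.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

    x_a = x_f + K @ (y - H @ x_f)       # analysis (best estimate)
    P_a = (np.eye(2) - K @ H) @ P       # updated (reduced) uncertainty

    print("analysis state:", x_a)
    print("analysis covariance:\n", P_a)

For a state vector of n elements the covariance P has n-squared entries, so
at n of order one million the update above is far beyond present resources;
this is the burden referred to below.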
Our ultimate goal is the estimation of the circulation of the global ocean
at intervals of order a week, with sufficient accuracy and coverage to
calculate the fluxes and divergences of all climatologically important
properties, and so to understand the dominant mechanisms (the physics) of
the circulation as it exists today. The intention is to do this as far as
possible by model/data combinations. The goal is a formidable one, simply
in terms of the computational loads involved - for example, the time
dependent North Atlantic model of J. Marotzke and C. Wunsch at MIT has a
state vector with one hundred thousand elements at each time step.
Generalized inverses of regional time-dependent quasi-geostrophic models,
including error statistics and involving up to one million degrees of
freedom, have been calculated on workstations and have been replicated at
high speed on massively-parallel processors. Models such as these also need
to be run globally, and we do need error covariances for the state vectors.
The computational burden imposed by the need to calculate error covariances
has hitherto inhibited the use of filters/smoothers for models with very
large state vectors.

2.2 Examples of Applications

2.2.1 Joint Global Ocean Flux Study (JGOFS)

Global models of carbon flux into the ocean are limited by computational
power. The physical models are not eddy-resolving, and the chemistry and
biology overlaid on the physics at this scale tend to be simplistic.
Although complexity should not be increased except to support improved fits
to validation data, HPCC power is, as we argue above, needed for realism.
Chemical realism may require a more complex relationship between gaseous,
dissolved, and particulate phases. Biological complexity may include more
diverse interactions within food webs (see below).

2.2.2 Global Ocean Ecosystems Dynamics (GLOBEC)

Predator-prey interactions in the upper ocean are affected by turbulence
and organized flows. Realistic turbulence models are now in use on user
computers but require significant computer time. Water motion combining
turbulence and realistic organized flows is not well modeled at the present
time. HPCC should permit more rapid realizations of realistic upper ocean
flow fields. Predators and prey may be overlaid on these fields, with each
particle given an appropriate biological vector based on realistic buoyancy
or swimming capabilities. Even more complicated biological models may
assign behavior to the particles based on the physiological or sensory
responses of the specific type of organism. The eventual goal is a
realistic representation of particle trajectories and interactions in the
upper ocean.

2.2.3 Biological Functional Groups

The standard biological unit is the species. However, this level of
complexity is not presently approachable in realistic interdisciplinary
models that incorporate food web dynamics. One logical alternative is to
represent the biological community with a number of functional groups, each
of which represents those species which play a similar role. For example,
for phytoplankton limited by light, this may entail different size ranges
with members representing different pigment types, each of which is coupled
to a number of photoadaptation types. These phytoplankton functional groups
should be forced by multi-spectral radiation rather than by summed
photosynthetically active radiation, as sketched below.
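A minimal sketch of this multi-spectral forcing idea follows. It is our
illustration only: the wavebands, absorption spectra, and growth parameters
are invented placeholders. The point is that two functional groups
receiving the same total radiation can capture different amounts of usable
light, and hence grow at different rates, once the spectrum is resolved.

    import numpy as np

    # Two phytoplankton functional groups forced by three wavebands.
    # All parameter values are invented placeholders.
    irradiance = np.array([80.0, 50.0, 20.0])   # W m^-2 in blue/green/red

    # Rows: functional groups; columns: relative absorption per waveband.
    absorption = np.array([[0.9, 0.3, 0.1],     # blue-absorbing group
                           [0.2, 0.5, 0.8]])    # red-absorbing group

    max_growth = np.array([1.2, 0.8])   # maximum growth rate, per day
    half_sat = np.array([30.0, 25.0])   # light half-saturation constants

    # Light captured depends on the spectrum, not just its sum (PAR).
    captured = absorption @ irradiance
    growth = max_growth * captured / (half_sat + captured)

    for g, (c, mu) in enumerate(zip(captured, growth)):
        print(f"group {g}: captured {c:5.1f}, growth {mu:.2f} per day")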
HPCC can provide the opportunity for a more realistic description of true
biological complexity in the context of realistic physical models,
initially in smaller spatial domains, but eventually at the global scale
suggested above.

2.2.4 A Solid Earth (Seismology) Example

Large scale modeling of elastic wave propagation in media closely
representative of the seafloor, with roughness sampled on a meter scale and
internal heterogeneities included, is becoming increasingly common in
interpreting ocean seismic data. For example, finite difference
computations of multichannel seismic data collected to understand the
geometry of magma bodies and the thickening of intrusives have been carried
out for mid-ocean ridges. In these calculations the geometry is roughly
two-dimensional, and 2-D finite difference simulations have been computed.
In these, node spacings are 1 meter and the experiment aperture is 10 km
horizontally and 10 km vertically, for a total of one hundred million
nodes. This computation requires on the order of a gigabyte of storage and
a total of 100 billion to one trillion floating point operations. With a
100 megaflop machine, such calculations can be conducted in a matter of
hours. While these studies have done a remarkable job of simulating
observations, a number of effects can require fully three-dimensional
computations. For example, the seafloor is only two-dimensional under
restricted conditions, and the decay of scattered energy and the proper
representation of signal phase require three-dimensional computations. A
computation similar to that described above then requires terabytes of
storage, and the time required with a 100 megaflop machine expands to
hundreds of hours. Obviously, a teraFLOPS computer with a concomitant
increase in memory could be very valuable in these studies.

3. Needs and Requirements

The preceding sections have first identified national issues for which a
better understanding of the oceans is essential. Second, they have detailed
massive operational and scientific activities hampered by inadequate
computing and communications. Here we list some specific requirements.

A. Need for a Coherent Program in the Management of Ocean Sciences Data

Progress in research in the Ocean Sciences depends more and more upon
assembling very large data sets on regional and even global scales. While
this is axiomatic, reality presents many difficulties. These large data
sets are often very heterogeneous. For example, the estimation of the
transfer function between the geoid and seafloor morphology requires joint
access to databases of ocean depth and satellite estimates of the geoid. In
many cases, for example seafloor morphology or CTD data, the data have been
collected with instruments with very different response functions,
processed using different techniques, and formatted by different
investigators or agencies. Few data standards exist in oceanography.

The data sets of interest are frequently very large. A modern marine
multichannel seismic experiment often collects a terabyte of data in ten
days' time. A single satellite image consumes 1-3 megabytes of storage.
Traditional, manpower-intensive methods of storing data, for example
nine-track tapes, are no longer useful. The lack of data standards and the
lack of experience in handling large data sets contribute to data
heterogeneity through a loss of history. The techniques for data
collection, the relevant instrument responses, known errors or potential
errors, and other historical metadata are often lost to the end-user.
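One way to address this loss of history is to make the metadata travel with
the data set itself. The sketch below is our own illustration of such a
record; the field names and example values are hypothetical and are not a
proposed standard.

    from dataclasses import dataclass, field

    # An illustrative record of the historical metadata discussed above.
    # Field names and example values are hypothetical.
    @dataclass
    class OceanDataSetRecord:
        variable: str                   # what was measured
        instrument: str                 # how it was measured
        response_notes: str             # instrument response and calibration
        processing_history: list = field(default_factory=list)
        known_errors: list = field(default_factory=list)

    record = OceanDataSetRecord(
        variable="temperature/salinity profile",
        instrument="CTD (hypothetical model)",
        response_notes="sensor time constants and calibration dates",
    )
    record.processing_history.append("despiked; 2 dbar bin average")
    record.known_errors.append("conductivity drift suspected after day 12")

    print(record)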
If Ocean Sciences is to make use of the massive new data sets being
collected, it is essential that efforts be made within the Division to
address these challenges.

B. An Expanded Internet for Data Collection

The Ocean Sciences and Earth Sciences Divisions of the NSF have a unique
opportunity, which should be exploited, to modernize and extend geophysical
and oceanographic data collection. For example, the Global Ocean Observing
System (GOOS) and its derivatives seek to provide oceanographic data from
remote locations, including buoys, in near-real-time using various forms of
communication, including satellites. In Earth Sciences, an effort has been
made to acquire data remotely from continental and island stations using
telephone connections and satellites. Most recently, both NASA and NSF
programs have begun to establish Internet connections at several remote
sites. An integrated program to develop standards, data compression
methods, and demonstration projects for a wide variety of geophysical and
oceanographic data should be undertaken. Existing e-mail connections to
oceanographic ships should be extended to provide teleconferencing and the
remaining TCP/IP protocols for data and information transfer. An extended
National Research and Education Network (NREN), which could provide anyone
with Internet access immediate access to global data of all sorts, would be
a valuable enhancement.

C. Faster Computers

There is a need for faster computers with larger memory, without a
foreseeable limit. Even if a threshold or plateau can be attained for
physical modeling, there will be a need to run many experiments, and then
to combine physics with biology, chemistry, and geology.

D. Data Base Management and Networking

More computation and more observations increase the dependence on networked
database browsing, visualization (input/output in general), database
management, and data transfer. This "communications" aspect of HPCC is at
least as essential to the ocean sciences as is "computation" and will
require coordinated efforts to implement. Other essential tools include
relatively portable yet familiar programming languages; algorithms for
efficient data management, such as rapid data compression algorithms;
toolkits for easy development of graphical data interfaces; handshaking and
foreign function recognition protocols for data display; and parallel
debugging tools.

E. Educational Programs

Training in, and even familiarization with, HPCC is essential at all
levels. Research assistants need a mix of physical, mathematical, and
geographical "general" knowledge, plus basic modern programming skills. The
ability to write compilers, for example, is not needed. Postdoctoral
fellows and junior investigators need to be encouraged to try the new HPCC
technology before old habits become entrenched. Senior investigators need
to become comfortable using HPCC themselves, even if only in elementary or
tutorial applications, so that they can have confidence in the work of
their assistants and junior colleagues and direct them toward appropriate
applications. Also, the senior scientists must be able to communicate with
their assistants and colleagues about HPCC.

F. Interdisciplinary Coordination

Much is to be gained by coordination with other branches of the
geosciences, all of which have major requirements for HPCC, both in
"computing" and "communications".
There is a continuum-mechanical foundation common to oceanic, atmospheric,
and earth mantle/core dynamics; in particular, coupled ocean-atmosphere
models for marine meteorology and climate research are an essential
objective. The desirability of merged and readily accessible databases for
physical, biological, chemical, and geological variables is widely
recognized.

G. Near-Real-Time Data Processing and Dissemination

A greatly enhanced NREN can be used to obtain data from remote geophysical
and meteorological stations in near-real-time to provide a synoptic view of
the Earth. The most difficult and expensive part of GOOS, for example, is
providing communications channels for delivering data to scientists. The
Internet already provides the needed protocols, and the multiple path
connections between sites ensure a robust information delivery scheme which
would be difficult and even undesirable to replicate for special purposes.
This use of the Internet for data delivery and remote instrument
verification and calibration is perhaps unique to the Ocean, Earth, and
Atmospheric Sciences and should be exploited by the HPCC.

H. Local versus Central Computing Facilities

We specifically avoid making recommendations for, say, a computer of a
certain size, to be located in a certain place by a certain time. Indeed,
the best mix of local and central computing facilities is not yet clear. We
do urge all marine scientists to apprise themselves of the potential of
HPCC to enhance their work. Many national laboratories have
massively-parallel processors, and public-domain software for distributed
computing is readily available.


MEMBERS OF THE HPCC OCEAN SCIENCES PLANNING WORKSHOP

Dr. Andrew Bennett (Panel Chairman)
College of Oceanography
Oregon State University
Oceanography Adm Bldg 104
Corvallis, Oregon 97331-5503
Voice: (503) 737-2849
FAX: (503) 737-2064
Internet: bennett@oce.orst.edu
Omnet: A.BENNETT

Dr. William Holland
National Center for Atmospheric Research
Climate and Global Dynamics Division, Oceanography Section
P.O. Box 300, Boulder, Colorado 80307
Voice: (303) 497-1353
FAX: (303) 497-1137
Internet: holland@ncar.ucar.edu
Omnet: W.HOLLAND

Dr. Daniel Kamykowski
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
Box 8208, Raleigh, NC 27695-8208
Voice: (919) 515-7894
FAX: (919) 515-7802
Omnet: D.KAMYKOWSKI

Dr. John Marshall
Massachusetts Institute of Technology
Department of Earth Sciences
Cambridge, Massachusetts 02139
Voice: (617) 253-9615
FAX: (617) 253-6208
Internet: marshall@gulf.mit.edu
Omnet: J.MARSHALL

Dr. John Orcutt
Scripps Institution of Oceanography, A-015
University of California, San Diego
La Jolla, CA 92093
Voice: (619) 534-2887
FAX: (619) 534-2902
Internet: john_orcutt@igppqm.ucsd.edu

Dr. Gordon Swartzman
Center for Quantitative Science, HR-20
University of Washington
Seattle, WA 98195
Voice: (206) 543-0061
FAX: (206) 543-6785
Internet: gordie@apl.washington.edu