
Dr. Colwell's Remarks


"Computing: Getting us on the Path to Wisdom"

Dr. Rita R. Colwell
National Science Foundation
SC2002: From Terabytes to Insights
Baltimore, Maryland

November 19, 2002

See also slide presentation.

If you're interested in reproducing any of the slides, please contact
The Office of Legislative and Public Affairs: (703) 292-8070.

Good morning to everyone, and thank you for the warm introduction. I'm delighted to be here at SC2002. This is a high-tech, high-energy crowd, and the atmosphere is full of excitement!

[Title Slide]

I've titled my remarks, "Computing: Getting us on the Path to Wisdom." You may wonder what I mean by wisdom, a word that carries such gravitas. We may not be able to define it. We can look for signs of it, and we often recognize it when we see it.

[Emerson quote]

One of my favorite yardsticks of wisdom comes from Ralph Waldo Emerson over a century ago. "The invariable mark of wisdom," he said, "is to see the miraculous in the common."

As scientists, engineers, and educators, we are privileged to have our lives infused with the miraculous. Discovery, learning and innovation are paths we travel daily.

[IBM 650]

Many of you will recognize the next image. It's the IBM 650. I used it for my own Ph.D. research to classify marine bacteria.

I wrote the program to handle what we thought was a large amount of data gathered on several hundred bacterial cultures.

This was the first American use of the computer to classify bacteria from the environment. In fact, the coding scheme we developed for bacteriological data remains in use today, and is widely employed in many hospitals across the country.

As for the IBM 650, it was installed in the attic of the chemistry building at the University of Washington, and we graduate students got to use it between the hours of two and four a.m.

Today, an IBM 650 is literally a museum piece. One is on display at the Smithsonian! And as this conference attests, a new age of supercomputing has dawned.

Many of us have seen our work transformed in unimagined ways by the power and breadth of the information and communications revolution that we are all a part of.

[Cholera collage]

My own research on the environmental factors that converge to cause cholera has traveled many miles from the early days of the IBM 650 - to the sequencing of the organism that causes cholera, to handling vast amounts of climate data gathered by satellites, to easy communication with colleagues around the world, particularly those working in countries where cholera is still a deadly scourge.

The changes born of the information age have helped to change the way infectious diseases are understood, and opened new prospects for ameliorating their deadly consequences.

Today, I intend to take us a step forward to the frontiers of knowledge. I'll describe the National Science Foundation's vision of cyberinfrastructure for the future, and then provide some examples to illustrate why the time is ripe for action.

The first wave of the information and communications technology revolution has reshaped the once familiar landscape of the economy and has forced us to clear new paths in research, education, and business. It has swept across every field of research, and changed forever our scientific and educational horizons. New frontiers of knowledge, unimagined only a few years ago, are now open to us.

I believe we stand on the threshold of a new age of scientific exploration, one that will give us a deeper understanding of our planet and allow us to improve the quality of people's lives worldwide.

A great challenge today is to sustain the momentum of discovery and realize the progress that our new tools promise.

Science and technology have always been a powerful force for human progress. In the 21st century, more than ever before in history, we have the opportunity to advance global prosperity as we expand the frontiers of knowledge and make possible ever greater achievements. The conference title, "From Terabytes to Insights," points to this journey into new territory.

[National Science Foundation word slide: Enabling the nation's future through discovery, learning and innovation]

The National Science Foundation has partnered with the science and engineering community in our quest to advance discovery. We supported campus computing centers in the 1960s, and computational science in the 1970s.

The first supercomputer centers and networks linking researchers came on the scene in the 1980s. These were followed by the birth of Mosaic and the Partnerships for Advanced Computational Infrastructure in the 1990s, and the Terascale and Grid initiatives of this decade.

NSF has worked with the community to foster collaboration, to support frontier research in computing and network science, and to educate the next generation of scientists and engineers to carry discovery forward.

For decades, NSF has been steadily crystallizing the idea of a center that brings together diverse skills, tools, and perspectives to focus laser-like on scientific and technological problems.

From this come the original science and technology centers, the engineering research centers, and the supercomputing centers, which you know well. Centers in new and promising areas of research are burgeoning. It is a form and a formula that has served us well.

[Map of US with Teragrid superimposed]

Now we look toward a grander scale: the TeraGrid, a distributed facility that will let computational resources be shared among widely separated groups.

This will be the most advanced computing facility available to scientists for all types of research in the United States - exceptional not just in computing power but also as an integrated facility. It will offer access to researchers and students across the country, merged data resources, and visualization capability.

It is a step toward the vision of a cyberinfrastructure that will give a broad range of researchers access to high-performance computing, high-bandwidth networks, very large data stores, and sophisticated tools for knowledge discovery.

[Schematic representation of the elements of cyber infrastructure]

The demand for sophisticated cyberinfrastructure is exploding in every field of science and engineering. Teams of researchers working within and across disciplines are coming together to lay the foundations for a cyberinfrastructure revolution.

As many of you know, NSF is planning to launch a cyberinfrastructure effort to address these growing research and education needs.

We have chartered an Advisory Committee on Cyberinfrastructure - the Atkins Committee - to consult with the science and engineering community and to assess common needs. Their final report will recommend a course for the coming years. It is expected shortly.

We expect that NSF's current PACI partnerships and terascale facilities will play a pivotal and even expanded role in this initiative.

As you can see from the slide, cyberinfrastructure will move us from "Terabytes to Insights". We envision fundamental research in information technology and applications in all areas of research. We will need to expand our network capabilities and our large data repositories, and develop new computational, analytical and visualization tools.

Central to the vision of cyberinfrastructure are People. A great challenge in turning our vision into reality is assembling the talent. We will need renewed efforts in education across the board.

It's one thing to tell you that we need this expanded cyberinfrastructure, and quite another to show you. I'll focus the remainder of my remarks on the frontiers of discovery. The most eloquent demonstration of the need lies in the great possibilities for the creation and application of new knowledge found there.

[Image of old star chart]

More than any other fields, astronomy and physics have already benefited from the supercomputing revolution. We travel into ever more distant reaches of the universe in search of its origins and nature.

Observation of astronomical phenomena reaches back in time to the earliest attempts to read in the stars a message about our own relationship to the cosmos. This star chart depicting Perseus is based on Tycho Brahe's observations. All told, he cataloged 700 objects.

Today, vast, ever-expanding datasets are collected with a variety of instruments.

[Animation of Sloan Digital Sky Survey data] animation not available

This animation represents data on the distance of 130,000 galaxies collected by the Sloan Digital Sky Survey. Each "wing" is one observational swath at a point in time. Two of the "wings" are from a Northern quadrant, the third from a Southern. The edges of the wings extend approximately five billion light-years. The Sloan Survey will eventually map hundreds of millions of astronomical objects.

The Sloan archive can be used by anyone on the Internet. Several hundred teachers here in Baltimore have been trained to use Sloan data in their classrooms. Tamas Szalay, a 15-year-old high school student, wrote the computer animation we see here!

LIGO, short for Laser Interferometer Gravitational-Wave Observatory, is one of the newest arrows in our astronomical quiver. It is designed to search for gravitational waves produced by colliding black holes or collapsing supernovae. LIGO will join with other gravitational-wave observatories around the world to become more than the sum of its parts.

[Animation of Colliding Black Holes; 27 seconds] animation not available

We will now see a simulation of the collision of two black holes. The spectacular rainbows of color are gravitational waves - predicted by theory, but not yet observed experimentally.

The stunning visualization, produced by Ed Seidel, NCSA, required roughly a terabyte of data generated by simulations of Einstein's equations carried out by the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory and NCSA. Let's view the clip.

Simulations like these will be crucial both in detecting and interpreting data from LIGO.

[Schematic of GriPhyN: Grid Physics Network]

The Grid Physics Network, or GriPhyN, joins LIGO and the Sloan Digital Sky Survey with the Large Hadron Collider at CERN, the European accelerator laboratory. They form a computational and communications grid that ties together resources from the United States and Europe.

We expect that advances in developing petascale virtual data grids made in this project can be extended to other components of our nascent cyberinfrastructure.

[National Virtual Observatory animation: 1 min 36 sec.] animation not available

Finally, we will see here how a networked system might bring data from all wavelengths and from ground and space-based telescopes to an international community of astronomers through a National Virtual Observatory.

Here we are watching the network at work - the integration of data from different wavelengths and from different telescopes, both in space and on the ground.

The virtual observatory will ultimately change the way science is done. For example, it will bring together the "separate wavelength cultures" and it will bring science to desktops around the globe.

These examples demonstrate that the demand for resources to archive, manipulate, and extract knowledge from databases is expanding at an accelerating rate.

[Shows projected increases in database size]

This projection from PACI gives us some idea of the increases in magnitude expected over the next four years.

It shows us something else as well. Databases in astronomy and physics are currently orders of magnitude larger than those in neuroscience, earthquake engineering, or ecology.

But not for long! Planned instruments and observational platforms will boost these figures sky-high in the years ahead.

Now let's move from the far reaches of the cosmos to our own dynamic planet and to life on earth.

[earthquake disaster photos]

The Northridge earthquake of 1994 reminded us of our vulnerability, and spurred the pace of disaster research.

[Simulation of aftershock of Northridge earthquake superimposed on map of San Fernando Valley; 19 seconds] animation not available

Here we see the first 19 seconds of the aftershock from the Northridge earthquake in 1994, with the San Fernando Valley in the background. The simulation was produced by Greg Foss of the Pittsburgh Supercomputing Center Visualization Group.

A variety of tools will help us understand the forces causing earthquakes and their destructive consequences. Japan's Earth Simulator, for example, will be a valuable international resource in meeting this common goal.

[NEES graphic]

The Network for Earthquake Engineering Simulation - NEES - will also help.

It is a 21st century model of collaboration - literally, a laboratory without walls or clocks. Researchers from across the United States will be able to operate equipment and observe experiments from anywhere on the net. They will study how building design, advanced materials and other measures can minimize earthquake damage and loss of life.

Just last week, researchers conducted the first test of the web-interface technology. A shake table vibrated a model bridge fitted with about 100 sensors that streamed video and data to watching engineers, who then analyzed the bridge's performance.

[EarthScope slide showing network of sensors]

Another NSF-funded project, EarthScope, will generate basic scientific understanding of the structure and evolution of the North American continent and the physical processes controlling earthquakes and volcanic eruptions.

This slide shows a uniform grid of 2000 sites that researchers will sample, over the next decade, using portable seismic stations. An observatory four kilometers deep will directly monitor processes in the active San Andreas Fault zone, while a distributed observatory will gather data on plate movements from Alaska to Mexico.

Combined with new satellite and GPS systems, the entire EarthScope array will provide a dynamic picture of the forces that continue to shape earth.

Understanding gained from EarthScope will feed directly into NEES projects.

Cyberinfrastructure will take the earth sciences to an entirely new plane of discovery. Both NEES and EarthScope are vital components.

[GEON schematic: map showing links to diverse geological sites]

So is GEON, a coalition of information technology and earth sciences researchers who are working together to create a modern cyberinfrastructure for the earth sciences. The goal is to build an integrated database - spanning the atmosphere, the oceans, and the land - to advance our understanding of the complex dynamics of earth systems.

GEON reminds us that People are at the heart of cyberinfrastructure.

From earth science, I turn to the life sciences. Our new information and communications tools, combined with advances in molecular biology, fueled the second great scientific revolution of the last century: genomics.

[genome comparison slide]

From the tiny genome of the first bacterium sequenced, Haemophilus influenzae, with 1.8 million base pairs, to the 3.12 billion that comprise the human genome was a leap of enormous magnitude. Researchers from Celera Genomics, who helped sequence the human genome, estimate that assembly of the 3.12 billion base pairs of DNA required 500 million trillion sequence comparisons.

Completing the human genome project might have taken years to decades without the terascale power of our newest computers and a battery of sophisticated computational tools.

Now we have completed the sequencing of scores of organisms, from many of the microorganisms that cause human disease to the tiny little Arabidopsis thaliana that serves as a model for plant research.

Sequencing is underway on the parasite that causes malaria, on many of the world's major food crops, and on the mouse. The age of biotechnology lies before us.

[ molecule]

A challenge now is to describe gene function, and to unravel the structure and function of proteins.

It can take from 20 milliseconds to several seconds for a nascent protein to fold into its functional conformation. Until recently, it took 40 months of computer time to simulate that folding. With new terascale computer systems - operating at one trillion operations per second - we have reduced that time to one day. That's roughly 1,000 times faster.

Even at today's speeds, understanding the function of each protein in the vast array that exists will require many of the best minds in the world and advanced cyberinfrastructure to empower them.

Here is an example of how simulation and visualization are able to reveal what experiment cannot. This is cutting-edge work by Klaus Schulten and colleagues of the Beckman Institute at the University of Illinois at Urbana-Champaign.

[water molecules moving through aquaporin channel, transport across membrane]

Here we see water molecules passing single-file through a channel of the membrane protein aquaporin.

This simulation, which includes over 100,000 molecules, shows that water molecules do a mid-channel flip, which we can see here.

This mechanism blocks damaging hydrogen ions (not shown here) from entering the cell, while allowing water to pass through at up to a billion molecules per second. When impaired, aquaporins play a role in cataracts and diabetes.

[Tree of life]

The combination of computing, communications and genomics has also transformed our understanding of the diversity of life on earth and its evolution. Cyberinfrastructure is needed here to plot the intricate relationships among organisms.

The simple fact is that we don't even know "what's out there." The total number of species may be between 10 and 100 million. Only about 1.7 million of these are known, and only about 50,000 have been described in any detail.

With our new tools, we can envision tracing the phylogenetic relationships among all organisms for the first time. The tree of life is the baseline against which we will measure how organisms - including humans - interact and respond to change.

In this context, NEON--the planned National Ecological Observatory Network--will be invaluable. Here is a video describing a NEON site.

[Video clip: NEON; with narration on clip] video not available

The entire NEON system would track environmental change from the microbiological to global scales.

Today, we simply do not have the capability to answer ecological questions on a regional to continental scale, whether involving invasive species that threaten agriculture, the spread of disease or agents of bioterrorism.

[Ocean observatory, poster images]

Eventually, such observatories must be extended to the oceans as well, perhaps with links to the ocean observatories now in the planning stages. This slide from John Delaney of the University of Washington shows one possible configuration from the Neptune project now underway.

As data from these observatories begins flowing in, new models and simulations can be constructed to describe the complex dynamics that link molecules to organisms to ecosystems, and relate these to environmental databases.

A better understanding of these relationships is critical for addressing issues of environmental health and sustainability. Climate change is a case in point.

[Video simulation of global circulation of water vapor] video not available

This animation shows the circulation of water vapor around the earth. It is a product of the Community Climate Model at NCAR, the National Center for Atmospheric Research in Boulder, Colorado.

Climate models are extremely complex, and becoming more so as we integrate more information from the atmosphere, the oceans, and the land. Incorporating rich models from the life sciences will be a major step forward in understanding the consequences of climate change.

[New biocomplexity spiral]

There has been a sea change in the way we investigate life at all levels. I use the term "biocomplexity" to describe the dynamic web of relationships that arise when living things, from molecules to genes to organisms to ecosystems, interact with their environment.

We will need the power of supercomputing, and the integration and insight that a comprehensive cyberinfrastructure provides, to untangle these complex interactions. A robust cyberinfrastructure across the full spectrum of life sciences can speed us down the path of discovery.

To give you just a taste of the power of biocomplexity studies, I'll turn now to several brief examples.

[Bacteriorhodopsin molecule]

Bacteriorhodopsin is a protein that acts as a light-driven proton pump in the cell membrane. Researchers have been investigating its structure and dynamics for over thirty years. Only recently, Klaus Schulten and his group, whose work I mentioned before, resolved some of these complex details through simulation.

Bacteriorhodopsin was thought to occur only in a small number of species, namely the halobacteria, which thrive in environments ten times saltier than seawater. Despite the name, they are actually members of the Archaea, the third branch of life and among the oldest forms of life on earth.

[ocean scene]

Oded Beja and Edward DeLong of the Monterey Bay Aquarium Research Institute recently discovered that bacteria containing a close variant of this energy-generating, light-absorbing pigment are widespread in the world's oceans.

[Graph of variants response to light]

Genetic variants of the bacteria absorb light of different wavelengths, matching the quality of light available in different ocean habitats. This research points to a significant new source of energy for microorganisms in the ocean.

We begin to map biocomplexity by tracing the links from the function of a protein to the distribution and variation of bacterial populations to biogeochemical cycles.

[Lenski: digital organisms and evolution]

On quite another scale, mathematics, biology and computer science intersect to bring surprising insights into the process of evolution.

Richard Lenski at Michigan State has joined forces with a computer scientist and a physicist to study how biological complexity evolves, using two kinds of organisms--bacterial and digital.

Lenski's E. coli cultures are the oldest of such laboratory experiments, spanning more than 20,000 generations. Here the two foreground graphs actually show the family tree of digital organisms--artificial life--evolving over time.

On the left, the digital organisms all compete for the same resource, so they do not diversify and the family tree does not branch out. On the right, the digital organisms compete for a number of different resources, and diversify.

In the background are round spots--actually laboratory populations of the bacterium E. coli, which also diversified over time when fed different resources. In vivo derives insight from in silico.

My final example touches the field of cognitive neurobiology.

[C. elegans]

This slide shows the homely little worm, Caenorhabditis elegans. This year, the Nobel Prize was awarded to three scientists for pioneering work that established this unassuming creature as a model for neuroscience.

Today, our imaging techniques, such as fMRI and CAT, are producing a wealth of data on the human brain. Supercomputing projects, most notably BIRN, the Biomedical Informatics Research Network at UC San Diego, are breaking new paths and opening the frontiers for the complex study of cognitive and behavioral neurobiology.

[Images of zebra finch, hummingbirds, budgerigars, bats, whales, human]

Erich Jarvis, the 2002 NSF Waterman Award winner, is investigating the neurobiology of vocal communication in songbirds to determine how vocal learning and associated brain structures evolved.

Vocal learning is the ability to imitate sounds. It is present in only six groups of animals: 3 groups of birds - parrots, hummingbirds, and songbirds - and 3 groups of mammals - bats, cetaceans, and humans.

[Brain and phylogenetic tree]

Evidence suggests that vocal learning evolved independently in all 6 groups over 65-70 million years. On the left of this slide, you can trace the evolutionary distance among the three bird groups.

Perception and production of song in these groups are accompanied by anatomically distinct patterns of gene expression. These are shown on the right as red and yellow areas of the brain.

The red areas show very similar locations, while the yellow areas are widely distributed. Jarvis hopes to develop a model for how the brain generates, perceives, and learns behavior by unraveling this puzzle.

His work draws on a broad spectrum of fields that integrate behavioral, anatomical, electrophysiological, molecular biological and bioinformatics techniques.

His work could advance our knowledge of how humans learn language, of brain dysfunction, and of the evolution of intelligence.

I've strayed rather far afield, but there is a point to be made. These surprising connections - from molecular structure to biogeochemical cycles, from in vivo to in silico, and from behavior to cognition to gene expression to neuroanatomy - give us a taste of the extraordinary complexity - and potential for insight - that a biocomplexity perspective provides.

A robust, flexible, and comprehensive cyberinfrastructure will give us the foundation we need to make rapid progress in understanding even our human complexities.

I'll conclude by returning to my starting point, and treating you to a final taste of the "miraculous." Many of you have enjoyed this clip at the American Museum of Natural History's Hayden Planetarium. It is taken from the space show, "Search for Life: Are We Alone?"

The Museum, NCSA and PACI all collaborated, and David Nadeau [Nuh-doe'] of the San Diego Supercomputer Center led the visualization effort.

You will see the birth and evolution of an emission nebula, a phenomenon that occurs just after the birth of a star. The clouds of color represent high temperature gases energized by ultraviolet light from the star.

Let's view the video.

[Video of birth of emission nebula] video not available

I first saw this breathtaking simulation in May, when I was in New York to speak at a Tree of Life Conference held at the Museum. I was struck then by the possibilities for insight that integration across all frontiers in science and engineering holds.

That may be a goal for the future, but we will not achieve it unless we begin now to assemble the cyberinfrastructure that will make it possible.

[end slide with title]

Ultimately, gaining insights from terabytes will also speed the application of new and miraculous knowledge to domestic as well as global problems.

We need to be more alert and astute to anticipate some of the new problems. Just think how much better prepared we could have been for the looming global fresh water crisis or the emergence of new infectious diseases that seem to have taken us by surprise.

Data, computing speed, and networks are steps on the path to wisdom - they do not constitute wisdom.

We understand now that changes in global climate cannot be understood without taking into account the effect that humans have on the environment - the way our individual and institutional actions interact with the atmosphere, the oceans and the land.

We now know that providing a secure homeland will increasingly depend on understanding other cultures - their ideas and attitudes - as well as advancing cyber security, and developing antidotes to combat biological and chemical threats.

The greatest question of our times may be how we can avoid the pitfalls, and still grasp the opportunities that science and technology hold.

The world of vast distances and differences is shrinking, and soon every part of the globe will seem as close as our own back yard.

We need to keep our eyes on that future and plan now for the time when we are all next-door neighbors. That will define science and engineering for a 21st century society.

Cyberinfrastructure will help take us there and beyond.



National Science Foundation
Office of Legislative and Public Affairs
4201 Wilson Boulevard
Arlington, Virginia 22230, USA
Tel: 703-292-8070
FIRS: 800-877-8339 | TDD: 703-292-5090
