
Dr. Colwell's Remarks


Dr. Rita R. Colwell
National Science Foundation
Knowledge Discovery and Dissemination Meeting
Herndon, Virginia

September 4, 2002

See also slide presentation.

If you're interested in reproducing any of the slides, please contact
The Office of Legislative and Public Affairs: (703) 292-8070.

Thank you, Gary, and good morning, everyone. It's a pleasure to be here to open the second day of this NSF Kickoff Workshop for our program on Knowledge Discovery and Dissemination, better known as "KDD." I understand that there was much stimulating interaction yesterday and I know that today promises even more. I'm glad to be part of it.

This meeting is an extremely timely opportunity for two communities to interact--the information technology researchers and the intelligence analysts--an opportunity to connect NSF grantees doing cutting-edge research with those in other agencies who need the eventual applications of your research.

As context for this interaction, I would like to speak about why the National Science Foundation is supporting KDD.

This research program seeks to harness information technology to improve our ability to synthesize and use information culled from many different sources. This fundamental research, an unclassified program like everything NSF supports, holds great potential to contribute significantly to our national security, while remaining very much in keeping with our overall research goals.

Almost a year ago, September 11 thrust a new reality upon us. As we look back with the inevitable hindsight to who knew what and when, we realize that we vitally need better tools to assemble a comprehensive picture of threats that may face us.

One obstacle to synthesis is information overload. A second challenge, as described by intelligence scholar Gregory Treverton, is structural divides--"distinctions between intelligence and law enforcement, between foreign and domestic, and between public and private." 1

KDD offers some promise in surmounting both challenges posed by information--overload and synthesis.

In the past year, the National Science Foundation--responsible for supporting science and engineering research across the entire range of disciplines--has funded a number of very specific efforts related to the attacks of September 11 and their aftermath. The NSF Act of 1950, in fact, expressly authorizes us to support science and engineering related to national security.

Many of these efforts build upon our existing investments in fundamental research. Right after the 9/11 attacks, for example, we supported the use of small experimental robots to search the WTC site for remains. We also sponsored engineering studies into what caused the buildings to collapse. Other grantees are studying the geographic dimensions of terrorism and the short and longer-term societal responses to 9/11. Still others have sequenced the anthrax genome and developed sensors for detecting bioterrorism.

While not supporting classified research, NSF contributes to homeland security in a number of ways. We may bring together key researchers from any number of fields with other federal agencies--such as at this meeting. NSF may also contribute by sponsoring a workshop on a critical topic, such as the one we held on chemical and biological sensors.

NSF's role, from information technology to engineering, and from social science to bioscience, is to support the fundamental research needed by other agencies and industry for applications.

A few numbers will help to make my case about what NSF can offer. We currently account for about half of the Federal non-medical support for fundamental research at U.S. colleges and universities.

Each year 50,000 reviewers--the brightest minds in science and engineering--competitively review the 32,000 funding requests we receive. We are able to fund only around a third of these, or about 10,000 new awards annually. In any case, we have access to communities with expertise on a wide spectrum of areas related to homeland security.

NSF is also about people--building the future science and technology workforce. Since 1952, we've supported 36,000 graduate research fellows across the disciplines. More broadly, we calculate that we directly support nearly 200,000 people each year--teachers, students, researchers, post-doctorates and trainees.

NSF highlights support for research at the intersections of disciplines. The ideas and technologies of life science, physical science and information science are merging. Increasingly, it is at these frontiers, where disciplines converge, that new knowledge is being generated to meet the complex challenges we face as a society.

In the past few years we have made it a deliberate part of our strategy to demarcate areas of converging discovery for special investment. These areas are information technology, nanotechnology, biocomplexity, mathematics, and the study of how we learn.

We lead the Federal investment in information technology, a joint effort among Federal agencies. We also lead the National Nanotechnology Initiative, a coalition of organizations from government, academe and the private sector. Because we encompass all the disciplines of science and engineering, we naturally seek the synergy of partnering with other agencies.

Not constrained by a narrow mission, we can be flexible about responding to emerging needs. In fact, our founding act of 1950 directs us to support unclassified research and education through support from other federal departments and agencies. KDD is just a current example of that.

We consider it critical to nurture new research communities focused on emerging challenges. Many of these challenges have dual payoffs.

Take the Incorporated Research Institutions for Seismology--the worldwide network for monitoring natural seismic activity and earthquakes that has been equally valuable for monitoring nuclear tests, surreptitious or not.

Another example: the proposed National Ecological Observatory Network, which will provide real-time monitoring of complex ecological systems. NEON promises much more detailed understanding of how the environment works. It could also be used to track the health of our environment, for monitoring invasive species or diseases such as West Nile virus.

One more example: NSF is supporting the development of a new research area, computational epidemiology.

The sheer scale and complexity of epidemiological problems today--and the large data sets they engender--call for powerful computational tools and mathematical analysis. We're supporting groups that will study specific topics, such as data mining and epidemiology. Tutorials will also bring epidemiologists and biologists together with computer and mathematical scientists to learn about each other's fields.

There is another dimension to how we support research. We believe that high-risk research with the payoff of discovery needs the time and resources to flourish. I have often spoken about the need to increase both the size and duration of NSF awards. Our average grant currently runs three years. However, a recent survey of our principal investigators showed that five-year grants would be more effective. Also, the survey suggested that larger grants would encourage more innovative ideas and greater collaboration with other researchers. You can find the detailed report of the survey on our website, but I cite it to assure you that those of us at NSF consider its findings very important.

With this context on NSF's mission and style of working, I'd like to focus on the KDD program. We all know the term "data mining"--combing through a huge data set for hidden insights, sort of like searching for a needle in a haystack.

KDD aims beyond this: not only to discover vital bits of information from many types of sources, but also to perform a rudimentary synthesis and to share the information with those who need it. We expect the results to be valuable in both the intelligence and law enforcement arenas. KDD augments research by NSF grantees that is already underway.

Researchers are given the resources to accelerate their work. The added support can enable them to take on new students, work with other faculty, buy new tools, or even collaborate with each other, as we hope could emerge from this meeting.

I'll turn now to some specific KDD projects, with graphics provided by several of you here today, and I thank each of you.

I plan to note just a few research highlights because the real experts are here and they'll be presenting their work in detail later on, so please save questions on the projects for them. Another caveat is that this is all very much work-in-progress.

[Slide up: Speaker differences: Feedback]

The first project is "talk printing"--aimed at enabling machines to automatically recognize a person by the way he or she talks. This is Elizabeth Shriberg's work, and she is from SRI International. Talk printing goes beyond current approaches that tend not to differentiate between speakers with similar vocal tracts. Instead, the new method looks for identifying clues in word sequence, intonation, pausing, and interruption behavior, to name just a few.

Let's listen to a few examples of how different speakers use unique "feedback" in conversation--little phrases like "uh-huh" and "right" that we use to show we're listening. We'll hear four examples, as shown on the slide:

[NOTE: actual speech samples are not available]

The first is low in pitch and flat.

The next is a different speaker with a similar style, but he uses another feedback word--"right."

Here's a higher pitch range.

The last speaker also rises in pitch but draws out the phrase longer.

How a speaker closes a conversation is also very individual.

[change to next slide: speaker differences/conversation closings]

They can all use the same phrase--"It was good talking to you"--but sound very different. Let's listen to a slow example.

Now a fast one.

Here's one that goes up and down in intonation.

Now a speaker with high energy and pitch range.

Talk printing will automatically let a computer distinguish between these different habitual patterns of speech, and I'm sure Elizabeth will explain the nuts and bolts of the method in her presentation today.

I'll just add that the technique offers interesting features for intelligence gathering, law enforcement, and speech technology. For example, it can distinguish between a casual chat and conversation planning an event. It can also suggest who is dominating a conversation, or when someone is departing from their usual speaking style--like disguising their voice.

[CMU Informedia: screen capture/display of results on sightings of Bin Laden couriers]

Here's another KDD example, the Informedia Digital Video Library, provided by Howard Wactlar of Carnegie Mellon University. In this case, the idea is to render the vast amounts of available, open-source multimedia data streams useful for intelligence analysts.

This research expands the ability to discover and track relationships from video sources, using extracted textual and visual information. Here, for example, the analyst queries a large video database--broadcast radio and television and surveillance video--for identifications of Bin Laden couriers. The selected video samples can actually be played. At the same time, a map is shown that plots courier sightings at corresponding times and places. Eventually the display will provide material in multiple languages.

[2nd CMU Informedia: relationships among Al-Qaeda terrorists]

Here's a second, simulated example showing the capabilities of Informedia. This display illustrates relationships among five Al-Qaeda members, again culled from visual media reports. We see dots plotted between the individuals.


The more dots or "hits" on a line between two people--such as between Atta and Zawahiri--the more frequently they occur together in a news story.

Different colors show time--when the news stories appeared. The analyst can specify the period sampled--making news stories of a specific time-period appear in a certain color. For example, if you look closely here, blue dots denote older reports, and pinker dots show more recent stories.
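The dot-counting idea can be sketched in a few lines of Python. This is only an illustration of counting how often two names appear in the same story, not the Informedia system itself; the names, years, and the `cutoff_year` parameter are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each story is (year, set of people named in it).
stories = [
    (2000, {"Person A", "Person B"}),
    (2001, {"Person A", "Person B", "Person C"}),
    (2002, {"Person B", "Person C"}),
]

def co_occurrence_counts(stories):
    """Count how many stories mention each unordered pair of names."""
    counts = Counter()
    for _year, names in stories:
        for pair in combinations(sorted(names), 2):
            counts[pair] += 1
    return counts

def counts_by_period(stories, cutoff_year):
    """Split the pair counts into 'older' and 'recent' stories,
    analogous to coloring the dots by when the story appeared."""
    older = [s for s in stories if s[0] < cutoff_year]
    recent = [s for s in stories if s[0] >= cutoff_year]
    return co_occurrence_counts(older), co_occurrence_counts(recent)

counts = co_occurrence_counts(stories)
older, recent = counts_by_period(stories, cutoff_year=2001)
```

Each count corresponds to the number of dots on the line between two people, and the older/recent split stands in for the color coding.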

[Roukos: the difficulty of automatically detecting the first time a topic is reported: graph of blue dots]

Another KDD example comes from Salim Roukos [Pr: Saleem Roo-cohs] of IBM, who is developing technology to automate the extraction of text with meaning--not just the extraction of individual words. The problem is how to automatically detect the first reporting of a topic or event in the media. Present technology using a word search alone does not work.

This graph depicts all the stories that ran on a newswire service over a given time period, perhaps on a given day. Each blue dot is a story.

[same graph with diagonal line]

If we ask for all the news stories that are the first reports on new topics, the current method draws this diagonal line through the data and tells us to look at the stories to the right of the line.

[same graph with diagonal line and red dots]

Now we see the first reports highlighted in red. The current method, based on searching for words, did not work--it failed to separate old topics, blue dots, from new topics, the red dots.
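To make the weakness of word matching concrete, here is a minimal Python sketch of a generic word-overlap baseline. It is an assumption for illustration, not necessarily the method behind the graph: a story is flagged as a first report when its word counts are not similar enough to any earlier story. The sample headlines and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_first_reports(stories, threshold=0.5):
    """Flag a story as 'new' if no earlier story shares enough words with it."""
    seen = []
    flags = []
    for text in stories:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags

# Hypothetical headlines: the third describes the same kind of event
# as the first, but in entirely different words.
stories = [
    "earthquake strikes coastal city",
    "coastal city earthquake damage grows",
    "tremor hits seaside town",
]
flags = flag_first_reports(stories)
```

In this toy case the third headline is flagged as a new topic only because it shares no words with the earlier ones, which is the kind of error a purely word-based approach invites.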

[second Roukos graphic: English/Arabic text correlations]

Computers need to develop a more semantic way to represent text--to sense not just the words but what they mean in their context. This graphic suggests how this could be done. We see that news stories in English and Arabic contain several similar phrases, color-coded for similarity, strengthening the assumption that the same event is being reported in both. Eventually, computer inference of statistical patterns will automate the extraction of knowledge in several languages.

One more example: Here is work by Hsinchun Chen, of the University of Arizona, that is rooted in earlier NSF programs called digital libraries and digital government. It shows how KDD builds on previous NSF support. In this case, police data, at left, show links between people, places and entities in a criminal network. The same data network has been automatically adjusted--at right--to show particular relationships: subgroups and central criminal figures.

[one person's criminal associations]

This graphic shows how an analyst has zeroed in on one entity in the network--such as a person--to view crime associations of that person.

Under KDD, Chen is obtaining the data from two police departments--in Tucson and Phoenix--and scrubbing them of references to an identifiable person, while retaining the integrity of relationships between the database objects.

The idea is to create large law-enforcement databases that can be used for intelligence analysis research. Privacy is preserved but a research resource is available that reflects real-world patterns of criminal activity.
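One common way to scrub names while keeping relationships intact is consistent pseudonymization, sketched below in Python. This is an assumption about how such scrubbing could be done, not a description of Chen's procedure; the names, relations, and key are hypothetical, and real de-identification would also have to handle addresses, dates, and other indirect identifiers.

```python
import hashlib
import hmac

def pseudonym(name, secret_key):
    """Map a name to a stable, opaque label using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]

def scrub_links(links, secret_key):
    """Replace both endpoints of every link with pseudonyms.
    The same person always gets the same label, so the network's
    structure is unchanged even though the names are gone."""
    return [
        (pseudonym(a, secret_key), relation, pseudonym(b, secret_key))
        for a, relation, b in links
    ]

# Hypothetical records, not real police data.
links = [
    ("Alice Example", "co-arrested with", "Bob Example"),
    ("Bob Example", "shares address with", "Carol Example"),
]
scrubbed = scrub_links(links, secret_key=b"hypothetical-key")
```

Because the mapping is keyed and consistent, the person who appears in both records still appears as one node in the scrubbed network, while anyone without the key cannot simply hash a list of names to reverse it.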

[slide off]

You'll be hearing much more detail soon about these and other intriguing projects-in-progress, so I'll sum up now with the general observation that cutting-edge science and technology must be integrated into homeland security efforts.

KDD is a superb example of this, because having the right information at the right time--in the hands of those who need it--is critical to foiling terrorist plots.

We all bear the responsibility to make our nation and our world more secure. I think it is a privilege that in many cases, the work that scientists and engineers already do, and want to do, can be harnessed to meet a current and pressing national need.

In KDD we see once again how fundamental research pays off in unexpected ways--in this case, for the well-being of our nation. I look forward to hearing what emerges from your meeting, and I now welcome questions and comments you might have.

1 Gregory F. Treverton, Government Executive, Sept. 2002, p. 64



National Science Foundation
Office of Legislative and Public Affairs
4201 Wilson Boulevard
Arlington, Virginia 22230, USA
Tel: 703-292-8070
FIRS: 800-877-8339 | TDD: 703-292-5090
