Award Abstract # 9113530
Electronic Materials For Natural Language Research

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF THE UNIVERSITY OF PENNSYLVANIA, THE
Initial Amendment Date: June 26, 1991
Latest Amendment Date: June 26, 1991
Award Number: 9113530
Award Instrument: Standard Grant
Program Manager: Larry H. Reeker
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 1, 1991
End Date: December 31, 1993 (Estimated)
Total Intended Award Amount: $139,904.00
Total Awarded Amount to Date: $139,904.00
Funds Obligated to Date: FY 1991 = $139,904.00
History of Investigator:
  • Mark Liberman (Principal Investigator)
    myl@unagi.cis.upenn.edu
Recipient Sponsored Research Office: University of Pennsylvania
3451 WALNUT ST STE 440A
PHILADELPHIA
PA  US  19104-6205
(215)898-7293
Sponsor Congressional District: 03
Primary Place of Performance: DATA NOT AVAILABLE
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): GM1XX56LEP58
Parent UEI: GM1XX56LEP58
NSF Program(s): CISE Research Resources,
ARTIFICIAL INTELL & COGNIT SCI,
SPECIAL PROGRAMS-RESERVE
Primary Program Source:  
Program Reference Code(s): 6856, 9259
Program Element Code(s): 289000, 685600, 914500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This software capitalization project funds the reformatting of online text data, held as part of the Association for Computational Linguistics Data Encoding Initiative, into a common SGML-based format, and the distribution of that data to the research community at low cost and with minimal restrictions. This is the first of several collections being reformatted. The project enables natural language research to scale up so that more realistic problems can be studied, which is particularly relevant for applications in the recognition and analysis of text and speech. Existing generally available text databases are too small, and obtaining sufficient text and making it usable for research is expensive and time-consuming. It is wasteful for individual researchers to duplicate this effort. A common database will also permit published results to be replicated or extended. The project is funded jointly with other NSF offices and with DARPA.
