
NSF Org: | TI Translational Impacts |
Recipient: |
|
Initial Amendment Date: | December 1, 2000 |
Latest Amendment Date: | December 1, 2000 |
Award Number: | 0060675 |
Award Instrument: | Standard Grant |
Program Manager: | Jean C. Bonney, TI Translational Impacts, TIP Directorate for Technology, Innovation, and Partnerships |
Start Date: | January 1, 2001 |
End Date: | June 30, 2001 (Estimated) |
Total Intended Award Amount: | $99,984.00 |
Total Awarded Amount to Date: | $99,984.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
48 Wall Street, 11th Floor New York NY US 10003-4602 (212)918-4412 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
48 Wall Street, 11th Floor New York NY US 10003-4602 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | SBIR Phase I |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.084 |
ABSTRACT
This Small Business Innovation Research (SBIR) Phase I project will investigate the feasibility of high-risk, high-return research toward creating general-purpose de-duplication software. De-duplication software identifies multiple database records that refer to one entity (such as a person), thereby enabling the merger of fragmented data. ChoiceMaker markets a research-derived de-duplication system called MEDD. Many fundamental social services, including child immunization, require accurate de-duplication. New York City currently uses MEDD to de-duplicate its immunization records, thereby successfully improving children's public health. However, smaller public health organizations cannot benefit from MEDD because they cannot afford the 6 weeks of computer consulting that are required to customize MEDD for their data. ChoiceMaker's proposed research would decrease the adaptation time by an order of magnitude, making de-duplication affordable for most public health organizations and nearly every business with mission-critical databases. MEDD employs an important emerging information-theoretic statistical technique (called maximum entropy) to mimic the decisions made by people evaluating whether to merge similar records. Maximum entropy technology supports software that can 'understand' each individual database's idiosyncratic information semantics and structure. In the proposed research, ChoiceMaker will investigate significant, innovative extensions to maximum entropy technology that will dramatically increase MEDD's convenience and flexibility.
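As an illustration of the maximum entropy idea described above, the following is a minimal sketch, not MEDD's actual implementation. It relies on the standard fact that a two-class maximum entropy model is equivalent to logistic regression, and it learns weights for a few binary "clues" from record pairs that a person has labeled as match or non-match. The record fields (first, last, dob), the clue set, the toy training pairs, and the training settings are all assumptions made up for this example.

```python
import math


def features(a, b):
    """Binary clues for a record pair (hypothetical clue set, not MEDD's)."""
    return [
        1.0,  # bias term
        1.0 if a["last"].lower() == b["last"].lower() else 0.0,
        1.0 if a["first"].lower() == b["first"].lower() else 0.0,
        1.0 if a["first"][:1].lower() == b["first"][:1].lower() else 0.0,
        1.0 if a["dob"] == b["dob"] else 0.0,
    ]


def match_probability(weights, x):
    """Two-class maximum entropy model: P(match | x) = sigmoid(w . x)."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5, l2=0.01):
    """Fit weights by batch gradient descent on L2-regularized log-loss."""
    xs = [features(a, b) for a, b in pairs]
    weights = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, labels):
            err = match_probability(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * (g / len(xs) + l2 * w)
                   for w, g in zip(weights, grad)]
    return weights


def rec(first, last, dob):
    """Build a toy record (invented data, for illustration only)."""
    return {"first": first, "last": last, "dob": dob}


# Toy human-labeled pairs: 1 = same person, 0 = different people.
pairs = [
    (rec("Maria", "Lopez", "1998-03-14"), rec("Maria", "Lopez", "1998-03-14")),
    (rec("Maria", "Lopez", "1998-03-14"), rec("M", "Lopez", "1998-03-14")),
    (rec("John", "Smith", "1999-01-02"), rec("John", "Smith", "1997-07-30")),
    (rec("Ann", "Chen", "2000-05-05"), rec("Amy", "Chan", "2000-05-05")),
    (rec("Omar", "Khan", "1995-09-09"), rec("Lily", "Ng", "2001-12-01")),
    (rec("Jon", "Reed", "1996-11-11"), rec("Jonathan", "Reed", "1996-11-11")),
]
labels = [1, 1, 0, 0, 0, 1]

weights = train(pairs, labels)

# Score two unseen pairs with the learned weights.
likely_dup = (rec("Dana", "Ortiz", "1997-02-20"), rec("D", "Ortiz", "1997-02-20"))
likely_distinct = (rec("Paul", "Evans", "1994-04-04"), rec("Rita", "Moss", "1999-08-18"))
p_dup = match_probability(weights, features(*likely_dup))
p_distinct = match_probability(weights, features(*likely_distinct))
print(f"likely duplicate: {p_dup:.2f}, likely distinct: {p_distinct:.2f}")
```

In a sketch like this, adapting to a new database would mean supplying a new clue set and a new batch of labeled pairs and then retraining, which is one way to picture the customization effort the abstract aims to reduce.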
This research has applications to enhancing the data quality of any database that might contain multiple entries for the same entity due to the lack of a reliable identifying key. Specifically, there are applications to the management of master patient indices by health care providers and lists of clients and vendors at large institutions. The system is equally useful for matching and linking records in two different databases, such as for merging mailing lists for direct marketing, linking medical records for epidemiological research, and matching buy and sell orders for securities transa