Award Abstract # 0060675
SBIR Phase I: Maximum Entropy Data De-duplication

NSF Org: TI
Translational Impacts
Recipient:
Initial Amendment Date: December 1, 2000
Latest Amendment Date: December 1, 2000
Award Number: 0060675
Award Instrument: Standard Grant
Program Manager: Jean C. Bonney
TI  Translational Impacts
TIP  Directorate for Technology, Innovation, and Partnerships
Start Date: January 1, 2001
End Date: June 30, 2001 (Estimated)
Total Intended Award Amount: $99,984.00
Total Awarded Amount to Date: $99,984.00
Funds Obligated to Date: FY 2001 = $99,984.00
History of Investigator:
  • Andrew Borthwick (Principal Investigator)
    Andrew.Borthwick@choicemaker.com
Recipient Sponsored Research Office: ChoiceMaker Technologies, Inc.
48 Wall Street, 11th Floor
New York
NY  US  10003-4602
(212)918-4412
Sponsor Congressional District: 10
Primary Place of Performance: ChoiceMaker Technologies, Inc.
48 Wall Street, 11th Floor
New York
NY  US  10003-4602
Primary Place of Performance Congressional District: 10
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): SBIR Phase I
Primary Program Source: 01000102DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9215, HPCC
Program Element Code(s): 537100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.084

ABSTRACT

This Small Business Innovation Research (SBIR) Phase I project will investigate the feasibility of high-risk, high-return research toward creating general-purpose de-duplication software. De-duplication software identifies multiple database records that refer to one entity (such as a person), thereby enabling the merger of fragmented data. ChoiceMaker markets a research-derived de-duplication system called MEDD. Many fundamental social services, including child immunization, require accurate de-duplication. New York City currently uses MEDD to de-duplicate its immunization records, thereby improving children's public health. However, smaller public health organizations cannot benefit from MEDD because they cannot afford the six weeks of computer consulting required to customize MEDD for their data. ChoiceMaker's proposed research would decrease the adaptation time by an order of magnitude, making de-duplication affordable for most public health organizations and nearly every business with mission-critical databases. MEDD employs an important emerging information-theoretic statistical technique (called maximum entropy) to mimic the decisions made by people evaluating whether to merge similar records. Maximum entropy technology supports software that can 'understand' each individual database's idiosyncratic information semantics and structure. In the proposed research, ChoiceMaker will investigate significant, innovative extensions to maximum entropy technology that will dramatically increase MEDD's convenience and flexibility.
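
As an illustration of the general idea only, the sketch below trains a tiny binary maximum entropy model (equivalent, in the two-outcome case, to logistic regression) on human-labeled record pairs and then scores a new pair. The abstract does not describe MEDD's actual features, training procedure, or code, so everything here is assumed for the example: the record fields (`first`, `last`, `dob`), the three comparison clues, the labeled pairs, and the function names `pair_features`, `match_probability`, and `train` are all hypothetical.

```python
import math

def pair_features(a, b):
    """Hypothetical binary clues comparing two records (plain dicts)."""
    return [
        1.0,                                                   # bias term
        1.0 if a["first"].lower() == b["first"].lower() else 0.0,
        1.0 if a["last"].lower() == b["last"].lower() else 0.0,
        1.0 if a["dob"] == b["dob"] else 0.0,
    ]

def match_probability(weights, feats):
    """Model's probability that the two records are the same entity."""
    z = sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for feats, label in examples:
            p = match_probability(weights, feats)
            for i, f in enumerate(feats):
                weights[i] += lr * (label - p) * f
    return weights

# Made-up records; labels stand in for a human reviewer's merge decisions.
r1 = {"first": "Maria", "last": "Lopez", "dob": "1998-03-14"}
r2 = {"first": "MARIA", "last": "Lopez", "dob": "1998-03-14"}
r3 = {"first": "Maria", "last": "Lopes", "dob": "1998-03-14"}
r4 = {"first": "Jose",  "last": "Lopez", "dob": "2000-07-02"}
r5 = {"first": "Maria", "last": "Chen",  "dob": "1995-11-30"}
r6 = {"first": "Ann",   "last": "Kim",   "dob": "1999-01-01"}

labeled_pairs = [
    (pair_features(r1, r2), 1),  # same person, different capitalization
    (pair_features(r1, r3), 1),  # same person, misspelled surname
    (pair_features(r1, r4), 0),  # shared surname only
    (pair_features(r1, r5), 0),  # shared first name only
    (pair_features(r1, r6), 0),  # nothing in common
]

weights = train(labeled_pairs)

# Score two pairs that were not in the training set.
p_same = match_probability(weights, pair_features(r2, r3))
p_diff = match_probability(weights, pair_features(r4, r6))
print(f"r2 vs r3: {p_same:.2f}   r4 vs r6: {p_diff:.2f}")
```

In a sketch like this, adapting to a new database amounts to choosing new clues and collecting new labeled pairs, after which the weights are relearned from the data rather than hand-tuned.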

This research has applications to enhancing the data quality of any database that might contain multiple entries for the same entity due to the lack of a reliable identifying key. Specifically, there are applications to the management of master patient indices by health care providers and lists of clients and vendors at large institutions. The system is equally useful for matching and linking records in two different databases, such as for merging mailing lists for direct marketing, linking medical records for epidemiological research, and matching buy and sell orders for securities transactions.

