Award Abstract # 1143717
III: EAGER - Expressive Scalable Querying over Integrated Linked Open Data

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: WRIGHT STATE UNIVERSITY
Initial Amendment Date: July 14, 2011
Latest Amendment Date: March 29, 2012
Award Number: 1143717
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2011
End Date: August 31, 2014 (Estimated)
Total Intended Award Amount: $125,828.00
Total Awarded Amount to Date: $141,828.00
Funds Obligated to Date: FY 2011 = $125,828.00
FY 2012 = $16,000.00
History of Investigator:
  • Amit Sheth (Principal Investigator)
    amit@sc.edu
  • Pascal Hitzler (Co-Principal Investigator)
Recipient Sponsored Research Office: Wright State University
3640 COLONEL GLENN HWY
DAYTON
OH  US  45435-0002
(937)775-2425
Sponsor Congressional District: 10
Primary Place of Performance: Wright State University
OH  US  45435-0001
Primary Place of Performance Congressional District: 10
Unique Entity Identifier (UEI): NPT2UNTNHJZ1
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7916, 9251
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Linked Open Data (LOD) is rapidly developing into an open data movement to connect a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 25 billion RDF triples and is showing exponential growth. However, simply putting collections of data on the Web will be of very limited value. The key to unlocking the value for developing more powerful search, browsing, exploration and analysis is to richly interlink or semantically integrate components of LOD. Given the size, growth rate, heterogeneity and growing areas of coverage, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on the 'same-as' relationship, which is much abused due to limited expressivity. This calls for ways to represent and identify richer and more explicit relationships between different entities that reflect the richness of relations that exist in the real world.

This project develops exploratory techniques to richly interlink components of LOD and then addresses the challenge of querying the LOD cloud, i.e., of obtaining answers to questions which require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include: semantic enrichment through Wikipedia bootstrapping; semantic integration through abstraction by means of upper-level ontologies; and, massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader, and more relevant relationships between LOD datasets at instance and schema level (these relationships will promote better knowledge discovery, querying, and mapping of ontologies); (2) realize LOD query federation through an upper level ontology; and, (3) enable access to implicit knowledge through ontology reasoning. The project involves significant risk as it treads new paths in a new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the significant syntactic and semantic heterogeneity among data originating from independent data sources, and the significantly larger scale, as well as unforeseeable obstacles associated with a rapidly changing and expanding environment.
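
Objective (2) can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the project's implementation: the dataset names, the class names (such as upper:Agent and mo:MusicGroup), the alignment table, and the function instances_of are all hypothetical, and in-memory lists stand in for remote LOD sources. A query posed against one upper-level class is rewritten into each dataset's local classes, and the per-dataset answers are combined.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy datasets; real LOD sources would be remote endpoints.
DATASETS: Dict[str, List[Triple]] = {
    "music": [
        ("ex:Beatles", "rdf:type", "mo:MusicGroup"),
        ("ex:Lennon", "rdf:type", "mo:SoloMusicArtist"),
        ("ex:AbbeyRoad", "rdf:type", "mo:Record"),
    ],
    "films": [
        ("ex:Kubrick", "rdf:type", "film:Director"),
    ],
}

# Hypothetical alignment: upper-level class -> dataset -> local classes.
UPPER_TO_LOCAL: Dict[str, Dict[str, Set[str]]] = {
    "upper:Agent": {
        "music": {"mo:MusicGroup", "mo:SoloMusicArtist"},
        "films": {"film:Director"},
    },
}


def instances_of(upper_class: str) -> Set[str]:
    """Answer 'all instances of upper_class' across every aligned dataset."""
    results: Set[str] = set()
    # Rewrite the upper-level class into each dataset's own vocabulary.
    for name, local_classes in UPPER_TO_LOCAL.get(upper_class, {}).items():
        for subject, predicate, obj in DATASETS[name]:
            if predicate == "rdf:type" and obj in local_classes:
                results.add(subject)
    return results


if __name__ == "__main__":
    print(sorted(instances_of("upper:Agent")))
```

The person asking the question only needs to know the upper-level vocabulary; the per-dataset heterogeneity is confined to the alignment table.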

This project aims to advance the state of the art in semantic integration of large amounts of heterogeneous and autonomously developed or managed data. It seeks to fundamentally transform the landscape of LOD usage because successful LOD querying is a key enabler for a variety of applications. The results of this project could set the stage for the development, and the far-reaching adoption, of the Semantic Web. The project is integrated with education and research-based advanced training of graduate and undergraduate students. Additional information about the project can be found at: http://knoesis.org/research/semweb/projects/ESQuILO.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Kalpa Gunaratna, Sarasi Lalithsena, and Amit Sheth, "Alignment and dataset identification of linked data in Semantic Web," Wiley Interdisciplinary Reviews, v.4, 2014, p.139. doi: 10.1002/widm.1121

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Linked Open Data (LOD) is rapidly developing into an open data movement to connect a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 1000 datasets and is showing exponential growth. However, simply putting collections of data on the Web will be of very limited value. The key to unlocking the value for developing more powerful search, browsing, exploration and analysis is to richly interlink or semantically integrate components of LOD. Given the size, growth rate, heterogeneity and growing areas of coverage, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on a single construct, owl:sameAs, which is abused due to limited expressiveness and hence is ineffective or yields poor-quality integration. What is needed is the ability to represent and identify richer and more explicit relationships between different entities, so that the richness of the real world is not crammed inaccurately and inappropriately into very limited types of relationships. At the same time, the exponential growth of LOD in size and diversity makes it challenging to identify and analyze datasets for both human and application consumption. Even though popular datasets such as DBpedia, Freebase, and MusicBrainz are well known and widely used in the community, there may be other hidden gems that would be useful for specialized applications.

To address these challenges, this project developed exploratory techniques to richly interlink components of LOD, addressed the challenges of querying the LOD cloud, and proposed approaches to discover datasets, compress them, and create entity summaries. Specifically, we worked on:

  • Identifying more expressive relationships among instances in datasets, such as partonomic (part-whole) relationships, which are well-established, fundamental properties grounded in linguistics and philosophy

  • Identifying alignments among properties in the datasets, which is considered as important as concept or instance alignment because properties capture how two concepts and/or instances are related

  • Developing alignment-based LOD query federation through an upper-level ontology

  • Identifying relevant datasets for a given need by automatically creating domain descriptions for LOD datasets, which compensates for the lack of descriptive information (schema) about the data provided by highly autonomous data sources

  • Creating diversified entity summaries to quickly analyze the entities of the datasets

  • Developing lossless compression techniques to compress the large RDF datasets on the LOD cloud
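
To make the last item concrete, the sketch below shows what "lossless" means for RDF data. It uses plain dictionary encoding, a generic technique chosen here for illustration; the report does not describe the project's own compression method, and the function names compress and decompress and the sample triples are assumptions. Each distinct term is stored once and triples become tuples of small integers, so repeated URIs cost little and the original triples can be recovered exactly.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def compress(triples: List[Triple]) -> Tuple[List[str], List[EncodedTriple]]:
    """Replace every term with an integer ID; return (term table, ID triples)."""
    term_ids: Dict[str, int] = {}
    terms: List[str] = []
    encoded: List[EncodedTriple] = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_ids:
                # First time this term is seen: give it the next free ID.
                term_ids[term] = len(terms)
                terms.append(term)
            ids.append(term_ids[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return terms, encoded


def decompress(terms: List[str], encoded: List[EncodedTriple]) -> List[Triple]:
    """Look each ID up in the term table to rebuild the original triples."""
    return [(terms[s], terms[p], terms[o]) for s, p, o in encoded]


if __name__ == "__main__":
    # Hypothetical sample data.
    data = [
        ("ex:Dayton", "ex:partOf", "ex:Ohio"),
        ("ex:Columbus", "ex:partOf", "ex:Ohio"),
    ]
    table, ids = compress(data)
    print(len(table), ids)
    print(decompress(table, ids) == data)
```

Because decompression reproduces the input exactly, in the same order, no information is lost; the saving comes from storing long, frequently repeated terms only once.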

Last Modified: 11/15/2014
Modified by: Amit Sheth