
NSF Org: | IIS Division of Information & Intelligent Systems |
Initial Amendment Date: | July 27, 2017 |
Latest Amendment Date: | July 27, 2017 |
Award Number: | 1717084 |
Award Instrument: | Standard Grant |
Program Manager: | Hector Munoz-Avila, hmunoz@nsf.gov, (703)292-4481, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2017 |
End Date: | July 31, 2021 (Estimated) |
Total Intended Award Amount: | $499,635.00 |
Total Awarded Amount to Date: | $499,635.00 |
Recipient Sponsored Research Office: | 201 OLD MAIN UNIVERSITY PARK PA US 16802-1503 (814)865-1372 |
Primary Place of Performance: | 360D IST Building University Park PA US 16802-1400 |
NSF Program(s): | Info Integration & Informatics |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Feature engineering is an important pre-processing step in applying machine learning algorithms for knowledge discovery in all fields of scientific research and business applications. In these applications it is crucial to obtain appropriate features that best describe the observed phenomena. Traditionally, researchers choose features of interest manually, based on the knowledge and experience of domain experts, which is costly and labor-intensive. Recently, a new line of research, called representation learning, has used neural networks to automatically learn features that may be used in various scientific research projects and business applications. The PI plans to develop new representation learning methods to capture rich, meaningful and discriminative features in heterogeneous information networks (HINs), which have been used to model heterogeneous types of network entities and their relationships in support of network data analysis and mining. The planned work addresses model design, scalability, sample data extraction, network variety and data heterogeneity issues in implementing the learning frameworks. This research will be integrated into graduate and undergraduate courses on data mining and machine learning, enabling students to develop analytics and big data skills.
The specific research objectives of this project are three-fold: 1) The PI aims to leverage information in HINs to learn representations of latent features for nodes and relationships specified by meta-paths in the network. Novel techniques will be developed to address the scalability issues in learning. 2) The PI seeks to address model design and learning issues arising in HINs growing with time, e.g., citation networks. New neural network architectures and new sample data extraction schemes will be devised. 3) The PI plans to integrate both content and network structures in representation learning of HINs. New neural network architectures will be devised. To evaluate research prototypes, the PI will develop a testbed consisting of new neural network frameworks for representation learning on HINs. Techniques and software will be made available as research resources to the communities of data mining and representation learning.
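To make the notion of a meta-path concrete: a meta-path is a sequence of node types (for example author, paper, author) that constrains which relationships are followed in an HIN. A commonly used way to turn meta-paths into training samples for representation learning is a meta-path-guided random walk. The sketch below is only an illustration under stated assumptions, not the project's method: the toy bibliographic network, the node names, and the function `metapath_walk` are all hypothetical.

```python
import random
from collections import defaultdict


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only visits nodes whose types follow a repeating meta-path.

    `metapath` is a list of node types whose first and last entries match,
    so the pattern can be repeated (e.g. author -> paper -> author -> ...).
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    period = len(metapath) - 1  # number of hops in one pass over the meta-path
    walk = [start]
    step = 0
    while len(walk) < walk_length:
        next_type = metapath[(step % period) + 1]
        # Keep only neighbors whose type matches the next type in the meta-path.
        candidates = [n for n in adj[walk[-1]] if node_type[n] == next_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


# Hypothetical toy bibliographic network: authors, papers, and one venue.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]
node_type = {"a1": "author", "a2": "author",
             "p1": "paper", "p2": "paper", "v1": "venue"}
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Co-authorship meta-path: author -> paper -> author.
walk = metapath_walk(adj, node_type, "a1",
                     ["author", "paper", "author"], 7, random.Random(0))
print(walk)
```

Walks sampled this way could then be fed to a skip-gram style model to learn node embeddings; that downstream step is omitted here, and choosing a different meta-path (such as author, paper, venue, paper, author) would emphasize a different relationship in the same network.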
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Representation learning, which aims to automatically learn low-dimensional latent features for use in scientific research projects and business applications, is an important research topic in the fields of data mining, machine learning and big data analytics. In this project, the research team developed new representation learning frameworks to capture rich, meaningful and discriminative features in various kinds of heterogeneous information networks, including social networks, bibliographic networks, citation networks and road networks, in order to support a variety of network data analysis and mining applications, such as restaurant type classification, patent classification, travel time estimation, future citation prediction and route planning. This project investigated generic network representation learning techniques by exploring multiple relationships, specified in the form of meta-paths, in a heterogeneous information network, and developed efficient and effective neural network models to learn latent representations of nodes and meta-paths in the network.
While generic network representation learning techniques can be applied to a wide range of heterogeneous information networks, techniques for specific types of networks need to be specialized to exploit the inherent characteristics and unique features of the corresponding networks and application domains. Road networks and publication citation networks are examples: the moving behaviors of travelers on roads and the flow of knowledge among published papers are very different. Specifically, this project investigated representation learning problems on road networks and publication citation networks and developed new representation learning frameworks with novel ideas, neural network model designs, and data preparation and sampling schemes.
With rigorous testing and comprehensive evaluation using datasets collected in the project, the developed frameworks and models are shown, through extensive experiments, to outperform state-of-the-art techniques, which validates the new ideas proposed in the project. While these algorithms, models and techniques were designed for the learning frameworks in this project, the underlying ideas and concepts resulting from the research may well be applicable to other neural network architectures, potentially advancing research in data mining and machine learning.
This project has generated data, publications, presentations, and software that may facilitate follow-up research and collaborations. The research findings have been disseminated to the research communities of data mining, machine learning and data science through publications and conference presentations. The research insights obtained from the project have further inspired the research team to explore new research problems in knowledge graphs, transfer learning, reinforcement learning, and community detection. This project allowed the team to achieve its educational objectives by teaching and promoting student learning in the fields of data mining and machine learning. Two Ph.D. students involved in the project have graduated with solid training in data mining and machine learning. The project also enabled the team to provide research opportunities to M.S., undergraduate and female students. Several undergraduate students (including one female student) involved in the project entered graduate school after graduation.
Last Modified: 12/14/2021
Modified by: Wang-Chien Lee