Award Abstract # 1717084
III: Small: Learning Latent Representations of Heterogeneous Information Networks

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: July 27, 2017
Latest Amendment Date: July 27, 2017
Award Number: 1717084
Award Instrument: Standard Grant
Program Manager: Hector Munoz-Avila, hmunoz@nsf.gov, (703)292-4481
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2017
End Date: July 31, 2021 (Estimated)
Total Intended Award Amount: $499,635.00
Total Awarded Amount to Date: $499,635.00
Funds Obligated to Date: FY 2017 = $499,635.00
History of Investigator:
  • Wang-Chien Lee (Principal Investigator)
  • Zhen Lei (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
360D IST Building
University Park
PA  US  16802-1400
Primary Place of Performance Congressional District: 15
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7923
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Feature engineering is an important pre-processing step in applying machine learning algorithms for knowledge discovery in all fields of scientific research and business applications. In these applications it is crucial to obtain appropriate features that best describe the observed phenomena. Traditionally, researchers select features of interest manually, based on the knowledge and experience of domain experts, which is costly and labor-intensive. Recently, a new line of research, called representation learning, has used neural networks to automatically learn features that may be used in various scientific research projects and business applications. The PI plans to develop new representation learning methods to capture rich, meaningful and discriminative features in heterogeneous information networks (HINs), which have been used to model heterogeneous types of network entities and their relationships in support of network data analysis and mining. The work planned in this project addresses model design, scalability, sample data extraction, network variety and data heterogeneity issues in the implementation of the learning frameworks. This research will be integrated into graduate and undergraduate courses on data mining and machine learning, enabling students to develop analytics and big data skills.

The specific research objectives of this project are three-fold: 1) The PI aims to leverage information in HINs to learn representations of latent features for nodes and relationships specified by meta-paths in the network. Novel techniques will be developed to address the scalability issues in learning. 2) The PI seeks to address model design and learning issues arising in HINs growing with time, e.g., citation networks. New neural network architectures and new sample data extraction schemes will be devised. 3) The PI plans to integrate both content and network structures in representation learning of HINs. New neural network architectures will be devised. To evaluate research prototypes, the PI will develop a testbed consisting of new neural network frameworks for representation learning on HINs. Techniques and software will be made available as research resources to the communities of data mining and representation learning.
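
As a rough illustration of what "relationships specified by meta-paths" means in objective 1, the sketch below enumerates node pairs connected by a given meta-path in a tiny bibliographic HIN. It is only a minimal sketch and not the project's method: the toy network, the type letters (A for author, P for paper, V for venue), and the helper names `build_adjacency`, `metapath_instances` and `endpoint_pairs` are all assumptions made for this example. Pairs like these could serve as positive training examples for a model that learns representations of nodes and meta-paths.

```python
from collections import defaultdict

# Hypothetical toy bibliographic HIN (assumed for illustration only).
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",   # authors
    "p1": "P", "p2": "P",              # papers
    "v1": "V",                         # venue
}
# Undirected edges: author-writes-paper and paper-published-in-venue.
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
         ("p1", "v1"), ("p2", "v1")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_instances(adj, node_type, metapath):
    """Return every node sequence whose node types spell out `metapath`."""
    # Start from all nodes whose type matches the first meta-path symbol.
    paths = [[n] for n in sorted(node_type) if node_type[n] == metapath[0]]
    # Extend each partial path by neighbors of the required next type.
    for wanted in metapath[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(adj[path[-1]])
            if node_type[nxt] == wanted
        ]
    return paths


def endpoint_pairs(paths):
    """Collapse instances to distinct (start, end) pairs, skipping self-pairs."""
    return sorted({(p[0], p[-1]) for p in paths if p[0] != p[-1]})


adj = build_adjacency(EDGES)
apa = endpoint_pairs(metapath_instances(adj, NODE_TYPE, "APA"))  # co-authorship
apv = endpoint_pairs(metapath_instances(adj, NODE_TYPE, "APV"))  # author-venue
print(apa)
print(apv)
```

In this toy network the meta-path A-P-A relates authors who share a paper, while A-P-V relates authors to the venues where they publish, so the same pair of node types can be related in different ways depending on the meta-path chosen.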

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 25)
Chen, Yi-Ling and Yang, De-Nian and Shen, Chih-Ya and Lee, Wang-Chien and Chen, Ming-Syan "On Efficient Processing of Group and Subsequent Queries for Social Activity Planning" IEEE Transactions on Knowledge and Data Engineering, v.31, 2019. https://doi.org/10.1109/TKDE.2018.2875911
Chiang, Meng-Fen and Lim, Ee-Peng and Lee, Wang-Chien and Ashok, Xavier Jayaraj and Prasetyo, Philips Kokoh "One-Class Order Embedding for Dependency Relation Prediction" Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019. https://doi.org/10.1145/3331184.3331249
Chiang, Meng-Fen and Lim, Ee-Peng and Lee, Wang-Chien and Hoang, Tuan-Anh "Inferring Trip Occupancies in the Rise of Ride-Hailing Services" Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. https://doi.org/10.1145/3269206.3272025
Chiang, Meng-Fen and Lim, Ee-Peng and Lee, Wang-Chien and Prasetyo, Philips Kokoh "CO2Vec: Embeddings of Co-Ordered Networks Based on Mutual Reinforcement" Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA 2020), 2020. https://doi.org/10.1109/DSAA49011.2020.00027
Fu, Tao-yang and Lee, Wang-Chien "DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation" Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019. https://doi.org/10.1145/3357384.3357870
Fu, Tao-yang and Lee, Wang-Chien "ProgRPGAN: Progressive GAN for Route Planning" Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), 2021. https://doi.org/10.1145/3447548.3467406
Fu, Tao-Yang and Lee, Wang-Chien "Trembr: Exploring Road Networks for Trajectory Representation Learning" ACM Transactions on Intelligent Systems and Technology, v.11, 2020. https://doi.org/10.1145/3361741
Fu, Tao-yang and Lee, Wang-Chien and Lei, Zhen "HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning" Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017. https://doi.org/10.1145/3132847.3132953
Gao, Jinyang and Ooi, Beng Chin and Shen, Yanyan and Lee, Wang-Chien "Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics" Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018. https://doi.org/10.24963/ijcai.2018/295
He, Fang and Lee, Wang-Chien and Fu, Tao-Yang and Lei, Zhen "CINES: Explore Citation Network and Event Sequences for Citation Forecasting" Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), 2021. https://doi.org/10.1145/3404835.3462903
Hung, Hui-Ju and Lee, Wang-Chien and Yang, De-Nian and Shen, Chih-Ya and Lei, Zhen and Chow, Sy-Miin "Efficient Algorithms towards Network Intervention" Proceedings of The Web Conference 2020, 2020. https://doi.org/10.1145/3366423.3380269

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Representation learning, which aims to automatically learn low-dimensional latent features for use in various scientific research projects and business applications, is an important research topic in the fields of data mining, machine learning and big data analytics. In this project, the research team developed new representation learning frameworks with novel ideas to capture rich, meaningful and discriminative features in various kinds of heterogeneous information networks, including social networks, bibliographic networks, citation networks and road networks, in order to support a variety of network data analysis and mining applications, such as restaurant type classification, patent classification, travel time estimation, future citation prediction and route planning. This project investigated generic network representation learning techniques by exploring multiple relationships specified in the form of meta-paths in a heterogeneous information network, and developed efficient and effective neural network models to learn latent representations of nodes and meta-paths in the network. While generic network representation learning techniques can be applied to a wide range of heterogeneous information networks, techniques for specific types of networks need to be specialized to exploit the inherent characteristics and unique features of the corresponding networks and application domains. Road networks and publication citation networks are examples: the moving behaviors of travelers on roads and the flow of knowledge among published papers are very different. Specifically, this project investigated representation learning problems on road networks and publication citation networks and developed new representation learning frameworks with novel ideas, neural network model designs, and data preparation and sampling schemes.
With rigorous testing and comprehensive evaluation using datasets collected in the project, the developed frameworks and models are shown through extensive experiments to outperform state-of-the-art techniques, which validates the new ideas proposed in the project. While these algorithms, models and techniques are designed for the learning frameworks in this project, the underlying ideas and concepts resulting from the research are likely applicable to other neural network architectures, potentially advancing research in data mining and machine learning.

This project has generated data, publications, presentations, and software that may facilitate follow-up research and collaborations. The research findings have been disseminated to the research communities of data mining, machine learning and data science through publications and conference presentations. The research insights obtained from the project have further inspired the research team to explore new research problems in knowledge graphs, transfer learning, reinforcement learning, and community detection. This project allowed the team to achieve its educational objectives by teaching and promoting student learning in the fields of data mining and machine learning. Two Ph.D. students involved in the project have graduated with solid training in data mining and machine learning. The project also enabled the team to provide research opportunities to M.S., undergraduate and female students. Several undergraduate students (including one female student) involved in the project entered graduate school after graduation.

Last Modified: 12/14/2021
Modified by: Wang-Chien Lee

Please report errors in award information by writing to: awardsearch@nsf.gov.
