NSF Award Search: Award # 1838200

Award Abstract # 1838200

BigData:IA:Collaborative Research: TIMES: A tensor factorization platform for spatio-temporal data

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	EMORY UNIVERSITY
Initial Amendment Date:	September 10, 2018
Latest Amendment Date:	August 5, 2019
Award Number:	1838200
Award Instrument:	Continuing Grant
Program Manager:	Hector Munoz-Avila IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2018
End Date:	September 30, 2022 (Estimated)
Total Intended Award Amount:	$950,337.00
Total Awarded Amount to Date:	$950,337.00
Funds Obligated to Date:	FY 2018 = $802,658.00 FY 2019 = $147,679.00
History of Investigator:	Joyce Ho (Principal Investigator) joyce.c.ho@emory.edu Li Xiong (Co-Principal Investigator)
Recipient Sponsored Research Office:	Emory University 201 DOWMAN DR NE ATLANTA GA US 30322-1061 (404)727-2503
Sponsor Congressional District:	05
Primary Place of Performance:	Emory University 400 Dowman Dr Atlanta GA US 30322-1005
Primary Place of Performance Congressional District:	05
Unique Entity Identifier (UEI):	S352L5PJLMP8
Parent UEI:
NSF Program(s):	Big Data Science & Engineering
Primary Program Source:	01001819DB NSF RESEARCH & RELATED ACTIVIT 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	8083, 062Z, 9102
Program Element Code(s):	808300
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Spatio-temporal analyses can enable many advances, including reducing traffic congestion, identifying hotspot areas for deploying mobile clinics, and improving urban planning. Unfortunately, the data poses many computational challenges. The complex nature of spatio-temporal data violates standard assumptions in machine learning and data mining algorithms. These complexities include spatial and temporal correlation of observations, dynamic and abrupt changes in observations, variability in measurements with respect to length and frequency, and data that spans multiple sources of information. In recognition of these challenges, various efforts have been undertaken to develop specialized spatio-temporal models. Yet, to date, these algorithms are predominantly designed to analyze small- to medium-sized datasets. The goal of this project is to develop a comprehensive computational tensor platform to perform automated, data-driven discovery from spatio-temporal data across a broad range of applications. The project also includes a set of integrated educational activities such as a Massive Open Online Course that covers cross-disciplinary topics at the confluence of computer science and geospatial applications, annual spatio-temporal data challenges and hackathons, and an annual event at the Atlanta Science Festival to create public awareness and encourage participation by women and minorities.

The project will contain algorithmic innovations that reflect appropriate assumptions of spatio-temporal data without sacrificing real-time performance, computational scalability, or cross-site learning, even under privacy constraints. The proposed platform will generalize tensor modeling to encompass the complex nature of spatio-temporal data, including time irregularity, spatio-temporal correlations, and evolving distributions. It will enable the integration of data from heterogeneous sources to yield robust and cohesive learned patterns. The novel algorithms will also facilitate learning in decentralized settings while preserving privacy. The computational platform will contain interchangeable modules that can adapt to new spatio-temporal settings and incorporate additional contextual information. The accompanying suite of algorithms will enable predictive learning, pattern mining, and change detection from large-scale spatio-temporal data. The broad applicability of the project will be demonstrated on a diverse range of data including urban transportation services, real estate market transactions, and population health. The algorithmic innovations introduced can be used to scale other machine learning models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 29)

Zhang, Jing and Shin, Bonggun and Choi, Jinho D and Ho, Joyce C "SMAT: An Attention-Based Deep Learning Solution to the Automation of Schema Matching" ADBIS 2021: Advances in Databases and Information Systems, 2021

Xu, Ran and Yu, Yue and Cui, Hejie and Kan, Xuan and Zhu, Yanqiao and Ho, Joyce and Zhang, Chao and Yang, Carl "Neighborhood-Regularized Self-Training for Learning with Few Labels" Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023 https://doi.org/10.1609/aaai.v37i9.26260

Xie, Han and Ma, Jing and Xiong, Li and Yang, Carl "Federated Graph Classification over Non-IID Graphs" Advances in neural information processing systems, v.34, 2021

Wang, Wenjie and Tang, Pengfei and Xiong, Li and Jiang, Xiaoqian "RADAR: Recurrent Autoencoder Based Detector for Adversarial Examples on Temporal EHR" ECML PKDD 2020: Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 2021

Sotoodeh, Mani and Xiong, Li and Ho, Joyce "CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data" PAKDD 2021: Advances in Knowledge Discovery and Data Mining, 2021 https://doi.org/10.1007/978-3-030-75765-6_15

Afshar, Ardavan and Yin, Kejing and Yan, Sherry and Qian, Cheng and Ho, Joyce and Park, Haesun and Sun, Jimeng "SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors" Proceedings of the AAAI Conference on Artificial Intelligence, v.35, 2021

Dong, Wenqin and Lee, Eric W and Hertzberg, Vicki Stover and Simpson, Roy L and Ho, Joyce C "GASP: Graph-Based Approximate Sequential Pattern Mining for Electronic Health Records" ADBIS 2021: New Trends in Database and Information Systems, 2021

He, Huan and Henderson, Jette and Ho, Joyce C "Distributed Tensor Decomposition for Large Scale Health Analytics" WWW '19 The World Wide Web Conference, 2019 https://doi.org/10.1145/3308558.3313548

He, Huan and Xi, Yuanzhe and Ho, Joyce C "Accelerated SGD for Tensor Decomposition of Sparse Count Data" 2020 International Conference on Data Mining Workshops (ICDMW), 2020 https://doi.org/10.1109/ICDMW51313.2020.00047

He, Huan and Xi, Yuanzhe and Ho, Joyce C "Fast and Accurate Tensor Decomposition without a High Performance Computing Machine" 2020 IEEE International Conference on Big Data, 2020 https://doi.org/10.1109/BigData50022.2020.9378111

He, Huan and Zhao, Shifan and Xi, Yuanzhe and Ho, Joyce "GDA-AM: On the Effectiveness of Solving Minimax Optimization Via Anderson Mixing" International Conference on Learning Representations, 2022

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Spatiotemporal data poses computational challenges due to the inter-dependencies amongst the observations that are not readily encapsulated in common data structures. Tensors, a generalization of vectors and matrices to multiway data, are natural representations for capturing the high-dimensional interactions across space and time. By leveraging the powerful and flexible tensor data structure, automated and data-driven discovery can be performed on spatiotemporal data. The project outcomes include a suite of algorithmic developments and theoretical advancements to scale and distribute tensor analysis, as well as validation across a variety of applications.
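To make the tensor representation concrete, the following is a minimal sketch of a rank-R CP (CANDECOMP/PARAFAC) factorization fitted by alternating least squares on a small location x time x feature array. It is a generic textbook illustration, not one of the algorithms developed under this award. The function names (`cp_als`, `make_toy_tensor`, `relative_error`), the toy dimensions, and the synthetic data are all assumptions introduced here for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Matricize a 3-way array so that rows index the chosen mode."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild the tensor as a sum of rank-one outer products."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def relative_error(x, factors):
    """Frobenius-norm reconstruction error relative to the data norm."""
    return np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x)


def cp_als(x, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factor matrices fixed and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def make_toy_tensor(seed=1):
    """Hypothetical data: 4 locations x 6 time steps x 3 features, exact rank 2."""
    rng = np.random.default_rng(seed)
    true_factors = [rng.random((dim, 2)) for dim in (4, 6, 3)]
    return reconstruct(true_factors), true_factors


if __name__ == "__main__":
    x, _ = make_toy_tensor()
    factors = cp_als(x, rank=2)
    print("relative reconstruction error:", relative_error(x, factors))
```

Each column of the three factor matrices can be read as one latent pattern: a weighting over locations, a profile over time, and a loading over features. Plain alternating least squares is used here only because it is short; it does not address the scalability, streaming, privacy, or decentralization goals described in this report.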

The intellectual merit is highlighted by the delivery of 1) new spatiotemporal tensor factorization models, 2) new methodologies to support data analysis in the streaming setting, where all the data cannot be readily stored or accessed more than once, 3) new scalable algorithms that do not require a high performance machine and offer faster convergence, 4) a new privacy-preserving federated tensor factorization model that offers differential privacy guarantees, and 5) the first communication-efficient, decentralized tensor factorization model that works for multiple network topologies. Furthermore, the project successfully validated the broad applicability of tensor factorization in multiple domains including healthcare, urban transportation, social media, and crime prediction. The project outcomes have been disseminated in various conference venues, workshops, tutorials, and invited talks in the fields of machine learning, data mining, and medical informatics. The findings and algorithms have been incorporated into multiple Emory courses. The project supported 1 postdoctoral fellow, 11 PhD students, and 5 undergraduates. The project has also taken steps towards advancing diversity, equity, and inclusion in the sciences by supporting 7 graduate and 4 undergraduate students from underrepresented groups.

Last Modified: 01/17/2023
Modified by: Joyce C Ho

Please report errors in award information by writing to: awardsearch@nsf.gov.
