Award Abstract # 1251151
BIGDATA: Small: DA: Patient-level predictive modeling from massive longitudinal databases

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Initial Amendment Date: June 28, 2013
Latest Amendment Date: June 28, 2013
Award Number: 1251151
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
  sspengle@nsf.gov
  (703)292-7347
  IIS Division of Information & Intelligent Systems
  CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2013
End Date: June 30, 2017 (Estimated)
Total Intended Award Amount: $688,969.00
Total Awarded Amount to Date: $688,969.00
Funds Obligated to Date: FY 2013 = $688,969.00
History of Investigator:
  • Marc Suchard (Principal Investigator)
    msuchard@ucla.edu
  • David Madigan (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Los Angeles
10889 WILSHIRE BLVD STE 700
LOS ANGELES
CA  US  90024-4200
(310)794-0102
Sponsor Congressional District: 36
Primary Place of Performance: Regents of the University of California, Los Angeles
Office of Contract and Grant Administration
Los Angeles
CA  US  90095-1406
Primary Place of Performance Congressional District: 36
Unique Entity Identifier (UEI): RN64EPNH8JC6
Parent UEI:
NSF Program(s): Information Technology Researc, Big Data Science & Engineering
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 7923, 8083
Program Element Code(s): 164000, 808300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Massive longitudinal healthcare data, such as administrative claims and electronic health records, provide an opportunity to greatly enhance the accuracy and clinical impact of patient-level predictions across a wide range of outcomes. This research targets the national priority domain of healthcare IT and showcases the advances that Big Data afford in helping patients make informed healthcare decisions leading to improved outcomes. Other involved stakeholders include healthcare providers, insurers and governmental agencies, and the databases this proposed grant employs encompass diverse and vulnerable patient populations, including the young, the poor and the elderly. Within this context, this grant is seeking to predict patient-level health events based upon personal characteristics and conditions. Accurate and well-calibrated predictions could significantly improve the wellbeing of patients and populations. This grant proposes to derive predictive models from massive observational data and then, for example, predict that a particular patient has an 18% chance of experiencing a stroke in the next 12 months. With this prediction in hand, caregivers and patients can optimize medical interventions and implement behavioral changes to hopefully prevent the predicted event. Further, this grant integrates two graduate student researchers, whose mentored experiences begin to rectify the shortage of data scientists trained at the intersection of statistics and medicine, and provides general statistical software tools for building large-scale predictive models from massive data across scientific domains.
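
To make "well-calibrated" concrete: among patients who are each told they have roughly an 18% chance of stroke, about 18% should go on to have one. The following is a minimal sketch of such a check, not the grant's own software; the function names, the equal-width binning and the toy data are assumptions made for illustration only.

```python
from typing import List, Tuple


def calibration_table(
    probs: List[float], outcomes: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Compare mean predicted risk with observed event rate per probability bin.

    Returns one (mean_predicted, observed_rate, count) tuple per non-empty bin.
    Equal-width bins are an illustrative choice (assumption).
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy data (hypothetical): 50 patients, each predicted at 18% risk,
# of whom 9 experience the event within the follow-up window.
toy_probs = [0.18] * 50
toy_outcomes = [1] * 9 + [0] * 41
toy_table = calibration_table(toy_probs, toy_outcomes)
toy_brier = brier_score(toy_probs, toy_outcomes)
```

In this toy case the single occupied bin has a mean predicted risk and an observed rate that agree, which is what a well-calibrated model would show on held-out data.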


From a technical perspective, the proposed grant aims first to evaluate the performance and applicability of an existing predictive model across five administrative claims and electronic health record databases covering over 80 million lives, using CHADS2 stroke risk as a motivating example. Then the grant will develop an innovative data-driven process for building patient-level predictive models from longitudinal observational data, and initially apply the process to predicting stroke in patients with atrial fibrillation for comparison of performance against CHADS2. Finally, the grant aims to explore characteristics of the process and resulting models, such as: evaluation of out-of-sample predictive performance in different databases; consideration of how models change over time; and assessment of which clinical variables most substantially contribute to patient-level predictions. Together, this research will focus on identifying heuristics to extract clinically relevant predictors from longitudinal electronic healthcare data, developing algorithms to use this information in multivariate modeling through massive parallelization using graphics processing units, optimized for data sparsity, and evaluating performance based on accuracy in predicting outcomes at the patient level. As a proof-of-concept, the grant will develop an approach to predict stroke risk and apply this approach across five disparate data sources (80+ million patients, including drugs, lab values, procedures, emergency room visits, primary care visits, inpatient encounters, etc.) that reflect diverse patient populations across the US, including the privately insured, Medicare-eligible, and Medicaid beneficiaries. The underlying goal of the grant is to apply innovative statistical and machine learning techniques using advancing computer technology to large-scale observational data to develop accurate and well-calibrated patient-level predictive models enabling the prediction of future medical events for individual patients.
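
For orientation, the CHADS2 baseline mentioned above is a simple additive score: one point each for congestive heart failure, hypertension, age 75 or older, and diabetes, and two points for a prior stroke or transient ischemic attack. The sketch below computes that score and a rank-based discrimination measure (the concordance statistic, or AUC) of the kind that could be used to compare it against a data-driven model. The `Patient` record, the function names and the toy inputs are hypothetical and are not taken from the grant's software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    """Hypothetical patient record holding only the CHADS2 inputs."""
    congestive_heart_failure: bool
    hypertension: bool
    age: int
    diabetes: bool
    prior_stroke_or_tia: bool


def chads2_score(p: Patient) -> int:
    """Standard CHADS2 point total (0 to 6)."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0
    score += 1 if p.hypertension else 0
    score += 1 if p.age >= 75 else 0
    score += 1 if p.diabetes else 0
    score += 2 if p.prior_stroke_or_tia else 0
    return score


def auc(scores: List[float], outcomes: List[int]) -> float:
    """Concordance statistic: the fraction of (event, non-event) pairs in
    which the event patient has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy usage (hypothetical patients, not real data).
high_risk = Patient(True, True, 80, False, True)
low_risk = Patient(False, False, 60, False, False)
high_score = chads2_score(high_risk)
low_score = chads2_score(low_risk)
toy_auc = auc([5, 0, 2, 1], [1, 0, 1, 0])
```

The pairwise AUC shown here is quadratic in the number of patients and is meant only to make the definition explicit; at the scale of tens of millions of patients a sort-based computation would be the practical choice.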

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 23)
Baele G, Lemey P, Suchard MA "Genealogical working distributions for Bayesian model testing with phylogenetic uncertainty" Systematic Biology, v.65, 2016, p.250
Baele G, Suchard MA, Bielejec F, Lemey P "Bayesian codon substitution modeling to identify sources of pathogen evolutionary rate variation" Microbial Genomics, v.2, 2017, p.e000057 10.1099/mgen.0.000057
Baele G, Suchard MA, Rambaut A, Lemey P "Emerging concepts of data integration in pathogen phylogenetics" Systematic Biology, v.66, 2017, p.e47 https://doi.org/10.1093/sysbio/syw054
Beck HE, Mittal S, Madigan D "Reassessing mechanism as a predictor of pediatric injury mortality" Journal of Surgical Research, v.199, 2015, p.641
Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A "Integrating influenza antigenic dynamics with molecular evolution" eLife, v.3, 2014, p.e01914
Berger ML, Lipset C, Gutteridge A, Axelsen K, Subedi P, Madigan D "Optimizing the Leveraging of Real World Data: How It Can Improve the Development and Use of Medicines?" Value in Health, 2015 http://dx.doi.org/10.1016/j.jval.2014.10.009
Bielejec F, Lemey P, Carvalho LM, Baele G, Rambaut A, Suchard MA "pi-BUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios" BMC Bioinformatics, 2014
Crawford FW, Weiss RE, Suchard MA "Sex, lies and self-reported counts: Bayesian mixture models for longitudinal heaped count data via birth-death processes" Annals of Applied Statistics, v.9, 2015, p.572
Cybis GB, Sinsheimer JS, Bedford T, Mather AE, Lemey P, Suchard MA "Assessing phenotypic correlation through the multivariate phylogenetic latent liability model" Annals of Applied Statistics, v.9, 2015, p.969
Duke J, Ryan PB, Suchard MA, Hripcsak G, Jin P, Reich C, Schwalm MS, Khoma Y, Wu Y, Xu H, Shah N, Banda J, Schuemie MJ "Risk of angioedema associated with levetiracetam use: findings of the Observational Health Data Sciences and Informatics research network" Epilepsia, v.58, 2017, p.e101
Hripcsak G, Ryan PB, Duke J, Shah NH, Park RW, Huser V, Suchard MA, Schuemie MJ, DeFalco F, Perotte A, Banda J, Reich C, Schilling L, Matheny M, Meeker D, Pratt N, Madigan D "Characterizing treatment pathways at scale using the OHDSI network" Proceedings of the National Academy of Sciences, v.113, 2016, p.7329 10.1073/pnas.1510502113

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

To enable predictive modeling from healthcare data, this grant supported the founding and growth of the Observational Health Data Sciences and Informatics (OHDSI) program, a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. Major achievements of OHDSI over the course of this grant are an international demonstration of characterization of treatment pathways across three major chronic diseases covering over 250 million patients, a global network study of clinical predictive importance, and the completion of a large-scale analysis involving over 17,000 comparative effectiveness and drug safety studies.

Scientifically, the grant also advanced computational and statistical techniques to extract clinically relevant predictors from longitudinal electronic healthcare data, to develop algorithms to use this information in multivariate modeling through massive parallelization using graphics processing units, optimized for data sparsity, and to evaluate performance based on accuracy in predicting outcomes at the patient level. To communicate this work, the grant generated 25 peer-reviewed publications.

Finally, this grant targeted the national priority domain of healthcare IT and showcased the advances that Big Data afford in helping patients make informed healthcare decisions leading to improved outcomes. Other involved stakeholders included healthcare providers, insurers and governmental agencies, and the databases this grant employed encompassed diverse and vulnerable patient populations, including the young, the poor and the elderly. Within this context, the grant yielded improved abilities to predict patient-level health events (for example, "will I have a stroke?") based upon personal characteristics and conditions. Accurate and well-calibrated predictions could significantly improve the well-being of patients and populations.

Last Modified: 08/31/2017
Modified by: Marc A Suchard
