Award Abstract # 2012214
SBIR Phase I: Advanced Cancer Analytics Platform for Highly Accurate and Scalable Survival Models to Personalize Oncology Strategies

NSF Org: TI (Translational Impacts)
Recipient: INSILICA LLC
Initial Amendment Date: August 24, 2020
Latest Amendment Date: June 3, 2022
Award Number: 2012214
Award Instrument: Standard Grant
Program Manager: Alastair Monk
amonk@nsf.gov
(703)292-4392
TI Translational Impacts
TIP Directorate for Technology, Innovation, and Partnerships
Start Date: August 15, 2020
End Date: October 31, 2022 (Estimated)
Total Intended Award Amount: $224,454.00
Total Awarded Amount to Date: $224,454.00
Funds Obligated to Date: FY 2020 = $224,454.00
History of Investigator:
  • Thomas Luechtefeld (Principal Investigator)
    tom@insilica.co
Recipient Sponsored Research Office: INSILICA, LLC
7106 RIVER RD
BETHESDA
MD  US  20817-4770
(341)691-4630
Sponsor Congressional District: 08
Primary Place of Performance: INSILICA, LLC
2736 Quarry Heights Way
Baltimore
MD  US  21209-1069
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): RTN8V2BMGY63
Parent UEI:
NSF Program(s): SBIR Phase I
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1718, 8018, 8032
Program Element Code(s): 537100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.084

ABSTRACT

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is to advance personalized clinical decision-making in cancer care. An estimated 17 million cases of cancer are diagnosed globally each year. Over $90 billion per year is spent in total on cancer-related health care in the U.S., and cancer patients pay over $4 billion out of pocket for health care. Selecting therapeutic strategies and targeting clinical trial research in oncology become exponentially more complex when distinct types of cancer are considered, along with how each may affect populations differently by gender, race, ethnicity, and age. The proposed technology will develop advanced bioinformatics models and visualization tools to guide decision-making by oncologists. It will develop and use advanced survival models that account for cancer type, other biological and chemical factors, and patient demographics.

This Small Business Innovation Research (SBIR) Phase I project will focus on three objectives:

1. Develop and validate transfer learning models that leverage large data sets from high-incidence cancer types to improve results for cancer types with sparse data.
2. Leverage these data in a disease-agnostic platform that uses a recurrent neural network to account for temporal variation when predicting survivability.
3. Develop visualization tools that help clinicians understand causal relationships.

This system will use several innovations:

a) Transfer Learning to Scale Available Data: Since cancer survival modeling is limited in many cancer types due to lack of data, we will demonstrate the feasibility of transfer learning in this context.
b) Single Recurrent Neural Network: We will implement a recurrent neural network to improve performance and allow a single network to be trained across all cancer types and patient population characteristics.
c) Control Feature Mediation Analysis: We will develop accurate survival models with an understanding of their sensitivity to inputs.
d) Clinician-Driven Interpretation and Visualization Tools: The framework needs interpretation and visualization features that reduce data into reports easily digestible for clinical decision-making.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

During this Phase I effort, a prototype Biobricks.ai platform was architected, developed, and tested to acquire, process, and normalize 40+ life science databases related to biomarkers, clinical oncology outcomes, and cheminformatics, and to construct cancer survival and cheminformatic models.

A modular data-dependency package management system was built to support the development of normalization code and the distribution of arbitrary tabular data in Parquet format, as well as serialized neural network models. The biobricks.ai innovation combines two technologies: Git, the globally adopted open-source version control system for managing code development, and data-version-control (DVC), a version control layer built on Git for versioning large files and building data asset creation pipelines.

Biobricks.ai layers R and Python clients on these technologies and routes data requests through a centralized portal.
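
The idea of versioning a data asset by its content can be illustrated with a short sketch. This is not the biobricks.ai or DVC implementation; the function name `content_version` and the sample data are hypothetical, and SHA-256 from the Python standard library stands in for whatever hash the underlying tools use. It only shows why a content-derived identifier lets downstream work pin an exact upstream data version.

```python
import hashlib


def content_version(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact byte content.

    Hypothetical helper for illustration; not part of any biobricks client.
    """
    return hashlib.sha256(data).hexdigest()


# Two hypothetical snapshots of a small tabular asset (CSV bytes for brevity).
snapshot_a = b"gene,mutation\nTP53,R175H\n"
snapshot_b = b"gene,mutation\nTP53,R175H\nKRAS,G12D\n"

version_a = content_version(snapshot_a)
version_b = content_version(snapshot_b)

# Identical content always maps to the same identifier;
# any change to the data yields a different one.
print(version_a == content_version(snapshot_a))
print(version_a != version_b)
```

A downstream asset that records `version_a` can later detect that its upstream data changed simply by recomputing the digest and comparing.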

Establishment of feasibility
The preliminary version of biobricks.ai established the feasibility of the technology by constructing fully functional data dependencies and using them in reporting and AI applications. Specifically:

1. Construction of 40+ clinically relevant data dependencies
2. Construction of AI models for cancer survival (Figure 1)
3. Completion of the automated quality control tool bricktools
4. Construction of parameterized reports
5. Active documentation and account management portals (Figure 2)
6. Active HTTPS server dvc.biobricks.ai that records brick usage
7. Public use of biobricks by external users

In the final pivot of our Phase I proposal, we focused on the construction of a functioning data registry that serves clinical datasets (or ‘data dependencies’) and functional AI models. A large number of data dependencies were constructed on biobricks.ai (Table 1), and dependent AI models for patient hazard prediction and chemical neural network embeddings were subsequently created and added to the registry.

This completion demonstrates the feasibility of the technology. Users can now install, rebuild, and pull data from publicly registered bricks, and the code for doing so is very simple (see biobricks.ai for examples).

In addition to providing a fast and simple way to download public data, the system provides automated quality control checks via the bricktools package (github.com/biobricksai/bricktools), which helps to maintain a high-quality registry.
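
As an illustration of what automated quality control on a tabular asset might look like, the sketch below runs a few basic checks on a column-oriented table. It does not use or reproduce the bricktools API; the function `check_table`, the specific checks, and the sample data are all assumptions chosen for the example.

```python
from typing import Dict, List


def check_table(columns: Dict[str, List[object]]) -> List[str]:
    """Return human-readable problems found in a column-oriented table.

    Hypothetical checks for illustration only; not the bricktools API.
    """
    if not columns:
        return ["table has no columns"]
    problems = []
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns have unequal lengths: {lengths}")
    for name, values in columns.items():
        if not values:
            problems.append(f"column '{name}' is empty")
        elif all(v is None for v in values):
            problems.append(f"column '{name}' contains only missing values")
    return problems


# Hypothetical sample tables.
good = {"patient_id": ["p1", "p2"], "survival_days": [410, 95]}
bad = {"patient_id": ["p1", "p2"], "survival_days": [None, None]}

print(check_table(good))  # []
print(check_table(bad))   # ["column 'survival_days' contains only missing values"]
```

Returning a list of problems, instead of raising on the first one, lets a registry report every issue with a submitted asset in a single pass.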

An imported biobricks data dependency carries a version that tracks code changes, schema changes, and data-quantity changes. By importing a data dependency, users can tie their downstream assets to a specific upstream data version and then deploy the new assets on the biobricks.ai system, enabling other users to depend on them as well. One such downstream application is oncsurvive, a neural-network-based patient embedding model that trains on patient mutation and survival data from the Genomic Data Commons to optimize a numeric hazard embedding that ranks patients in order of survival time (Figure 1). This demonstrates that building dependent downstream applications is not only feasible but actively supported by the biobricks system.

The system has a preliminary intermediary server that provides users with tokens and tracks the assets and bandwidth they use. This will eventually be built into a usage-based billing system. The web portals that support this system exist today at biobricks.ai, members.biobricks.ai, and dvc.biobricks.ai. The first two are browser-based servers for documentation and account management; the last is a simple HTTPS server (not browser based) that tracks brick usage. Together, these systems provide a fully functional biobricks system that we can improve, extend, and commercialize. Biobricks.ai is already in active use at several research organizations.
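
A hazard score that ranks patients in order of survival time, as described for oncsurvive, is commonly evaluated with a concordance index. The report does not say which metric the project used, so the sketch below is only an assumption: a plain-Python version of a Harrell-style concordance index, with hypothetical sample data, showing how the quality of such a ranking can be measured when some patients are censored.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    hazards: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs that the hazard score ranks correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    times[i] < times[j]. It is concordant when patient i has the higher
    hazard score; tied hazard scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if hazards[i] > hazards[j]:
                    concordant += 1.0
                elif hazards[i] == hazards[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: survival times, event flags (False = censored), hazard scores.
times = [5.0, 12.0, 20.0, 30.0]
events = [True, True, False, True]
hazards = [0.9, 0.7, 0.4, 0.1]

print(concordance_index(times, events, hazards))        # 1.0 (perfect ranking)
print(concordance_index(times, events, hazards[::-1]))  # 0.0 (fully reversed)
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 is what random scores would achieve on average, and 0.0 means the ranking is fully inverted.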

Core Registry
Biobricks.ai now supports over 40 data dependency ‘bricks’, 2 AI models, and parameterized reports, which are systems that generate human-readable reports from inputs such as a list of patient mutations. These bricks, models, parameterized reports, and clients are all available at github.com/biobricks-ai. Table 1 provides an abbreviated list of some of the notable completions of the Phase I project.

Summary
The final pivot of this project involved the creation of BioBricks.ai, an open-source, open-access data registry for health informatics that accelerates and normalizes data distribution and access.


Last Modified: 02/11/2023
Modified by: Thomas Luechtefeld

Please report errors in award information by writing to: awardsearch@nsf.gov.