
NSF Org: | TI Translational Impacts |
Recipient: |
|
Initial Amendment Date: | December 12, 2016 |
Latest Amendment Date: | December 27, 2017 |
Award Number: | 1647681 |
Award Instrument: | Standard Grant |
Program Manager: | Peter Atherton, patherto@nsf.gov, (703)292-8772, TI Translational Impacts, TIP Directorate for Technology, Innovation, and Partnerships |
Start Date: | December 15, 2016 |
End Date: | March 31, 2018 (Estimated) |
Total Intended Award Amount: | $223,238.00 |
Total Awarded Amount to Date: | $223,238.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 17217 WATERVIEW PKWY DALLAS TX US 75252-8004 (972)729-9582 |
Sponsor Congressional District: |
|
Primary Place of Performance: | PO Box 836088 Richardson TX US 75083-6088 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | SBIR Phase I |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.084 |
ABSTRACT
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project will be the creation of a new tool that could prevent the loss of sensitive data stored in big data management systems due to various cyberattacks. Furthermore, the proposed tool can allow organizations to audit big data usage to prevent abuse and misuse of the stored data. The existence of such a novel tool may increase trust in these big data management systems, and protect the sensitive data stored in such systems against various outsider and insider attacks. The company believes that such a tool would address an important customer need and has the potential to have significant commercial impact as more and more companies are adopting big data management technologies such as Hadoop and Spark. The company plans to pursue a freemium business model and open source some of the developed code. This in turn may improve the data protection capabilities provided by existing freely available open source tools that can be used by many different companies and organizations.
This Small Business Innovation Research (SBIR) Phase I project will prove the feasibility of a novel big data privacy, security and governance management tool. This new tool will provide enhanced security and privacy protection capabilities in one tool, without the need to modify the underlying big data management system: enforcing privacy policies using on-the-fly data masking, enforcing security policies using role-based access control techniques, and enforcing governance policies using data encryption and advanced auditing and accountability features. To successfully develop the proposed prototype, the company will address many technical challenges, such as developing efficient privacy-preserving policy enforcement solutions with very little overhead and designing an interactive user interface that supports easy governance and privacy policy specification tasks. To address these technical challenges, the company proposes to leverage recent advances in aspect-oriented programming to inject code directly into submitted data analysis jobs in a seamless manner, enabling transparent data encryption, data sanitization, and accountability, compliance and governance policy enforcement. Using this injected code, data that is stored in encrypted format could be decrypted and sanitized as needed before it is used for data analysis. Furthermore, the necessary logs could be generated for accountability purposes.
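As a rough illustration of the code-injection idea described above, the following minimal Python sketch wraps an analysis job with policy logic, loosely in the spirit of "around" advice in aspect-oriented programming. It is not the project's implementation: the decorator `enforce_policy`, the policy table `UNMASKED_COLUMNS`, the toy cipher `toy_decrypt`, and the sample data are all hypothetical, and a real system would weave such logic into Hadoop or Spark jobs rather than use a Python decorator.

```python
import functools
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical policy: columns each role may see unmasked.
UNMASKED_COLUMNS = {
    "analyst": {"zip"},
    "auditor": {"zip", "ssn"},
}

# Hypothetical in-memory audit trail.
AUDIT_LOG: List[str] = []


def toy_decrypt(value: str) -> str:
    """Stand-in for real decryption (assumption): reverses the string."""
    return value[::-1]


def mask(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def enforce_policy(role: str) -> Callable:
    """Wrap a job so its input is decrypted, masked, and logged first."""
    def decorator(job: Callable) -> Callable:
        @functools.wraps(job)
        def wrapper(records: Iterable[Record]):
            allowed = UNMASKED_COLUMNS.get(role, set())
            sanitized = []
            for record in records:
                clear = {k: toy_decrypt(v) for k, v in record.items()}
                sanitized.append(
                    {k: (v if k in allowed else mask(v)) for k, v in clear.items()}
                )
            AUDIT_LOG.append(
                f"role={role} job={job.__name__} records={len(sanitized)}"
            )
            # The job itself is unchanged and only sees sanitized data.
            return job(sanitized)
        return wrapper
    return decorator


@enforce_policy(role="analyst")
def count_by_zip(records: List[Record]) -> Dict[str, int]:
    """An ordinary analysis job, written with no knowledge of the policy."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record["zip"]] = counts.get(record["zip"], 0) + 1
    return counts


# Data "at rest", stored under the toy cipher above.
STORED = [
    {"zip": "25257", "ssn": "9876-54-321"},
    {"zip": "38057", "ssn": "1111-11-111"},
    {"zip": "25257", "ssn": "2222-22-222"},
]

if __name__ == "__main__":
    print(count_by_zip(STORED))
    print(AUDIT_LOG)
```

The point of the sketch is that `count_by_zip` contains no security code: decryption, masking, and logging are attached from the outside, which mirrors the stated goal of enforcing policies without modifying the submitted job or the underlying system.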
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
With the emergence of the big data revolution, there has been a surge in the adoption of NoSQL ("Not Only SQL") databases due to their scalability, their ability to handle both structured and unstructured data, and their cost-effectiveness. Increasingly, in addition to using existing relational databases, organizations are moving their data to data lakes built on these NoSQL databases and trying to gain unique insights using advanced data analytics and machine learning techniques. As more and more sensitive data are collected in these NoSQL databases, they have become an attractive target for hackers. Furthermore, recent regulations require companies to fundamentally change how they deal with the privacy-sensitive data stored in those NoSQL databases. This SBIR project aims to develop key technologies to protect big data stored in NoSQL databases and data lakes. Our proposed system provides unique capabilities that will enable organizations to protect sensitive parts of big data and comply easily with existing and upcoming data security and privacy regulations. This in turn will allow organizations to move to a more "data-centric" cyber security and compliance posture and significantly reduce the costs incurred due to cyber attacks and regulatory compliance requirements.
Our proposed system is built on top of existing NoSQL databases such as Hadoop and Spark and is designed as a data access broker: each request submitted by a user application is automatically captured by our system. These requests are logged, analyzed, modified if needed to conform with security and privacy policies, and then submitted to the underlying NoSQL database. The proposed system is fully transparent from the user's point of view and does not require any change to the user’s code or to the underlying NoSQL database systems. Therefore, it can be deployed on existing NoSQL databases with very little effort.
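The broker flow described above (capture, log, rewrite if needed, forward) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `Request` and `Broker` classes, the `DENIED` policy table, and the in-memory `fake_backend` are hypothetical and stand in for the real request types and the underlying Hadoop or Spark system.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical data access request issued by a user application."""
    user: str
    table: str
    columns: Tuple[str, ...]


# Hypothetical policy: columns hidden from a given user on a given table.
DENIED: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("alice", "patients"): frozenset({"ssn"}),
}


@dataclass
class Broker:
    """Sits between the application and the backend data store."""
    backend: Callable[[Request], List[dict]]
    log: List[str] = field(default_factory=list)

    def submit(self, request: Request) -> List[dict]:
        # 1. Capture and log the request as received.
        self.log.append(f"received {request}")
        # 2. Analyze it against the policy and rewrite it if needed.
        denied = DENIED.get((request.user, request.table), frozenset())
        allowed = tuple(c for c in request.columns if c not in denied)
        if allowed != request.columns:
            request = replace(request, columns=allowed)
            self.log.append(f"rewritten {request}")
        # 3. Forward the (possibly modified) request to the backend.
        return self.backend(request)


# Toy in-memory backend standing in for the real data store.
TABLES = {
    "patients": [{"name": "Ann", "ssn": "123-45-6789", "zip": "75252"}],
}


def fake_backend(request: Request) -> List[dict]:
    return [{c: row[c] for c in request.columns} for row in TABLES[request.table]]


if __name__ == "__main__":
    broker = Broker(fake_backend)
    print(broker.submit(Request("alice", "patients", ("name", "ssn"))))
    print(broker.log)
```

Because the application only ever calls `submit`, neither the application code nor the backend needs to change when a policy is added or updated, which is the transparency property the paragraph above describes.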
During our Phase I project, we developed a prototype that showed the technical and commercial feasibility of the underlying technical ideas. Using this prototype, we showed that for Hadoop and Spark, the proposed system can intercept requests that try to access the data, log them, and modify them according to specified policies. We chose Hadoop and Spark because they are two of the most popular NoSQL database systems and provide advanced data analytics and machine learning capabilities that other NoSQL databases do not provide directly. In addition, we implemented a policy enforcement framework that works with existing Hadoop and Spark systems and performed an extensive evaluation of it. The technical outcomes of this project can be summarized as follows:
- We built a prototype that seamlessly integrates with existing big data platforms and basic security tools.
- We observed minimal overhead for our policy enforcement technology in popular big data platforms.
- We built a mechanism to efficiently deploy and install our prototype alongside existing big data projects in a cluster consisting of a large number of machines.
- We configured scalable mechanisms to process data usage logs.
- We built a user-friendly web interface to input policies and monitor data usage.
- We found bugs in Apache Ranger, a very popular big data security project. We reported those bugs to the appropriate authorities, and they are working on the fixes.
The project participants gained hands-on experience with several big data platforms, such as Hadoop, Spark, Ranger, and Knox, and had the chance to interact with potential customers and existing users of big data platforms. This gave participants valuable insights into big data adoption, such as customer pain points with existing solutions and unmet customer needs.
Last Modified: 05/10/2018
Modified by: Fahad Shaon