
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | June 13, 2014 |
Latest Amendment Date: | May 3, 2017 |
Award Number: | 1422133 |
Award Instrument: | Standard Grant |
Program Manager: |
Anindya Banerjee
abanerje@nsf.gov (703)292-7885 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2014 |
End Date: | July 31, 2017 (Estimated) |
Total Intended Award Amount: | $499,812.00 |
Total Awarded Amount to Date: | $507,812.00 |
Funds Obligated to Date: |
FY 2017 = $8,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
360 HUNTINGTON AVE BOSTON MA US 02115-5005 (617)373-5600 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
360 Huntington Ave Boston MA US 02115-5005 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01001718DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Title: SHF: Small: Secure Compilation of Advanced Languages
Advanced programming languages, based on dependent types, enable program verification alongside program development, thus making them an ideal tool for building fully verified, high assurance software. Recent dependently typed languages that permit reasoning about state and effects---such as Hoare Type Theory (HTT) and Microsoft's F*---are particularly promising and have been used to verify a range of rich security policies, from state-dependent information flow and access control to conditional declassification and information erasure. But while these languages provide the means to verify security and correctness of high-level source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level target code. Unfortunately, even when compilers for such advanced languages exist, they come with no formal guarantee of correct compilation, let alone any guarantee of secure compilation---i.e., that compiled components will remain as secure as their high-level counterparts when executed within arbitrary low-level contexts. This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any "attacks" on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
The intellectual merit of this project is the development of a powerful new proof architecture for realistic yet secure compilation of dependently typed languages that relies on contracts to ensure that target-level contexts respect source-level security guarantees and leverages these contracts in a formal model of how source and target code may interoperate. The broader impact is that this research will make it possible to compose high-assurance software components into high-assurance software systems, regardless of whether the components are developed in a high-level programming language or directly in assembly. Compositionality has been a long-standing open problem for certifying systems for high-assurance. Hence, this research has potential for enormous impact on how high-assurance systems are built and certified. The specific goal of the project is to develop a verified multi-pass compiler from Hoare Type Theory to assembly that is type preserving, correct, and secure. The compiler will include passes that perform closure conversion, heap allocation, and code generation. To prove correct compilation of components, not just whole programs, this work will use an approach based on defining a formal semantics of interoperability between source components and target code. To guarantee secure compilation, the project will use (static) contract checking to ensure that compiled code is only run in target contexts that respect source-level security guarantees. To carry out proofs of compiler correctness, the project will develop a logical relations proof method for Hoare Type Theory.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Programming languages with advanced type systems facilitate the development of reliable software by giving programmers the ability to express numerous safety and security properties of their programs via types. Violations of these properties can be caught via type checking at compile time, rather than at run time. Languages like OCaml and F# have types that provide memory safety and information hiding guarantees, while dependently typed languages such as that of the Coq proof assistant have richer types that enable verification of full functional correctness during program development. In recent years, numerous projects have used Coq to build fully verified software fromOS kernels to compilers to cryptographic protocols.
While advanced typed languages are a powerful tool for verifying safety, security, and correctness of source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level code. Unfortunately, compilers for such advanced languages come with no formal guarantee of correct compilation in the presence of linking with other code, let alone any guarantee of secure compilation---i.e., that compiled components will remain as secure as their source-level counterparts when linked with arbitrary target-level code. This is a concern because compiled code is routinely linked with libraries implemented directly in the target language or compiled from other source languages. Formally, a correct compiler is one that preserves the semantics of all source code it compiles, while a secure compiler is one that is fully abstract: it guarantees that two compiled components can be distinguished by some target-level code if and only if their source-level counterparts can be distinguished by some source-level code. Thus, secure compilers have the added benefit of allowing programmers to reason about their programs (such as correctness of refactoring) by *thinking* only in their source language.
This project has developed methods for building realistic yet secure (fully abstract) compilers for advanced typed languages, a notoriously difficult problem. Fully abstract compilation has historically been achieved by either making the source language more expressive or making the target language less expressive. Both approaches are infeasible in the context of real compilers whose target languages are usually more expressive than the source, allowing target-level attackers to make observations about compiled components that source-level attackers cannot make about the original source component---e.g., via disruptions of control flow, or inspection of information private to a module.
A central goal was to demonstrate that source-language types can be translated/compiled to target-language types (static contracts) that can be used to ensure that target-level contexts respect source-level security guarantees. The use of static rather than dynamic contracts for ensuring security is critical as it allows us to avoid the significant performance overheads that accompany dynamic enforcement of security guarantees.
The first major result of our project is the development of correct and *typed* compiler passes for the core language of the Coq proof assistant. These include typed CPS translation, which since 2002 was conjectured to be impossible, and typed closure conversion that is compatible with critical features of Coq. These results are key to building a practical verified compiler for Coq which would ensure that fully verified source code cannot be compromised via compiler bugs or unsafe linking.
The second major outcome are proof techniques for verifying that acompiler is fully abstract. The project devised "backtranslation" techniques that demonstrate that target-level attackers can be modeled at source level, even when the target language contains features that are absent from the source. Different classes of proof techniques were developed for terminating languages (back-translation by partial evaluation) and for Turing-complete languages (back-translation via universal embedding).
The third major result is development of interoperability semantics between high-level (direct-style) languages and a low-level assembly, that supports embedding assembly in high-level programs, and vice versa, while ensuring safety of the mixed program. The key challenge was developing a notion of "components" in assembly such that components can be made up of different number of basic blocks and can be nested. This should inform design of other high/low multi-languages, e.g., for verification of C code mixed with assembly and just-in-time (JIT) compilers.
Finally, the project noted the tension between secure compilation--which must disallow linking with code whose behavior cannot be expressed in the compiler's source language--and the building of useful multi-language software--which frequently demands such linking. The project advocated that language designs be extended to facilitate development of compilers and toolchains that support building multi-language software. Specifically, every language should come with *linking types* extensions that provide programmers with the means to annotate their source code to express the points where they wish to link with code behaviors that cannot be expressed in their language. Giving programmers high-level means to control low-level linking seems essential for building practical secure compilers and toolchains that report cross-language type errors to aid in the development of multi-language software.
Last Modified: 04/19/2018
Modified by: Amal Ahmed
Please report errors in award information by writing to: awardsearch@nsf.gov.