
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | January 15, 2015 |
Latest Amendment Date: | May 14, 2019 |
Award Number: | 1453796 |
Award Instrument: | Continuing Grant |
Program Manager: |
Anindya Banerjee
abanerje@nsf.gov (703)292-7885 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 1, 2015 |
End Date: | April 30, 2022 (Estimated) |
Total Intended Award Amount: | $508,092.00 |
Total Awarded Amount to Date: | $540,092.00 |
Funds Obligated to Date: |
FY 2017 = $117,258.00 FY 2018 = $112,692.00 FY 2019 = $115,590.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
360 HUNTINGTON AVE BOSTON MA US 02115-5005 (617)373-5600 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
360 Huntington Ave Boston MA US 02115-5005 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01001718DB NSF RESEARCH & RELATED ACTIVIT 01001819DB NSF RESEARCH & RELATED ACTIVIT 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Title: CAREER: Verified Compilers for a Multi-Language World
Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today's world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project's broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language sofware, (ii) a target language that extends LLVM---increasingly the target of choice for modern compilers---with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.
The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t') should behave the same as s interoperating with t'. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Compilers, which translate high-level source code written by programmers intothe low-level target code that runs on machines, play a critical role in theproduction of software. As such, they should be correct, i.e., they shouldpreserve the behavior of all programs they compile. Prior to this project, whilethere had been remarkable progress on formally verified compilers forincreasingly realistic languages, these verified compilers suffered from aserious limitation: they were proved correct under impractical assumptions aboutwhat the compiled code would be linked with, from no linking at all to linkingonly with code compiled from the same source language. These assumptions do notmatch reality as most software systems today are comprised of components writtenin different languages compiled by different compilers to a common target, aswell as low-level libraries that may be handwritten in the target language.
A central goal of this project was to investigate proof methods for buildingverified compositional compilers, which guarantee correct compilation ofcomponents, not just whole programs, and formally support linking with arbitrarytarget code. The long-term vision, extending beyond this project, is to haveverified compositional compilers from languages as different as C, OCaml, Rust,and Coq to a common low-level target language that supports safeinteroperability between more precisely typed, less precisely typed, andtype-unsafe components.
The first major result of this project was the development of a genericcompositional compiler correctness (CCC) theorem that we argue is the desiredtheorem when compiling components instead of whole programs. This resultaddresses a critical question: how do we even state the compiler correctnesstheorem in the presence of linking? While there have been a number of recentresults on compositional compiler correctness, they all state theircorrect-component-compilation theorems in remarkably different ways, yieldingpros and cons that aren't well understood. We showed that specificcompiler-verification efforts can use their choice of formalism ``under thehood'' and then prove that their theorems imply CCC.
The second major outcome of the project were compositional compiler correctnessresults for compilers that, unlike prior work, support linking withtarget-language features that are not present in the source language:specifically with control effects such as exceptions and call/cc. Linking withfeatures not present in the source is a critical feature since that often whyprogrammers implement components in different languages.
The third major result of the project was a multi-language that supports safemixing of a high-level typed functional language with low-level assembly code.This is nontrivial since assembly is not compositional: assembly languagecomponents may be comprised of different numbers of basic blocks so it isn'tclear how to identify component boundaries. To support safe mixing, we designeda compositional typed assembly language (TAL), showed how to designmulti-language semantics so TAL components can be safely embedded in ahigh-level functional language and vice versa.
The fourth major outcome of this project is a methodology for verifying thesoundness of foreign-function interfaces (FFIs). Every language implements anFFI for language interoperability, but existing FFIs are all ad hoc. Wedeveloped a semantic framework for proving soundness of FFIs as they are designed "in the wild", by which we mean that FFI developersimplement conversion between types of data from two different source languagesby implementing target-level conversion functions to mediate between datarepresentations coming from the two source languages. We applied thisframework to a series of case studies that demonstrate how this approach allows us to account for complex differences in language semantics andmake efficiency trade-offs based on specifics of compilers or targets.
The fifth major set of results pertain to compilation of languages with advancedtype systems, specifically dependent types. Dependently typed languages such asthat of the Coq proof assistant have rich types that enable verification of fullfunctional correctness during program development. This project showed how todevelop correct and *typed* compiler passes for the core language of the Coqproof assistant. These passes include typed CPS translation, which since 2002was conjectured to be impossible, typed closure conversion, and ANF. Theseresults are key to building a practical verified compiler for Coq which wouldensure that fully verified source code cannot be compromised via compiler bugsor unsafe linking.
The sixth major series of results from this project pertain to gradualtyping, i.e., the mixing of more precisely typed and less precisely typedcode. One of the goals of the project was to design a gradually type-safelow-level language that supports safe interoperability between components thatare statically type-safe (e.g., compiled from OCaml or Rust) and dynamicallytype-safe (e.g., compiled from Python or Scheme). This project produced a seriesof foundational results on design principles for gradually typed langauges.
Finally, for education and broader impacts: the project involved development ofsummer school lectures delivered in 4 separate years at the Oregon PL Summer School (OPLSS) on a variety of research results from this project. It also includedlectures and talks at ECOOP, the PL Mentoring Workshop, a keynote at theStrangeLoop conference, and courses at Northeastern that explain how ideas fromtype systems, runtime contract checking, gradual typing, multi-languagesemantics, and compiler correctness can be brought together to build verifiedcompilers for a multi-language world.
Last Modified: 07/28/2023
Modified by: Amal Ahmed
Please report errors in award information by writing to: awardsearch@nsf.gov.