Award Abstract # 1453796
CAREER: Verified Compilers for a Multi-Language World

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: NORTHEASTERN UNIVERSITY
Initial Amendment Date: January 15, 2015
Latest Amendment Date: May 14, 2019
Award Number: 1453796
Award Instrument: Continuing Grant
Program Manager: Anindya Banerjee
abanerje@nsf.gov
 (703)292-7885
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2015
End Date: April 30, 2022 (Estimated)
Total Intended Award Amount: $508,092.00
Total Awarded Amount to Date: $540,092.00
Funds Obligated to Date: FY 2015 = $194,552.00
FY 2017 = $117,258.00

FY 2018 = $112,692.00

FY 2019 = $115,590.00
History of Investigator:
  • Amal Ahmed (Principal Investigator)
    amal@ccs.neu.edu
Recipient Sponsored Research Office: Northeastern University
360 HUNTINGTON AVE
BOSTON
MA  US  02115-5005
(617)373-5600
Sponsor Congressional District: 07
Primary Place of Performance: Northeastern University
360 Huntington Ave
Boston
MA  US  02115-5005
Primary Place of Performance
Congressional District:
07
Unique Entity Identifier (UEI): HLTMVS2JZBS6
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT

01001819DB NSF RESEARCH & RELATED ACTIVIT

01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7798, 7943, 9102, 9251
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Title: CAREER: Verified Compilers for a Multi-Language World

Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today's world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project's broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language sofware, (ii) a target language that extends LLVM---increasingly the target of choice for modern compilers---with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.

The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t') should behave the same as s interoperating with t'. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 21)
Amal Ahmed "Verified Compilers for a Multi-Language World" SNAPL: The Inaugural Summit on Advances in Programming Languages (SNAPL?15), Asilomar, California, May 2015. , 2015
Amal Ahmed, Dustin Jamner, Jeremy Siek, Philip Wadler "Theorems for Free for Free: Parametricity, With and Without Types" Proceedings of the ACM on Programming Languages (PACMPL) , v.1 , 2017
Daniel Patterson and Amal Ahmed "Linking Types for Multi-Language Software: Have Your Cake and Eat it Too" SNAPL: Summit on Advances in Programming Languages (SNAPL'17), May 2017. LIPIcs. , 2017
Daniel Patterson and Amal Ahmed "The Next 700 Compiler Correctness Theorems. A Functional Pearl" (ICFP) Proceedings of the ACM in Programming Languages (PACMPL) , v.3 , 2019 10.1145/3341689
Daniel Patterson, Jamie Perconti, Christos Dimoulas, Amal Ahmed "FunTAL: Reasonably Mixing a Functional Language with Assembly" ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '17), Barcelona, Spain, June 2017 , 2017
Daniel Patterson, Noble Mushtak, Andrew Wagner, Amal Ahmed "Semantic Soundness for Language Interoperability" ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'22), San Diego, California, June 2022. , 2022 10.1145/3519939.3523703
Gabriel Scherer, Max S. New, Nick Rioux, Amal Ahmed "FabULous Interoperability for ML and a Linear Language" Conference on Foundations of Software Science and and Computations Structures (FoSSaCS '18) , 2018
Marco Patrignani, Amal Ahmed, Dave Clarke "Formal Approaches to Secure Compilation: A Survey of Fully Abstract Compilation and Related Work" ACM Computing Surveys, 51(6):125.1-125.36 , v.51 , 2019
Mates, Phillip and Perconti, Jamie and Ahmed, Amal "Under Control: Compositionally Correct Closure Conversion with Mutable State" Principles and Practice of Programming Languages 2019 (PPDP 19) , 2019 10.1145/3354166.3354181 Citation Details
Max S. New and Amal Ahmed "Graduality from Embedding Projection Pairs" Proceedings of the ACM on Programming Languages (PACMPL) , v.4 , 2018
Max S. New and Daniel Licata "Call-By-Name Gradual Type Theory" International Conference on Formal Structures for Computation and Deduction (FSCD'18). , 2018
(Showing: 1 - 10 of 21)

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Compilers, which translate high-level source code written by programmers intothe low-level target code that runs on machines, play a critical role in theproduction of software. As such, they should be correct, i.e., they shouldpreserve the behavior of all programs they compile. Prior to this project, whilethere had been remarkable progress on formally verified compilers forincreasingly realistic languages, these verified compilers suffered from aserious limitation: they were proved correct under impractical assumptions aboutwhat the compiled code would be linked with, from no linking at all to linkingonly with code compiled from the same source language. These assumptions do notmatch reality as most software systems today are comprised of components writtenin different languages compiled by different compilers to a common target, aswell as low-level libraries that may be handwritten in the target language.
A central goal of this project was to investigate proof methods for buildingverified compositional compilers, which guarantee correct compilation ofcomponents, not just whole programs, and formally support linking with arbitrarytarget code.  The long-term vision, extending beyond this project, is to haveverified compositional compilers from languages as different as C, OCaml, Rust,and Coq to a common low-level target language that supports safeinteroperability between more precisely typed, less precisely typed, andtype-unsafe components. 
The first major result of this project was the development of a genericcompositional compiler correctness (CCC) theorem that we argue is the desiredtheorem when compiling components instead of whole programs. This resultaddresses a critical question: how do we even state the compiler correctnesstheorem in the presence of linking? While there have been a number of recentresults on compositional compiler correctness, they all state theircorrect-component-compilation theorems in remarkably different ways, yieldingpros and cons that aren't well understood.  We showed that specificcompiler-verification efforts can use their choice of formalism ``under thehood'' and then prove that their theorems imply CCC. 
The second major outcome of the project were compositional compiler correctnessresults for compilers that, unlike prior work, support linking withtarget-language features that are not present in the source language:specifically with control effects such as exceptions and call/cc. Linking withfeatures not present in the source is a critical feature since that often whyprogrammers implement components in different languages. 
The third major result of the project was a multi-language that supports safemixing of a high-level typed functional language with low-level assembly code.This is nontrivial since assembly is not compositional: assembly languagecomponents may be comprised of different numbers of basic blocks so it isn'tclear how to identify component boundaries. To support safe mixing, we designeda compositional typed assembly language (TAL), showed how to designmulti-language semantics so TAL components can be safely embedded in ahigh-level functional language and vice versa. 
The fourth major outcome of this project is a methodology for verifying thesoundness of foreign-function interfaces (FFIs).  Every language implements anFFI for language interoperability, but existing FFIs are all ad hoc. Wedeveloped a semantic framework for proving soundness of  FFIs as they are designed "in the wild", by which we mean that FFI developersimplement conversion between types of data from two different source languagesby implementing target-level conversion functions to mediate between datarepresentations coming from the two source languages. We applied thisframework to a series of case studies that demonstrate how this  approach allows us to account for complex differences in language semantics andmake efficiency trade-offs based on specifics of compilers or targets. 
The fifth major set of results pertain to compilation of languages with advancedtype systems, specifically dependent types. Dependently typed languages such asthat of the Coq proof assistant have rich types that enable verification of fullfunctional correctness during program development.  This project showed how todevelop correct and *typed* compiler passes for the core language of the Coqproof assistant. These passes include typed CPS translation, which since 2002was conjectured to be impossible, typed closure conversion, and ANF. Theseresults are key to building a practical verified compiler for Coq which wouldensure that fully verified source code cannot be compromised via compiler bugsor unsafe linking.
The sixth major series of results from this project pertain to gradualtyping, i.e., the mixing of more precisely typed and less precisely typedcode. One of the goals of the project was to design a gradually type-safelow-level language that supports safe interoperability between components thatare statically type-safe (e.g., compiled from OCaml or Rust) and dynamicallytype-safe (e.g., compiled from Python or Scheme). This project produced a seriesof foundational results on design principles for gradually typed langauges. 
Finally, for education and broader impacts: the project involved development ofsummer school lectures delivered in 4 separate years at the Oregon PL Summer School  (OPLSS) on a variety of research results from this project. It also includedlectures and talks at ECOOP, the PL Mentoring Workshop, a keynote at theStrangeLoop conference, and courses at Northeastern that explain how ideas fromtype systems, runtime contract checking, gradual typing, multi-languagesemantics, and compiler correctness can be brought together to build verifiedcompilers for a multi-language world. 


Last Modified: 07/28/2023
Modified by: Amal Ahmed

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page