
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | September 26, 2013 |
Latest Amendment Date: | September 26, 2013 |
Award Number: | 1360694 |
Award Instrument: | Continuing Grant |
Program Manager: |
Sol Greenspan
sgreensp@nsf.gov (703)292-7841 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2013 |
End Date: | February 29, 2016 (Estimated) |
Total Intended Award Amount: | $407,022.00 |
Total Awarded Amount to Date: | $407,022.00 |
Funds Obligated to Date: |
FY 2012 = $210,376.00 FY 2013 = $194,157.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
107 S INDIANA AVE BLOOMINGTON IN US 47405-7000 (317)278-3473 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
509 E 3RD ST Bloomington IN US 47401-3654 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01001213DB NSF RESEARCH & RELATED ACTIVIT 01001213RB NSF RESEARCH & RELATED ACTIVIT 01001314DB NSF RESEARCH & RELATED ACTIVIT 01001314RB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Modern software engineering methods improve programmer productivity by taking an incremental approach to software development. Software engineers rapidly develop prototypes and then iteratively refine the prototypes into production systems. However, today's programming systems do not support a smooth transition from prototyping to production. On one hand, scripting languages and interactive environments support prototyping while on the other hand conventional programming languages and optimizing compilers support the development of reusable and efficient production codes. Neither support both prototyping and production, so developers use a mixture of programming systems. This practice incurs many costs such as the impedance mismatch of inter-language data transfers and the time to translate programs between languages.
The goal of this research is to discover the scientific principles necessary for a single programming system to effectively support the incremental refinement of prototypes into production software. To accomplish this research objective, classic conflicts between flexibility and safety and between abstraction and performance need to be resolved. To achieve both flexibility and safety, the research will investigate ways to combine dynamic and static type checking, using an approach called gradual typing. To achieve both abstraction and performance, the research will develop a domain-specific compiler for linear algebra and show how show how high-level abstractions can provide greater opportunities for compiler optimization than conventional abstractions such as loops and scalar operations. The broader impacts of the project arise from improvements to programmer productivity and software quality.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Incremental development is an essential feature of modern software
engineering and greatly improves programmer productivity. Software
teams rapidly develop prototypes and then iteratively refine the
prototypes into production systems. However, today’s programming
systems do not support a smooth transition from prototyping to
production. On one hand, dynamic languages such as MATLAB and Python
support prototyping while on the other hand static languages such as
C++ and Java support the development of reusable and efficient
production codes. Neither support both prototyping and production, so
developers use a mixture of programming systems. This practice incurs
many costs such as the impedance mismatches of inter-language data
transfers and the time to translate programs between languages.
The goal of the project "CAREER: Bridging the Gap Between Prototyping
and Production" was to make progress towards having a single
programming system that would support a gradual transition from
prototyping to production. To accomplish the research objective, the
long-standing conflicts between flexibility and safety and between
abstraction and performance need to be resolved. Regarding flexibilty
versus safety, the project investigated the idea of gradual typing,
which combines both static and dynamic type checking within the same
language. Regarding abstraction and performance, the project
investigated the compilation of a high-level language for matrix
algebra into high-performance code. We discuss the scientific
accomplishments on these two lines of research in the following
paragraphs. But to summarize the outputs of the project, we published
18 papers including 9 at high-impact conferences and journals. We
graduated 2 Ph.D. students and 2 M.S. students and 3 more
Ph.D. students are in progress. The project also produced two software
artifacts that have been released to the public, a gradually typed
version of the Python programming language, and the Build-to-Order
Basic Linear Algebra Subroutines, about which we will say more below.
The project made significant discoveries regarding both the theory and
practice of gradual typing. On the theory side, we discovered how to
integrate gradual typing with parametric polymorphism (aka. generics)
in a way that preserves "parametricity", which is import for reasoning
about the correctness of polymorphic procedures. We solved the space
complexity problem that arises from the use of proxies to ensure the
safety of statically typed code that interacts with dynamically typed
code. We also solved problems regarding runtime overhead in statically
typed code (not just in partially typed or dynamically typed
code). Finally, we developed formal criteria to better characterize
when a language design satisfies the intent of gradual typing.
Regarding the practice of gradual typing, we designed and implemented
a gradually typed variant of Python, named Reticulated Python. We
encountered a number of challenges along the way and invented
solutions to them, including "transient casts", which guarantee the
safety of statically typed code without using proxies, thereby
avoiding interoperability problems that we encountered regarding plain
Python code and with code written in C (parts of the Python standard
library, and many extensions). Another significant challenge to
gradual typing is that the performance of partially typed code can
degrade significantly. To solve this problem, we have initiated two
projects. We are investigating whether just-in-time compilation can
improve performance by implementing the Racket language using the
RPython framework. In the second project, we are building a native
code ahead-of-time compiler that implements the theoretical advances
mentioned above. So far, both the just-in-time and ahead-of-time
compilers are showing promise and delivering good performance on
benchmark programs.
The second line of research in this project was in the compilation of
a language for matrix algebra into high-performance parallel code.
The goal of our compiler was to produce the best-possible code for a
particular combination of matrix operations. For example, it compiles
the input program
y := b * A' * (A * x)
into low-level C code that is organized to make best use of multiple
processors and the multiple levels of cache memory within a computer.
The code transformations to accomplish this are well known: loop
fusion, array contraction, and data parallel loops. However, the the
techniques are not beneficial in every situation, so the challenge is
finding the the best combination of transformations for a given input
program and target computer architecture. Thus, the essense of the
problem is mathematical optimization and search. We evaluated a wide
variety of search techniques, including genetic algorithms and
hill-climbing, and we developed a new heuristic/greedy search
algorithm. We found that the best approach was a hybrid that started
with our greedy search and then finished with a genetic search. With
this approach we were able to generate code for a wide range of
real-world matrix computations that was just as efficient as code
produced by a human expert and that out-performed the best
general-purpose loop-optimizing compilers.
Last Modified: 11/17/2016
Modified by: Jeremy G Siek
Please report errors in award information by writing to: awardsearch@nsf.gov.