Award Abstract # 1452959
CAREER: Understanding Program Comprehension for Automated Software Documentation Generation

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF NOTRE DAME DU LAC
Initial Amendment Date: January 30, 2015
Latest Amendment Date: June 11, 2019
Award Number: 1452959
Award Instrument: Continuing Grant
Program Manager: Sol Greenspan
sgreensp@nsf.gov
 (703)292-7841
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2015
End Date: August 31, 2020 (Estimated)
Total Intended Award Amount: $450,000.00
Total Awarded Amount to Date: $450,000.00
Funds Obligated to Date: FY 2015 = $172,634.00
FY 2017 = $89,891.00
FY 2018 = $92,405.00
FY 2019 = $95,070.00
History of Investigator:
  • Collin McMillan (Principal Investigator)
    collin.mcmillan@nd.edu
Recipient Sponsored Research Office: University of Notre Dame
940 GRACE HALL
NOTRE DAME
IN  US  46556-5708
(574)631-7432
Sponsor Congressional District: 02
Primary Place of Performance: University of Notre Dame
940 Grace Hall
Notre Dame
IN  US  46556-5708
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): FPU6XGFXMBE9
Parent UEI: FPU6XGFXMBE9
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7944
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The objective of this research project is 1) to create a model of program comprehension for how software development professionals write software documentation, and 2) to use this model to design algorithms to automate the process of writing documentation. The process of writing documentation is a major expense in software development projects, and is often neglected. By automating key components of the process, this research helps programmers to avoid this expense and therefore to be more productive.

The project studies the process that programmers follow when reading source code to write documentation. Then, the project proposes algorithms to mimic that process. These algorithms are integrated with novel natural language generation systems to create descriptions of software behavior. These descriptions are then integrated into documentation of the source code. A key broader impact of this project is to increase the workforce participation of persons with visual disabilities. First, the descriptions generated by the research can be used in accessibility technologies for blind programmers, to help those programmers read source code. Second, an outreach program to state K-12 schools for the blind and visually impaired helps students in these schools prepare for a career in the software development industry.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 25)
Armaly, A., Klaczynski, J., McMillan, C. "A Case Study of Automated Feature Location Techniques for Industrial Cost Estimation" Proc. of the 32nd IEEE International Conference on Software Maintenance and Evolution, Industry Track , 2016
Armaly, A., McMillan, C. "Source Code Reuse via Execution Record and Replay" Journal of Software: Evolution and Process , 2016
Armaly, A., Rodeghero, P., McMillan, C. "AudioHighlight: Code Skimming for Blind Programmers" Proc. of the 34th IEEE International Conference on Software Maintenance and Evolution , 2018
Armaly, A., Rodeghero, P., McMillan, C. "Blindness and Program Comprehension" Transactions on Software Engineering , 2017
Cruz, B., Jayaraman, B., Dwarakanath, A., and McMillan, C. "Detecting Vague Words & Phrases in Requirements Documents in a Multilingual Environment" Proc. of the 25th International Requirements Engineering Conference , 2017
Eberhart, Z., LeClair, A., McMillan, C. "Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments" Proc. of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering , 2020
Haque, S., LeClair, A., Wu, L., McMillan, C. "Improved Automatic Summarization of Subroutines via Attention to File Context" Proc. of the 17th International Conference on Mining Software Repositories , 2020
Jiang, S., Armaly, A., McMillan, C. "Automatically Generating Commit Messages from Diffs Using Neural Machine Translation" Proc. of the 32nd IEEE/ACM International Conference on Automated Software Engineering , 2017
Jiang, S., Armaly, A., McMillan, C., Zhi, Q., Metoyer, R. "Docio: Documenting API Input/Output Examples" Proc. of the 25th IEEE International Conference on Program Comprehension, Tool Demo Track , 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This research project targets the problem of automatic documentation generation for software.  Programmers are notorious for lacking time and resources to write good software documentation for others, even while seeking high-quality software documentation for themselves.  The dilemma is essentially that programmers are under intense time pressure during development, and often cannot devote energy to documentation.  But then later, other programmers struggle to read their code because the documentation is sparse or out of date.  For decades, a dream of software engineering research has been to design algorithms that write this documentation automatically.  This proposal targets four research questions towards this long-term dream:

RQ1: How do programmers read source code when creating documentation?  This question targets the physical process that programmers follow, such as eye movements, keyboard and mouse input, and stress indicators when writing documentation.  The purpose of studying this information is to understand what programmers do, in order to automate the rote portions for them.

RQ2: What information from source code do programmers prioritize for documentation?  The purpose of this research question is to determine what information programmers tend to include in documentation, after they have obtained an understanding of the source code.  Programmers who write documentation may obtain this understanding after reading code from someone else or by writing the code themselves.  In either case, they decide which information is important enough for other programmers to know.  Automated documentation tools would benefit by including this information.

RQ3: How can solutions from text summarization technologies be adapted to solve code summarization research problems?  The rationale behind this research question is that several effective techniques have been proposed in text summarization over more than two decades, but they are difficult to adapt to code summarization because code and text communicate information differently.  However, by mimicking the process followed by humans in writing documentation, which we study in RQ1 and RQ2, it is possible to adapt text summarization to source code more effectively than current approaches.  We have published several research papers demonstrating how to adapt text summarization to code summarization, which have helped establish code summarization as a research area at the intersection of software engineering and natural language processing.

RQ4: Do code summarization technologies assist blind programmers in comprehending code as quickly as sighted programmers?  The intent of this question is to blend this proposal's intellectual merit and broader impacts.  Through this project, a collaborative teaching program between the University of Notre Dame and the Illinois School for the Blind has flourished.  Impact is demonstrated through several research papers.  A landmark paper funded by this project demonstrated zero difference in the quality of code comprehension between sighted and blind programmers, which has strong implications for employment and education of persons who are blind or have low vision in computer science.


Last Modified: 09/01/2020
Modified by: Collin McMillan

Please report errors in award information by writing to: awardsearch@nsf.gov.
