Execution-Guided Learning for Software Development, Testing, and Maintenance

Talk by Pengyu NIE

Tuesday, January 17, 2023


Machine Learning (ML) techniques have been increasingly adopted for Software Engineering (SE) tasks, such as code completion and code summarization. However, existing ML models provide limited value for SE tasks because they do not take into account two key characteristics of software: software is executable, and software constantly evolves. In this talk, I will present my insights and work on developing execution-guided and evolution-aware ML models for several SE tasks targeting important domains, including software testing, verification, and maintenance.

First, I will present my techniques to help developers write tests and formal proofs. My work has a direct impact on software correctness and on everyone who depends on software. I will present TeCo, the first ML model for test completion/generation, and Roosterize, the first model for lemma name generation. To achieve good performance, these two tasks require reasoning about code execution, which existing ML models are not capable of. To tackle this problem, I designed and developed ML models that integrate execution data and use such data to validate generation results.

Next, I will present my techniques to help developers maintain software. Specifically, I will present my work on comment updating, i.e., automatically updating comments when the associated code changes. To solve this task, I proposed the first edit ML model for SE, which learns to perform developer-like edits instead of generating comments from scratch. I will also describe how I generalized this model for general-purpose software editing, including tasks such as bug fixing and automated code review.

All my code and data are open-sourced, and my models have been evaluated on real-world software and shown to outperform existing ML models by large margins. My contributions lay the foundation for the development of accurate, robust, and interpretable ML models for SE.

Speaker Bio:

Mr. Pengyu NIE

Ph.D. candidate, The University of Texas at Austin

Pengyu Nie is a Ph.D. candidate at The University of Texas at Austin, advised by Milos Gligoric. Pengyu’s research area is the fusion of Software Engineering (SE) and Natural Language Processing (NLP), with a focus on improving developers’ productivity during software development, testing, and maintenance. He has published 12 papers in top-tier SE, NLP, and PL conferences. He is the recipient of an ACM SIGSOFT Distinguished Paper Award (FSE 2019) and the UT Austin Graduate School Continuing Fellowship. More information can be found on his webpage: https://pengyunie.github.io