Role of Context in Program Search and Synthesis
Abstract
Consider the case where a programmer has written part of a program but has left another part (such as a method or function body) incomplete. The goal is to use the context surrounding the missing code to automatically infer the programmer's intent and suggest relevant code. The problem is “contextualized” in the sense that the helper engine should use clues in the partially completed program to determine which code is most useful; the user should not be required to formulate an explicit query. To achieve this goal, I propose two approaches: the first searches a database of code for relevant programs, and the second synthesizes the desired code directly.
In the first part of the thesis, I consider the problem of querying a database of open-source code, where the task is to infer quickly which programs in the database would help the programmer complete the missing method. I cast contextualized code search as a learning problem, where the goal is to learn a probability distribution that gives the likelihood that each database program correctly completes the partial program. I propose a neural model for predicting which database program is likely to be most useful. Because applying a neural model to each of the millions or billions of programs in a database at search time would be prohibitively expensive, one of the technical concerns is ensuring fast search. I address this by learning a “reverse encoder” that reduces the problem of evaluating each database program to computing a convolution of two normal distributions.
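The abstract does not give the exact parameterization, so the following is only a minimal sketch of why a Gaussian “convolution” makes search cheap. It assumes diagonal Gaussians: the context encoder maps the partial program to N(mu_c, var_c), and a reverse encoder maps each database program, offline, to N(mu_p, var_p). The standard identity ∫ N(z; mu_c, var_c) N(z; mu_p, var_p) dz = N(mu_c; mu_p, var_c + var_p) turns each database score into closed-form arithmetic on stored vectors. The function names `log_overlap` and `rank_database` and the toy vectors are hypothetical stand-ins, not the thesis's code.

```python
import math
from typing import List, Sequence, Tuple


def log_overlap(mu_a: Sequence[float], var_a: Sequence[float],
                mu_b: Sequence[float], var_b: Sequence[float]) -> float:
    """Log of the integral of N(z; mu_a, diag(var_a)) * N(z; mu_b, diag(var_b)) dz.

    For Gaussians this equals, per dimension, a Gaussian density with
    variance var_a + var_b evaluated at mu_a - mu_b; dimensions are
    independent (diagonal covariance assumed), so the logs add.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        v = va + vb
        total += -0.5 * (math.log(2.0 * math.pi * v) + (ma - mb) ** 2 / v)
    return total


def rank_database(ctx_mu: Sequence[float], ctx_var: Sequence[float],
                  database: List[Tuple[str, List[float], List[float]]]) -> List[str]:
    """Rank (code_id, mu, var) entries by overlap with the context Gaussian.

    The (mu, var) pairs are assumed to be precomputed offline by a
    hypothetical reverse encoder, so no neural network runs per database
    entry at search time: only the closed-form score above.
    """
    ranked = sorted(
        database,
        key=lambda entry: log_overlap(ctx_mu, ctx_var, entry[1], entry[2]),
        reverse=True,
    )
    return [code_id for code_id, _, _ in ranked]


# Toy usage: the entry whose mean lies closer to the context ranks first.
toy_db = [("far", [3.0, 3.0], [1.0, 1.0]), ("near", [0.1, 0.0], [1.0, 1.0])]
print(rank_database([0.0, 0.0], [1.0, 1.0], toy_db))
```

In a real system the linear scan would likely be replaced by an index over the stored means, but the point of the sketch is only that each score needs no neural-network call.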
In the second part of the thesis, I directly synthesize the most appropriate program for the user from the program context, while respecting the semantics of the programming language. Direct synthesis ensures that the system can produce a reasonable answer to a query even when the desired code does not exist in the database. My technical innovation in this work is to augment the grammar of the programming language with semantic annotations that guide neural synthesis. The formalism I use to add these annotations is a so-called “attribute grammar”, and in my work the annotations are produced by a Java compiler. This method alleviates many of the problems of learning to synthesize programs with long-range semantic dependencies across many lines of code, because it minimizes the amount of information that the neural network controlling the synthesis must remember. Synthesizing the correct program in a particular context then reduces to finding the right sequence of production rules in the attribute grammar. The resulting neural synthesizer, guided by the Java compiler, produces programs that are much more likely to be semantically correct than programs generated without the aid of an attribute grammar.
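To illustrate how semantic attributes can constrain a neural synthesizer, here is a toy sketch; it is not the thesis's grammar or model. It assumes a two-production statement language (`decl NAME:TYPE` and `use NAME:TYPE`) whose only attribute is a symbol table mapping variable names to types, threaded from one production to the next. A hypothetical `score` function stands in for the neural model, and greedy decoding is a simplification chosen here. At each step the attribute check masks productions that would use an undeclared or mistyped variable, so the model only ranks semantically legal rules and never has to remember the declarations itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, str]  # symbol table attribute: variable name -> type


@dataclass(frozen=True)
class Production:
    name: str
    is_valid: Callable[[Env], bool]  # semantic check against the attribute
    update: Callable[[Env], Env]     # attribute value after applying the rule


def decl(var: str, ty: str) -> Production:
    # Legal only if the variable is not already declared; adds it to scope.
    return Production(f"decl {var}:{ty}",
                      lambda env: var not in env,
                      lambda env: {**env, var: ty})


def use(var: str, ty: str) -> Production:
    # Legal only if the variable is in scope with the expected type.
    return Production(f"use {var}:{ty}",
                      lambda env: env.get(var) == ty,
                      lambda env: env)


def synthesize(candidates: List[Production],
               score: Callable[[List[str], Production], float],
               steps: int,
               env: Optional[Env] = None) -> Tuple[List[str], Env]:
    """Greedily pick the highest-scoring production that passes the attribute check."""
    env = dict(env or {})
    program: List[str] = []
    for _ in range(steps):
        legal = [p for p in candidates if p.is_valid(env)]
        if not legal:
            break
        best = max(legal, key=lambda p: score(program, p))
        program.append(best.name)
        env = best.update(env)
    return program, env


# Hypothetical "model" preferences; unconstrained, it would pick "use x:int" first.
prefs = {"use x:int": 3.0, "use y:String": 2.0, "decl x:int": 1.0}
cands = [use("x", "int"), decl("x", "int"), use("y", "String")]
print(synthesize(cands, lambda prog, p: prefs[p.name], steps=2))
```

With an empty context, the check forces the declaration of `x` before its use; if the surrounding context already declares `y: String`, the same model instead emits `use y:String` first.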
Citation
Mukherjee, Rohan. "Role of Context in Program Search and Synthesis." (2021) Diss., Rice University. https://hdl.handle.net/1911/110255.