Role of Context in Program Search and Synthesis

Mukherjee, Rohan

Files

MUKHERJEE-DOCUMENT-2020.pdf (2.19 MB)

Date

2021-03-01

Authors

Mukherjee, Rohan

Abstract

Consider the case where a programmer has written some part of a program, but has left part of the program (such as a method or a function body) incomplete. The goal is to use the context surrounding the missing code to automatically “figure out” the programmer's intent and suggest relevant programs. The problem is “contextualized” in the sense that the helper engine should use clues in the partially completed program to figure out which code is most useful. The user should not be required to formulate an explicit query. To achieve this goal, I propose two approaches. The first approach searches for relevant programs in a database of codes, and the second directly synthesizes the desired code by writing it automatically.

In the first part of the thesis, I consider the problem of querying a database of open-source codes, where the task is to quickly infer which of the codes in the database would help the programmer complete the missing method. I cast contextualized code search as a learning problem, where the goal is to learn a distribution function that computes the likelihood that each database code correctly completes the program. I propose a neural model for predicting which database code is likely to be most useful. Because it would be prohibitively expensive to apply a neural model to each code in a database of millions or billions of codes at search time, one of the technical concerns is ensuring a speedy search. I address this by learning a “reverse encoder” that can be used to reduce the problem of evaluating each database code to computing a convolution of two normal distributions.
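To illustrate why reducing evaluation to a convolution of two normal distributions makes search cheap, here is a minimal sketch. It is not the thesis's model. It assumes, purely for illustration, that the query context and each database code have already been encoded as diagonal Gaussians (a mean and a variance per dimension), and it uses the standard identity that the integral of the product of two Gaussian densities equals a Gaussian density evaluated at the difference of the means, with the variances added. The function names `gaussian_overlap_log_score` and `rank_database` and the toy embeddings are hypothetical.

```python
import math

def gaussian_overlap_log_score(mu_q, var_q, mu_d, var_d):
    """Log of the integral of N(z; mu_q, var_q) * N(z; mu_d, var_d) over z.

    For diagonal Gaussians this has a closed form: per dimension it is
    log N(mu_q - mu_d; 0, var_q + var_d), and the dimensions sum.
    """
    total = 0.0
    for mq, vq, md, vd in zip(mu_q, var_q, mu_d, var_d):
        var = vq + vd          # variances add under convolution
        diff = mq - md
        total += -0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
    return total

def rank_database(query, database):
    """Rank database entries by closed-form score, best first.

    `query` is a (mean, variance) pair; `database` maps a name to a
    (mean, variance) pair. No neural network runs per database entry.
    """
    mu_q, var_q = query
    scored = [
        (gaussian_overlap_log_score(mu_q, var_q, mu_d, var_d), name)
        for name, (mu_d, var_d) in database.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy, made-up embeddings (assumption: two dimensions, unit variances).
query = ([0.0, 0.0], [1.0, 1.0])
database = {
    "near": ([0.1, -0.1], [1.0, 1.0]),
    "far": ([3.0, 3.0], [1.0, 1.0]),
}
ranking = rank_database(query, database)
```

Because the score depends only on precomputed means and variances, the per-entry cost at search time is a handful of arithmetic operations rather than a forward pass through a network.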

In the second part of the thesis, I try to directly synthesize the most appropriate program for the user, according to the program context, while following the semantics of a programming language. Direct synthesis ensures that the system can come up with a reasonable answer to a query, even when the desired code does not exist in the database. My technical innovation in this work is to augment the grammar of the programming language with semantic annotations, to guide neural model-driven synthesis. In my work, these annotations are produced by a Java compiler. The formalism I use to add such annotations is a so-called “attribute grammar”. This method alleviates many of the problems associated with learning to synthesize programs having long-term semantic dependencies across many lines of code, by minimizing the amount of information that needs to be remembered by the neural network controlling the synthesis. Synthesizing the correct program in a particular context then reduces to finding the sequence of production rules in the attribute grammar. The resulting neural synthesizer, guided by the Java compiler, produces programs that are much more likely to be semantically correct than programs generated without the aid of an attribute grammar.
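The idea of guiding generation with grammar attributes can be sketched with a toy example. This is not the thesis's synthesizer or its Java grammar. It assumes a hypothetical three-rule grammar (declare a variable, use a variable, end), a single attribute (the set of declared variables), and a plain scoring function standing in for the neural model. All names here (`RULES`, `allowed`, `synthesize`, `toy_score`) are invented for illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

# A production rule is (kind, variable name); "end" carries no name.
Rule = Tuple[str, str]

RULES: List[Rule] = [
    ("decl", "x"), ("decl", "y"),
    ("use", "x"), ("use", "y"),
    ("end", ""),
]

def allowed(rule: Rule, declared: FrozenSet[str]) -> bool:
    """Semantic check driven by the attribute (the declared-variable set)."""
    kind, name = rule
    if kind == "decl":
        return name not in declared   # no redeclaration
    if kind == "use":
        return name in declared       # no use before declaration
    return True                       # "end" is always legal

def synthesize(score: Callable[[Rule, int], float], max_steps: int = 10) -> List[Rule]:
    """Greedily pick the best-scoring rule among those the attribute permits."""
    declared: FrozenSet[str] = frozenset()
    program: List[Rule] = []
    for step in range(max_steps):
        candidates = [r for r in RULES if allowed(r, declared)]
        rule = max(candidates, key=lambda r: score(r, step))
        program.append(rule)
        if rule[0] == "end":
            break
        if rule[0] == "decl":
            declared = declared | {rule[1]}   # update the attribute
    return program

def toy_score(rule: Rule, step: int) -> float:
    """Stand-in for a learned model: it wants to use 'y' right away."""
    if step >= 2:
        return 1.0 if rule[0] == "end" else 0.0
    base = {"use": 2.0, "decl": 1.0, "end": 0.0}[rule[0]]
    return base + (0.5 if rule[1] == "y" else 0.0)

program = synthesize(toy_score)
```

Here the scorer prefers using `y` at the first step, but the attribute rules that choice out until `y` has been declared. The scorer itself never has to remember which variables exist, which mirrors the point above about reducing what the network must remember.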

Advisor

Jermaine, Christopher

Degree

Doctor of Philosophy

Type

Thesis

Keywords

Machine Learning, Deep Learning, Software Engineering, Program Synthesis, Program Search, Code Search, Java, Information Retrieval

Citation

Mukherjee, Rohan. "Role of Context in Program Search and Synthesis." (2021) Diss., Rice University. https://hdl.handle.net/1911/110255.

Rights

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Citable link to this page

https://hdl.handle.net/1911/110255

Collections

Rice University Theses and Dissertations
