Role of Context in Program Search and Synthesis

dc.contributor.advisor: Jermaine, Christopher (en_US)
dc.creator: Mukherjee, Rohan (en_US)
dc.date.accessioned: 2021-04-13T21:52:57Z (en_US)
dc.date.available: 2021-06-01T05:01:14Z (en_US)
dc.date.created: 2020-12 (en_US)
dc.date.issued: 2021-03-01 (en_US)
dc.date.submitted: December 2020 (en_US)
dc.date.updated: 2021-04-13T21:52:57Z (en_US)
dc.description.abstract: Consider the case where a programmer has written some part of a program, but has left part of the program (such as a method or a function body) incomplete. The goal is to use the context surrounding the missing code to automatically “figure out” the programmer's intent and suggest relevant programs back. The problem is “contextualized” in the sense that the helper engine should use clues in the partially completed program to figure out which code is most useful. The user should not be required to formulate an explicit query. To achieve this goal, I propose two approaches. The first approach searches for relevant programs in a database of codes, and the second directly synthesizes the desired code by writing it automatically. In the first part of the thesis, I consider the problem of querying a database of open-source codes, where the task is to infer quickly which of the codes in the database would be useful to the programmer in completing the missing method. I cast contextualized code search as a learning problem, where the goal is to learn a distribution function that computes the likelihood that each database code correctly completes the program. I propose a neural model for predicting which database code is likely to be most useful. Because it would be prohibitively expensive to apply a neural model to each code in a database of millions or billions of codes at search time, one of the technical concerns is ensuring a speedy search. I address this by learning a “reverse encoder” that can be used to reduce the problem of evaluating each database code to computing a convolution of two normal distributions. In the second part of the thesis, I try to directly synthesize the most appropriate program for the user, according to the program context, while following the semantics of a programming language.
Direct synthesis ensures that the system can come up with a reasonable answer to a query, even when the desired code does not exist in the database. My technical innovation in this work is to augment the grammar of the programming language with semantic annotations to guide neural model-driven synthesis. In my work, these annotations are produced by a Java compiler. The formalism I use to add such annotations is a so-called “attribute grammar”. This method alleviates many of the problems associated with learning to synthesize programs having long-term semantic dependencies across many lines of code, by minimizing the amount of information that needs to be remembered by the neural network controlling the synthesis. Synthesizing the correct program in a particular context then reduces to finding the sequence of production rules in the attribute grammar. The resulting neural synthesizer, guided by the Java compiler, produces programs that are much more likely to be semantically correct than programs generated without the aid of an attribute grammar. (en_US)
dc.embargo.terms: 2021-06-01 (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Mukherjee, Rohan. "Role of Context in Program Search and Synthesis." (2021) Diss., Rice University. https://hdl.handle.net/1911/110255. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/110255 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Software Engineering (en_US)
dc.subject: Program Synthesis (en_US)
dc.subject: Program Search (en_US)
dc.subject: Code Search (en_US)
dc.subject: Java (en_US)
dc.subject: Information Retrieval (en_US)
dc.title: Role of Context in Program Search and Synthesis (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: MUKHERJEE-DOCUMENT-2020.pdf
Size: 2.19 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text