Browsing by Author "Mukherjee, Rohan"

Mining Natural APIs from Large Code Corpora using a Mixture of Hidden Markov Models
(2017-10-26) Mukherjee, Rohan; Jermaine, Christopher

A Natural API is a collection of API methods that tend to be used following certain discernible statistical patterns in real-world code. In this thesis, I present a method for learning an interpretable statistical model for such natural APIs. My model is trained on sequences of API calls produced from large software repositories through program analysis. Once trained, the model is able to recognize complex temporal dependences between methods, including methods that technically belong to different APIs, and can be used as a proxy for formal correctness specifications. Our experiments train the model on sequences of method calls generated from over 150 million lines of Android code. We evaluate the learned model by measuring accuracy in learnt specifications from the corpus, completing code with missing API calls, and searching for code that uses APIs in a way that matches a query. Our encouraging results indicate that statistical models of API calls learned from large code corpora can have broad value in software engineering.

Role of Context in Program Search and Synthesis
(2021-03-01) Mukherjee, Rohan; Jermaine, Christopher

Consider the case where a programmer has written some part of a program, but has left part of the program (such as a method or a function body) incomplete. The goal is to use the context surrounding the missing code to automatically “figure out” the programmer's intent and suggest relevant programs. The problem is “contextualized” in the sense that the helper engine should use clues in the partially-completed program to figure out which code is most useful. The user should not be required to formulate an explicit query. To achieve this goal, I propose two approaches.
The first approach searches for relevant programs in a database of codes, and the second directly synthesizes the desired code by writing it automatically. In the first part of the thesis, I consider the problem of querying a database of open-source codes, where the task is to quickly infer which of the codes in the database would be useful to the programmer, in order to help complete the missing method. I cast contextualized code search as a learning problem, where the goal is to learn a distribution function that computes the likelihood that each database code correctly completes the program. I propose a neural model for predicting which database code is likely to be most useful. Because it would be prohibitively expensive to apply a neural model to each code in a database of millions or billions of codes at search time, one of the technical concerns is ensuring a speedy search. I address this by learning a “reverse encoder” that can be used to reduce the problem of evaluating each database code to computing a convolution of two normal distributions. In the second part of the thesis, I try to directly synthesize the most appropriate program for the user, according to the program context, while following the semantics of a programming language. Direct synthesis ensures that the system can come up with a reasonable answer to a query, even when the desired code does not exist in the database. My technical innovation in this work is to augment the grammar of the programming language with semantic annotations, to guide neural model-driven synthesis. In my work, these annotations are produced by a Java compiler. The formalism I use to add such annotations is a so-called “attribute grammar”. This method alleviates many of the problems associated with learning to synthesize programs having long-term semantic dependencies across many lines of code, by minimizing the amount of information that needs to be remembered by the neural network controlling the synthesis.
Synthesizing the correct program in a particular context then reduces to finding the sequence of production rules in the attribute grammar. The resulting neural synthesizer, guided by the Java compiler, produces programs that are much more likely to be semantically correct than programs generated without the aid of an attribute grammar.
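
The idea of reducing synthesis to choosing a sequence of production rules, with semantic attributes restricting which rules may fire, can be illustrated with a small sketch. This is not the thesis implementation: the two-statement toy language, the names `PRODUCTIONS`, `admissible`, `synthesize`, and `toy_score`, and the fixed score table standing in for a neural model are all assumptions made purely for illustration. The only attribute tracked here is the set of variables declared so far.

```python
from typing import Callable, FrozenSet, List, Tuple

# A production is (kind, variable), e.g. ("decl", "x") or ("use", "x").
# This toy grammar is hypothetical and far simpler than a real Java grammar.
Production = Tuple[str, str]

PRODUCTIONS: List[Production] = [
    ("decl", "x"), ("decl", "y"), ("use", "x"), ("use", "y"),
]


def admissible(prod: Production, declared: FrozenSet[str]) -> bool:
    """Semantic check driven by the attribute `declared`."""
    kind, var = prod
    if kind == "decl":
        return var not in declared   # no redeclaration
    return var in declared           # a variable must be declared before use


def update(prod: Production, declared: FrozenSet[str]) -> FrozenSet[str]:
    """Propagate the attribute after a production has been applied."""
    kind, var = prod
    return declared | {var} if kind == "decl" else declared


def synthesize(score: Callable[[int, Production], float], steps: int,
               use_attributes: bool = True) -> List[Production]:
    """Greedily pick the best-scoring production at each step.

    With use_attributes=True, only productions that pass the semantic
    check are considered, so the scorer never has to remember which
    variables are in scope.
    """
    declared: FrozenSet[str] = frozenset()
    program: List[Production] = []
    for step in range(steps):
        candidates = [p for p in PRODUCTIONS
                      if not use_attributes or admissible(p, declared)]
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(step, p))
        program.append(best)
        declared = update(best, declared)
    return program


def is_semantically_valid(program: List[Production]) -> bool:
    """Re-check a finished program against the same semantic rules."""
    declared: FrozenSet[str] = frozenset()
    for prod in program:
        if not admissible(prod, declared):
            return False
        declared = update(prod, declared)
    return True


# Stand-in for a learned model: a fixed preference table (an assumption).
_TOY_SCORES = {("use", "y"): 4.0, ("use", "x"): 3.0,
               ("decl", "x"): 2.0, ("decl", "y"): 1.0}


def toy_score(step: int, prod: Production) -> float:
    return _TOY_SCORES[prod]


guided = synthesize(toy_score, steps=3)
unguided = synthesize(toy_score, steps=3, use_attributes=False)
print(guided, is_semantically_valid(guided))
print(unguided, is_semantically_valid(unguided))
```

In this sketch the scorer always prefers "use" statements, so the unguided run uses an undeclared variable, while the attribute-guided run is forced to declare a variable first. The same scorer therefore yields a semantically valid program only when the attribute check filters its choices, which is the intuition behind letting the grammar, rather than the network, carry long-range semantic state.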