Generative Language Models for Program Synthesis and Evaluation

dc.contributor.advisor: Jermaine, Christopher M.
dc.creator: Jiang, Mingchao
dc.date.accessioned: 2025-01-16T19:31:20Z
dc.date.available: 2025-01-16T19:31:20Z
dc.date.created: 2024-12
dc.date.issued: 2024-12-06
dc.date.submitted: December 2024
dc.date.updated: 2025-01-16T19:31:20Z
dc.description.abstract: Recent Large Language Models (LLMs), such as GPT and Claude, have significantly advanced the field of program synthesis. However, the traditional benchmarks used to evaluate these models, such as APPS, MBPP, and HumanEval, suffer from potential data leakage and fail to mirror the complexity of real-world programming: they typically feature concise, stand-alone code samples and therefore cannot adequately assess the nuanced capabilities that comprehensive coding tasks require. To address these limitations, this dissertation introduces a novel, private benchmark dataset, SimCoPilot, specifically crafted to simulate the use of an AI such as an LLM as a “copilot”-style, interactive coding assistant. In SimCoPilot, an AI is asked to provide small amounts of code within an existing project ranging in size from hundreds to thousands of lines. The benchmark tests an AI’s ability to write code in both completion scenarios (providing code to finish a method or a block) and infill scenarios (providing code to fill a blank within a method), covering domains such as classic algorithms, databases, computer vision, and neural networks. Despite their varied architectures, most LLMs treat source code as mere strings, an approach that demands very large models and extensive training datasets. Unlike natural language, however, source code is a formal language imbued with rich syntactic and semantic structure. To address this disparity, this dissertation explores an innovative approach that explicitly extracts these syntactic and semantic elements and integrates them into an encoder-decoder transformer model. Our detailed evaluation analyzes how LLMs handle different code dependencies and logic complexities, offering insight into their effectiveness in realistic programming environments and their practical applicability to software development.
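To make the two task types in the abstract concrete, the following is a minimal sketch in Python of how a SimCoPilot-style instance might be represented; the CodeTask class and its field names are hypothetical illustrations, not the benchmark's actual schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CodeTask:
    """One benchmark instance drawn from an existing project."""
    task_id: str
    prefix: str            # project code appearing before the insertion point
    suffix: Optional[str]  # code after the blank (infill only; None means completion)
    reference: str         # ground-truth code the model is expected to produce

    @property
    def is_infill(self) -> bool:
        return self.suffix is not None

# Completion: finish a method given only the code that precedes it.
completion = CodeTask(
    task_id="algos-001",
    prefix="def binary_search(xs, target):\n    lo, hi = 0, len(xs) - 1\n",
    suffix=None,
    reference="    while lo <= hi:\n        ...\n",
)

# Infill: fill a blank between a prefix and a suffix inside a method.
infill = CodeTask(
    task_id="algos-002",
    prefix="def binary_search(xs, target):\n    lo, hi = 0, len(xs) - 1\n",
    suffix="    return -1\n",
    reference="    while lo <= hi:\n        ...\n",
)

for task in (completion, infill):
    print(task.task_id, "infill" if task.is_infill else "completion")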
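The idea of making code structure explicit can likewise be sketched with Python's standard-library ast module: the snippet below extracts AST node types and identifier read/write roles, the kind of syntactic and semantic signal that could be fed to an encoder-decoder model alongside raw tokens. This is an illustrative assumption, not the dissertation's actual feature pipeline.

import ast

def syntactic_features(source: str) -> list[tuple[str, str]]:
    """Return (node_type, annotated_identifier) pairs from the AST of `source`."""
    tree = ast.parse(source)
    features = []
    for node in ast.walk(tree):
        label = ""
        if isinstance(node, ast.Name):
            # ast.Load vs. ast.Store distinguishes reads from writes,
            # a simple semantic signal beyond the raw token string.
            label = f"{node.id}/{type(node.ctx).__name__}"
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            label = node.name
        features.append((type(node).__name__, label))
    return features

if __name__ == "__main__":
    src = "def add(a, b):\n    total = a + b\n    return total\n"
    for node_type, label in syntactic_features(src):
        print(node_type, label)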
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/1911/118163
dc.language.iso: en
dc.subject: Program Synthesis
dc.subject: LLM
dc.subject: GenAI
dc.subject: Program Evaluation
dc.title: Generative Language Models for Program Synthesis and Evaluation
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: JIANG-DOCUMENT-2024.pdf
Size: 28.84 MB
Format: Adobe Portable Document Format

License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.98 KB
Format: Plain Text