Performance Measurement, Analysis, and Optimization of GPU-accelerated Applications

dc.contributor.advisor: Mellor-Crummey, John
dc.creator: Zhou, Keren
dc.date.accessioned: 2022-10-05T20:47:09Z
dc.date.available: 2022-10-05T20:47:09Z
dc.date.created: 2022-05
dc.date.issued: 2022-04-26
dc.date.submitted: May 2022
dc.date.updated: 2022-10-05T20:47:10Z
dc.description.abstract: With the end of Moore’s law, computing platforms are increasingly exploring heterogeneous processors for acceleration. Graphics Processing Units (GPUs) have emerged as a key component for accelerating applications in various domains, including deep learning, data analytics, and scientific simulations. While GPUs provide greater compute power and higher memory bandwidth than CPUs, writing efficient GPU code that achieves the maximum possible performance is challenging because of sophisticated programming models and architectural features. GPU performance tools are designed to pinpoint performance bottlenecks in GPU-accelerated applications and provide performance insights for users. However, existing performance tools are insufficient to identify hotspots and provide insights for complex applications. This thesis describes novel GPU performance tools that measure and analyze GPU-accelerated applications to address these challenges. First, I describe a GPU profiler that uses API interception, instruction sampling, and binary instrumentation to collect GPU performance metrics. To lower the overhead caused by the profiler, I designed novel wait-free queues for communication between multiple threads, a GPU-accelerated method to process measurement data, and a metric derivation method that computes multiple essential GPU performance metrics without replaying GPU operations. Then, I present a framework that attributes measurement data collected at runtime to call paths with low overhead. Offline, I developed a binary analyzer that reconstructs approximate GPU calling contexts by analyzing instruction samples and GPU binaries. The analyzer also examines def-use relations among GPU instructions to attribute instruction stalls to their root causes and to identify the value type of memory instructions.
Using performance metrics, program contexts, and instruction characteristics, I developed context-sensitive, instruction stall, and value redundancy analyzers to generate insightful performance reports. The context-sensitive analyzer focuses users' attention on hotspots with sophisticated program contexts. The instruction stall analyzer matches performance bottlenecks with potential optimizations, estimates the speedup of each optimization, and outputs the optimization suggestions with the highest estimated speedups. The value redundancy analyzer identifies GPU operations involving significantly redundant values and constructs a value flow graph to visualize value changes across GPU operations. To demonstrate the effectiveness of our performance tools, I have studied many machine learning and HPC applications. Guided by the insightful performance reports generated by our tools, I have identified performance hotspots and proposed effective optimizations that ameliorate the underlying causes of inefficiency.
dc.format.mimetype: application/pdf
dc.identifier.citation: Zhou, Keren. "Performance Measurement, Analysis, and Optimization of GPU-accelerated Applications." (2022) Diss., Rice University. https://hdl.handle.net/1911/113506.
dc.identifier.uri: https://hdl.handle.net/1911/113506
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Graphics Processing Units
dc.subject: Performance Tools
dc.subject: Performance Measurement
dc.subject: Instrumentation
dc.subject: Instruction Sampling
dc.subject: Deep Learning
dc.subject: High Performance Computing
dc.subject: Performance Tuning
dc.title: Performance Measurement, Analysis, and Optimization of GPU-accelerated Applications
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: ZHOU-DOCUMENT-2022.pdf
Size: 11.12 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text