Exploiting instruction-level parallelism for memory system performance

dc.contributor.advisor: Adve, Sarita V.
dc.creator: Pai, Vijay Sadananda
dc.date.accessioned: 2009-06-04T08:26:38Z
dc.date.available: 2009-06-04T08:26:38Z
dc.date.issued: 2000
dc.description.abstract: Current microprocessors improve performance by exploiting instruction-level parallelism (ILP). ILP hardware techniques such as multiple instruction issue, out-of-order (dynamic) issue, and non-blocking reads can accelerate both computation and data memory references. Since computation speeds have been improving faster than data memory access times, memory system performance is quickly becoming the primary obstacle to achieving high performance. This dissertation focuses on exploiting ILP techniques to improve memory system performance. It includes both an analysis of ILP memory system performance and optimizations developed using the insights of this analysis.

First, this dissertation shows that ILP hardware techniques, used in isolation, are often unsuccessful at improving memory system performance because they fail to extract parallelism among data reads that miss in the processor's caches. The previously studied latency-tolerance technique of software prefetching provides some improvement by initiating data read misses earlier, but it also suffers from limitations caused by exposed startup latencies, excessive fetch-ahead distances, and references that are hard to prefetch.

This dissertation then uses the above insights to develop compile-time software transformations that improve memory system parallelism and performance. These transformations improve the effectiveness of ILP hardware, reducing exposed latency by over 80% for a latency-detection microbenchmark and reducing execution time by an average of 25% across 14 multiprocessor and uniprocessor cases studied in simulation and by an average of 21% across 12 cases on a real system. These transformations also combine with software prefetching to address key limitations of either latency-tolerance technique alone, providing the best performance when both techniques are combined for most of the uniprocessor and multiprocessor codes that we study.

Finally, this dissertation explores appropriate evaluation methodologies for ILP shared-memory multiprocessors. Memory system parallelism is a key feature determining ILP performance, but it is neglected in previous-generation fast simulators. This dissertation highlights the errors possible in such simulators and presents new evaluation methodologies to improve the tradeoff between accuracy and evaluation speed.
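The abstract names two latency-tolerance ideas that a short sketch can make concrete: software prefetching with a fetch-ahead distance, and compile-time transformations that cluster independent read misses so that ILP hardware can overlap them. Below is a minimal C sketch, not the dissertation's implementation: it assumes GCC/Clang's __builtin_prefetch, the value PF_DIST and the function names are illustrative choices, and the dissertation applies such transformations at compile time rather than by hand.

#include <stddef.h>

#define PF_DIST 16  /* hypothetical fetch-ahead distance, in elements */

/* Baseline reduction: if the hardware cannot overlap the read misses,
   each miss latency is exposed serially. */
double sum_baseline(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Software prefetching: request a[i + PF_DIST] early so the miss
   overlaps with PF_DIST iterations of work. The first PF_DIST
   iterations remain uncovered (the exposed startup latency the
   abstract mentions), and too large a PF_DIST can evict data from
   the cache before it is consumed. */
double sum_prefetch(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&a[i + PF_DIST], /*rw=*/0, /*locality=*/0);
        s += a[i];
    }
    return s;
}

/* Read-miss clustering via unroll-and-jam, a sketch of the kind of
   compile-time transformation the abstract describes: interleaving
   two independent rows gives the out-of-order core two concurrent
   miss streams instead of one. */
double sum2d_jammed(const double *a, size_t rows, size_t cols) {
    double s0 = 0.0, s1 = 0.0;
    size_t r = 0;
    for (; r + 1 < rows; r += 2)
        for (size_t c = 0; c < cols; c++) {
            s0 += a[r * cols + c];        /* miss stream 1 */
            s1 += a[(r + 1) * cols + c];  /* miss stream 2 */
        }
    for (; r < rows; r++)                 /* remainder row if rows is odd */
        for (size_t c = 0; c < cols; c++)
            s0 += a[r * cols + c];
    return s0 + s1;
}

The jammed loop illustrates why ILP hardware alone can fall short: consecutive elements of one row share cache lines, so a single row supplies only one outstanding miss at a time, while interleaving two rows supplies two misses that non-blocking caches can service in parallel.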
dc.format.extent: 139 p.
dc.format.mimetype: application/pdf
dc.identifier.callno: THESIS E.E. 2001 PAI
dc.identifier.citation: Pai, Vijay Sadananda. "Exploiting instruction-level parallelism for memory system performance." (2000) Diss., Rice University. https://hdl.handle.net/1911/18009.
dc.identifier.uri: https://hdl.handle.net/1911/18009
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Electronics
dc.subject: Electrical engineering
dc.subject: Computer science
dc.title: Exploiting instruction-level parallelism for memory system performance
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Electrical Engineering
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle: 3021168.PDF (6.48 MB, Adobe Portable Document Format)