Scalability and Data Placement on SGI Origin

Date
1998-04-01
Abstract

Cache-coherent non-uniform memory access (ccNUMA) architectures have attracted considerable academic and industry interest as a promising direction for large-scale parallel computing. Data placement has been used as a major optimization method on such machines. This study examined scalability and the effect of data placement on a state-of-the-art ccNUMA machine, the SGI Origin, using 16 processors. Three applications from SPLASH-2 were used: FFT, Radix, and Barnes-Hut. The results showed that FFT and Radix did not reach 70% efficiency on 16 processors, even for the largest data sizes tested. Barnes-Hut did not scale for small data sizes but scaled linearly for large inputs. The results also showed that data placement made no difference to performance for any of the three applications. We attribute these results to the advanced uniprocessor used in the Origin (the R10K), the optimizing compiler, and the aggressive communication architecture. Some of our results differ considerably from the predictions of two recent simulation studies of directory-based ccNUMA machines, (Holt:ISCA96) and (Pai:HPCA97), especially for FFT. These differences arise partly because the machine models used in those simulation studies differ from the Origin in some important respects. Our results also cover data sizes larger than those in either of the previous simulation studies. To increase our confidence in the latency numbers and the data placement tools, we also measured memory latencies using micro-benchmarks.
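
For readers interpreting the 70% figure, a brief worked note may help. It assumes the standard definition of parallel efficiency (speedup divided by processor count); the abstract does not state which definition the report uses, and the symbols T_1 (single-processor run time) and T_p (run time on p processors) are introduced here only for illustration.

```latex
E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p},
\qquad
E(16) \ge 0.70 \iff S(16) \ge 0.70 \times 16 = 11.2
```

Under that assumption, falling short of 70% efficiency on 16 processors means the measured speedup stayed below roughly 11.2.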

Type
Technical report
Citation

Chauhan, Arun, Ding, Chen and Sheraw, Berry. "Scalability and Data Placement on SGI Origin." (1998) https://hdl.handle.net/1911/96488.

Rights
You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).