Browsing by Author "Froyd, Nathan"
Now showing 1 - 2 of 2
Results Per Page
Sort Options
Item A Sample-Driven Call Stack Profiler(2004-07-15) Fowler, Rob; Froyd, Nathan; Mellor-Crummey, JohnCall graph profiling reports measurements of resource utilization along with information about the calling context in which the resources were consumed. We present the design of a novel profiler that measures resource utilization and its associated calling context using a stack sampling technique. Our scheme has a novel combination of features and mechanisms. First, it requires no compiler support or instrumentation, either of source or binary code. Second, it works on heavily optimized code and on complex, multi-module applications. Third, it uses sampling rather than tracing to build a context tree, collect histogram data, and to characterize calling patterns. Fourth, the data structures and algorithms are efficient enough to construct the complete tree exposed in the sampling process. We describe an implementation for the Alpha/Tru64 platform and present experimental measurements that compare this implementation with the standard call graph profiler provided on Tru64, hi prof. We show results from a variety of programs in several languages indicating that our profiler operates with modest overhead. Our experiments show that the profiling overhead of our technique is nearly a factor of 55 lower than that of hi prof when profiling a call-intensive recursive program.Item Efficient call path profiles on unmodified, optimized code(2005) Froyd, Nathan; Mellor-Crummey, JohnIdentifying performance bottlenecks and their associated calling contexts is critical for tuning high-performance applications. This thesis presents a new approach to measuring resource utilization and its calling context. Previous instrumentation-based approaches for reporting calling context introduce overhead proportional to the number of function calls performed. We describe a new design for a call path profiler based on stack sampling. Our design enables profiling of unmodified binaries, provides low and controllable overhead, and accurately attributes context-dependent costs of calls. We use a special trampoline function that improves the efficiency of stack sampling and enables the association of unique invocation counts with sampled call sites. We evaluate a Tru64/Alpha implementation of our design and show that on call-intensive codes, the overhead of our approach is over two orders of magnitude lower than the overhead of an instrumentation-based approach, with comparable overhead on other codes.