Browsing by Author "Zhu, Weixi"

Now showing 1 - 2 of 2

Exploring Superpage Promotion Policies for Efficient Address Translation
(2019-03-19) Zhu, Weixi; Rixner, Scott
Address translation performance for modern applications depends heavily on the number of translation entries cached in the hardware TLB (translation look-aside buffer), so the efficiency of address translation rests directly on the TLB hit rate. Yet the number of TLB entries continues to fall further behind the growth in memory consumption of modern applications. Superpages, which are pages of larger size, increase TLB efficiency by letting each translation entry cover a larger memory region; without requiring more TLB entries, they raise the TLB hit rate and thereby speed up address translation. However, superpages also introduce overhead. The hardware keeps a single dirty bit per translation entry, set when the mapped region is first modified, so the granularity of dirty tracking matches the coverage of the entry. As a result, the OS (operating system) pays extra I/O cost when it allocates an underutilized superpage or writes one back to disk, and this overhead can easily outweigh the address translation benefits. This thesis examines these trade-offs by exploring the design space of superpage promotion policies in the OS. A data collection infrastructure built on QEMU, with kernel instrumentation in FreeBSD, collects memory accesses and kernel events together; the TLB behavior of Intel Skylake x86 processors is then simulated, and the simulation is validated to be consistent with real-world performance. Finally, the thesis evaluates and compares both the TLB performance benefits and the I/O overheads of the promotion policies to characterize the trade-offs in the design space.
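
A quick way to see why superpages help is TLB reach: the total memory a fixed number of TLB entries can cover. The sketch below is not from the thesis; the 1536-entry figure is an assumption loosely modeled on a Skylake-class shared L2 TLB, used only to make the arithmetic concrete.

```c
#include <stdio.h>

/* Illustrative TLB-reach arithmetic (assumed figures, not from the thesis):
 * the same number of TLB entries covers 512x more memory when each entry
 * maps a 2 MB superpage instead of a 4 KB base page. */
int main(void) {
    const unsigned long entries   = 1536;        /* assumed L2 TLB entry count */
    const unsigned long base_page = 4UL << 10;   /* 4 KB base page */
    const unsigned long superpage = 2UL << 20;   /* 2 MB x86-64 superpage */

    printf("TLB reach, 4 KB pages:      %lu MB\n", entries * base_page >> 20);
    printf("TLB reach, 2 MB superpages: %lu MB\n", entries * superpage >> 20);
    return 0;
}
```

With these assumed numbers the reach grows from 6 MB to 3072 MB, which is why promotion can raise the hit rate without adding entries; the thesis's point is that the OS must weigh this against the I/O cost of underutilized superpages.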

Virtual Memory Management for Emerging Accelerators and Large-memory Applications
(2022-12-02) Zhu, Weixi; Rixner, Scott; Cox, Alan L
Today, the operating system (OS) is called upon to support a variety of applications that process large amounts of data using an ever-growing collection of specialized hardware accelerators. Nonetheless, current OSes still fail to (1) ease the development of drivers for new accelerators that need access to in-memory data and (2) provide efficient access to that data from both the CPU and accelerators. Applications need virtual memory abstractions to securely isolate data and hide the hardware details of the CPU and accelerators, but OS memory management is designed around the CPU's memory and cannot be used directly for many accelerators. In the absence of better OS support, driver authors implement ad-hoc, specialized virtual memory management that reinvents many mechanisms already present in OS memory management; because virtual memory management is highly complex, these ad-hoc implementations are rarely efficient, and accelerator users may suffer poor performance. Furthermore, the continued growth of data-set sizes amplifies the performance impact of hardware limitations in both the CPU and accelerators. These limitations can be alleviated independently, with innovative optimizations in OS memory management and in drivers' ad-hoc memory management, but such fragmentation makes the innovations hard to share. This thesis presents GMEM (generalized memory management), which refactors OS memory management to provide a high-level interface through which both the CPU and emerging accelerators share existing memory management mechanisms and new optimizations. For instance, the GMEM-based driver of a simulated device takes fewer than 100 hardware-independent lines of code to provide a virtual memory abstraction similar to that of Nvidia's GPU driver. Additionally, the thesis presents two memory management optimizations, one for FreeBSD and one for Nvidia's GPU driver, in response to applications' growing memory footprints; the GPU-driver optimization, for example, enables a deep learning application to achieve 60% higher training throughput. Both optimizations are to be merged into mainstream FreeBSD and Nvidia's GPU driver, respectively, and, more importantly, both are shareable via GMEM.