Fault Recovery in Theseus OS
Abstract
This work describes the implementation and evaluation of fault recovery in the Theseus Operating System (OS), a new OS developed from scratch. Theseus has a modular structure: it is a collection of tiny modules that minimize the state they hold for one another. Theseus is implemented in Rust, a safe programming language, and leverages the compiler to ensure type and memory safety, which in turn provides isolation among tasks. Fault recovery is essential in Theseus because, in the absence of hardware-provided isolation, a faulty task can potentially corrupt any OS structure. We implement a series of fault recovery mechanisms on Theseus that take increasingly drastic measures whenever recovery at the previous stage is unsuccessful. First, we fully unwind and restart the faulty task. If the fault persists, we replace potentially corrupted modules by loading fresh copies of those modules from disk into a different location in memory. We evaluate Theseus's ability to recover from faults by stress testing our fault recovery implementation in the presence of hardware faults. Furthermore, we show that Theseus can recover from faults occurring in core OS components, e.g., those that necessarily exist within a microkernel, which goes beyond the capabilities of existing work.
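The staged escalation described in the abstract can be illustrated with a minimal Rust sketch. It is not taken from the Theseus code base: the names `Outcome` and `recover`, the `max_restarts` bound, and the use of closures to stand in for "unwind and restart the task" and "swap in fresh module copies and rerun the task" are all assumptions made for illustration.

```rust
/// Hypothetical result of a staged recovery attempt (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The task ran to completion after being unwound and restarted.
    RecoveredByRestart,
    /// The task ran to completion only after fresh module copies were loaded.
    RecoveredByModuleReload,
    /// Neither stage succeeded.
    Unrecovered,
}

/// Tries the least drastic measure first and escalates only on failure.
///
/// `restart_task` stands in for fully unwinding and restarting the faulty
/// task; `reload_modules` stands in for replacing potentially corrupted
/// modules with fresh copies and rerunning the task. Each closure returns
/// `true` when the task subsequently completes without faulting.
/// `max_restarts` is an assumed bound on how many restarts are attempted
/// before the fault is treated as persistent.
pub fn recover<R, M>(max_restarts: u32, mut restart_task: R, mut reload_modules: M) -> Outcome
where
    R: FnMut() -> bool,
    M: FnMut() -> bool,
{
    // Stage 1: unwind and restart, up to the assumed bound.
    for _ in 0..max_restarts {
        if restart_task() {
            return Outcome::RecoveredByRestart;
        }
    }
    // Stage 2: the fault is persistent, so fall back to module replacement.
    if reload_modules() {
        return Outcome::RecoveredByModuleReload;
    }
    Outcome::Unrecovered
}
```

In this sketch, a call such as `recover(2, || false, || true)` models a persistent fault: both restarts fail, and recovery succeeds only once the modules have been replaced.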
Citation
Godawatte Liyanage, Namitha. "Fault Recovery in Theseus OS." (2021) Master’s Thesis, Rice University. https://hdl.handle.net/1911/113886.