Manetho: Fault tolerance in distributed systems using rollback-recovery and process replication

dc.contributor.advisor: Zwaenepoel, Willy
dc.creator: Elnozahy, Elmootazbellah Nabil
dc.date.accessioned: 2009-06-04T08:10:05Z
dc.date.available: 2009-06-04T08:10:05Z
dc.date.issued: 1994
dc.description.abstract: This dissertation presents a new protocol that allows rollback-recovery and process replication to co-exist in a distributed system. The protocol relies on a novel data structure called the antecedence graph, which tracks the nondeterministic events during failure-free operation and provides information for recreating them if a failure occurs. The rollback-recovery part of the protocol combines the low failure-free overhead of optimistic rollback-recovery with the advantages of pessimistic rollback-recovery, namely fast output commit, limited rollback, and failure-containment. The process replication part of the protocol features a new multicast protocol designed specifically to support process replication. Unlike previous work, the new protocol provides high throughput and low latency in message delivery without relying on the application semantics. The protocol has been implemented in the Manetho prototype. Experience with a number of long-running, compute-intensive parallel applications confirms the performance advantages of the new protocol. The implementation also features several performance optimizations that are applicable to other rollback-recovery and multicast protocols.
dc.format.extent: 111 p. (en_US)
dc.format.mimetype: application/pdf
dc.identifier.callno: THESIS COMP.SCI. 1994 ELNOZAHY
dc.identifier.citation: Elnozahy, Elmootazbellah Nabil. "Manetho: Fault tolerance in distributed systems using rollback-recovery and process replication." (1994) Diss., Rice University. https://hdl.handle.net/1911/19117.
dc.identifier.uri: https://hdl.handle.net/1911/19117
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Computer science
dc.title: Manetho: Fault tolerance in distributed systems using rollback-recovery and process replication
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: 9715028.PDF
Size: 4.79 MB
Format: Adobe Portable Document Format