Fault-Tolerant Systems
General Description
Computers and networks are increasingly used in critical applications, where system failures can be expensive or even catastrophic. Example applications include aircraft fly-by-wire control, automobile control, computers used in medical systems, spacecraft, and databases in a large variety of financial and enterprise applications. The overall reliability expected of a computer system in these applications far exceeds that of any individual computer. This course is about how to build highly reliable systems that continue to function acceptably even after a number of their components (hardware or software) have failed.
Main Topics
- Introduction to fault tolerance.
- Measures of fault tolerance.
- Exploiting and managing redundancy in:
  - Hardware.
  - Software.
  - Time.
  - Data.
- Network fault tolerance.
- Issues in distributed systems.
- Byzantine generals algorithm.
- Fault-tolerant clock synchronization.
- Reliable remote procedure calls.
- Reliability evaluation techniques.
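As a small illustration of how hardware redundancy and reliability evaluation fit together, the sketch below computes the reliability of a triple modular redundant (TMR) arrangement, in which three copies of a module run in parallel and a voter takes the majority result. It is not taken from the course materials. It assumes independent module failures and a perfect voter, and the function names are hypothetical.

```python
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n modules work.

    Assumes every module has the same reliability r and that
    modules fail independently of one another.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def tmr_reliability(r: float) -> float:
    """Reliability of a 2-of-3 majority system with a perfect voter (assumption)."""
    return k_of_n_reliability(2, 3, r)


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(f"module reliability {r:.2f} -> TMR reliability {tmr_reliability(r):.6f}")
```

Under these assumptions the 2-of-3 case reduces to 3r^2 - 2r^3, so a module reliability of 0.9 yields a system reliability of about 0.972. At r = 0.5 the system is no better than a single module, and below 0.5 it is worse, which is one reason redundancy has to be managed rather than simply added.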
Slides: HWFT Part 2
Slides: HWFT Part 3
Slides: Networks Part 1
Slides: Networks Part 2
Slides: Networks Part 3
Slides: Data Replication
Slides: Checkpointing Part 1
Slides: Checkpointing Part 2
Slides: Checkpointing Part 3
Slides: Coding
Slides: Coding Part 2
Slides: Software Fault Tolerance Part 1
Slides: Software Fault Tolerance Part 2
Byzantine Generals Algorithm
Slides: Byzantine Generals algorithm