Fault Tolerant Computing

(3-0-0-3)

CMPE Degree: This course is Not Applicable for the CMPE degree.

EE Degree: This course is Not Applicable for the EE degree.

Lab Hours: 0 supervised lab hours and 0 unsupervised lab hours.

Technical Interest Group(s) / Course Type(s): VLSI Systems and Digital Design

Course Coordinator:

Prerequisites: ECE 6100

Catalog Description

Key concepts in fault-tolerant computing. Understanding and use of modern
fault-tolerant hardware and software design practices. Case studies.

Course Outcomes

Not Applicable

Strategic Performance Indicators (SPIs)

Not Applicable

Topical Outline

Goals and Applications of Fault Tolerant Computing
Reliability, Availability, Safety, Dependability, etc.
Long Life, Critical Computation
High Availability Applications
Fault Tolerance as a Design Objective

Fault Models
Faults, Errors, and Failures
Causes and Characteristics of Faults
Logical and Physical Faults
Error Models
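To illustrate how a logical fault model connects a fault to an observable error, the following Python sketch injects a single stuck-at fault on one line of a toy circuit. It then lists the input vectors for which the fault produces an erroneous output. It is not part of the course materials. The circuit y = (a AND b) OR c and the names circuit and detecting_tests are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical toy circuit used only for illustration: y = (a AND b) OR c.
# A fault is given as (line_name, stuck_value); e.g. ("n", 0) means the
# internal line n is stuck-at-0.
Fault = Optional[Tuple[str, int]]


def circuit(a: int, b: int, c: int, fault: Fault = None) -> int:
    def line(name: str, value: int) -> int:
        # A faulty line ignores its driver and holds the stuck value.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = line("a", a), line("b", b), line("c", c)
    n = line("n", a & b)        # internal AND-gate output
    return line("y", n | c)     # primary output


def detecting_tests(fault: Fault) -> List[Tuple[int, int, int]]:
    """Input vectors for which the faulty output differs from the fault-free one."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)]


print(detecting_tests(("n", 0)))  # [(1, 1, 0)]
print(detecting_tests(("n", 1)))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

A fault that is present but not activated and propagated to the output causes no error. For example, n stuck-at-0 has no effect while c = 1. This is one reason to distinguish faults, errors, and failures.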

Fault Tolerant Design Techniques Based on Hardware Redundancy
Hardware Redundancy
TMR, N-modular Redundancy
Voting Methods
Duplication, Standby Sparing
Watchdog Timers
Hybrid Hardware Redundancy
N-modular Redundancy with Spares
Sift-out Modular Redundancy
Triple-duplex Architecture
Fault Tolerant Interconnection Networks
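As a minimal sketch of the voting methods used with TMR and N-modular redundancy, the Python functions below show a bitwise 2-of-3 voter and a word-level majority voter. The names tmr_vote and nmr_vote are hypothetical, and the voter itself is assumed to be fault-free.

```python
from collections import Counter
from typing import List, Optional


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter for triple modular redundancy (TMR)."""
    # Each output bit is 1 exactly when at least two of the three modules output 1.
    return (a & b) | (a & c) | (b & c)


def nmr_vote(outputs: List[int]) -> Optional[int]:
    """Word-level majority voter for N-modular redundancy.

    Returns the value reported by a strict majority of the modules,
    or None when no value has a majority (the voter cannot decide).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


print(bin(tmr_vote(0b1010, 0b1010, 0b0110)))  # 0b1010: the faulty third module is outvoted
print(nmr_vote([7, 7, 3, 7, 2]))              # 7
print(nmr_vote([1, 2, 3]))                    # None
```

Bitwise voting can mask errors that different modules make in different bit positions. However, it may then output a word that no single module produced. Word-level voting avoids this, at the cost of finding no majority more often.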

Fault Tolerant Design Techniques Based on Information Redundancy
Parity, M-of-N, Duplication Codes
Checksums, Cyclic Codes, Arithmetic Codes
Berger Codes, Hamming Error Correcting Codes
Code Selection Issues
Time Redundancy, Recomputing with Shifted Operands (RESO)
Software Redundancy, Checks and N-version Programming
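As one concrete instance of Hamming error-correcting codes, the Python sketch below encodes 4 data bits into a 7-bit Hamming(7,4) codeword and corrects any single-bit error using the syndrome. It is an illustrative sketch, not a prescribed course implementation. It uses the common bit ordering with parity bits at positions 1, 2, and 4, and the names hamming74_encode and hamming74_decode are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a Hamming(7,4) codeword, positions 1..7 = p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # even parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # even parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # even parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word: List[int]) -> Tuple[List[int], int]:
    """Correct up to one bit error; return (data bits, error position or 0 if none)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]   # recheck positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]   # recheck positions 2, 3, 6, 7
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]   # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # binary position of the erroneous bit
    if syndrome:
        w[syndrome - 1] ^= 1         # flip the bit back
    return [w[2], w[4], w[5], w[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])   # [0, 1, 1, 0, 0, 1, 1]
corrupted = list(codeword)
corrupted[4] ^= 1                           # inject a single error at position 5
print(hamming74_decode(corrupted))          # ([1, 0, 1, 1], 5)
```

With only three check bits this code corrects a single error but cannot reliably distinguish a double error from a single one. Code selection therefore depends on the error model assumed.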

Reliability Evaluation Techniques
Failure Rate, Mean Time to Repair, Mean Time Between Failures
Reliability Modeling, Fault Coverage
M-of-N Systems
Markov Models
Safety, Maintainability, Availability
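The standard closed-form models behind the reliability topics can be sketched in a few lines of Python. The sketch computes simplex reliability with a constant failure rate, M-of-N reliability, and steady-state availability. It assumes independent, identical modules, a perfect voter, and perfect fault coverage. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def simplex_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for one module with a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def m_of_n_reliability(r: float, m: int, n: int) -> float:
    """Probability that at least m of n identical, independent modules work.

    Assumes a perfect voter and perfect fault coverage; r is one module's reliability.
    """
    return sum(math.comb(n, i) * r**i * (1 - r)**(n - i) for i in range(m, n + 1))


def steady_state_availability(mttf: float, mttr: float) -> float:
    """A = MTTF / (MTTF + MTTR); since MTBF = MTTF + MTTR, this equals MTTF / MTBF."""
    return mttf / (mttf + mttr)


r = simplex_reliability(1e-4, 1000)            # lambda = 1e-4 per hour, t = 1000 h
print(r, m_of_n_reliability(r, 2, 3))          # single module vs. 2-of-3 TMR
print(steady_state_availability(999.0, 1.0))   # 0.999
```

For TMR the M-of-N formula reduces to 3r^2 - 2r^3. This exceeds r only while r > 0.5, that is, for mission times shorter than ln 2 / lambda. Under these assumptions, TMR without repair therefore helps short missions more than long-life applications.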

Fault Tolerance in VLSI Circuits
Failure Models in VLSI
Redundancy Techniques in VLSI
Self-checking Logic
Reconfigurable Array Structures
Effect on Yield

Case Studies
FTSC, FTBBC
Space Shuttle
Tandem 16 NonStop System
Stratus/32 System
ESS (Electronic Switching System)

Students will write a term paper presenting research, a literature review, or a
design in the fault-tolerant computing area. Topics are chosen in consultation
with the instructor.