Fault Tolerant Computing
(3-0-0-3)
CMPE Degree: This course is Not Applicable for the CMPE degree.
EE Degree: This course is Not Applicable for the EE degree.
Lab Hours: 0 supervised lab hours and 0 unsupervised lab hours.
Technical Interest Group(s) / Course Type(s): VLSI Systems and Digital Design
Course Coordinator:
Prerequisites: ECE 6100
Corequisites: None.
Catalog Description
Key concepts in fault-tolerant computing. Understanding and use of modern fault-tolerant hardware and software design practices. Case studies.
Course Outcomes
Not Applicable
Student Outcomes
In the parentheses for each Student Outcome:
“P” for primary indicates the outcome is a major focus of the entire course.
“M” for moderate indicates the outcome is the focus of at least one component of the course, but not the majority of the course material.
“LN” for “little to none” indicates that the course does not contribute significantly to this outcome.
1. ( Not Applicable ) An ability to identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics
2. ( Not Applicable ) An ability to apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors
3. ( Not Applicable ) An ability to communicate effectively with a range of audiences
4. ( Not Applicable ) An ability to recognize ethical and professional responsibilities in engineering situations and make informed judgments, which must consider the impact of engineering solutions in global, economic, environmental, and societal contexts
5. ( Not Applicable ) An ability to function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives
6. ( Not Applicable ) An ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions
7. ( Not Applicable ) An ability to acquire and apply new knowledge as needed, using appropriate learning strategies.
Strategic Performance Indicators (SPIs)
Not Applicable
Course Objectives
Topical Outline
Goals and Applications of Fault Tolerant Computing
Reliability, Availability, Safety, Dependability, etc.
Long Life, Critical Computation
High Availability Applications
Fault Tolerance as a Design Objective
Fault Models
Faults, Errors, and Failures
Causes and Characteristics of Faults
Logical and Physical Faults
Error Models
Fault Tolerant Design Techniques Based on Hardware Redundancy
Hardware Redundancy
TMR, N-modular Redundancy
Voting Methods
Duplication, Standby Sparing
Watchdog Timers
Hybrid Hardware Redundancy
N-modular Redundancy with Spares
Sift-out Modular Redundancy
Triple-duplex Architecture
Fault Tolerant Interconnection Networks
Fault Tolerant Design Techniques Based on Information Redundancy
Parity, M-of-N, Duplication Codes
Checksums, Cyclic Codes, Arithmetic Codes
Berger Codes, Hamming Error Correcting Codes
Code Selection Issues
Time Redundancy, Recomputing with Shifted Operands (RESO)
Software Redundancy, Checks and N-version Programming
Reliability Evaluation Techniques
Failure Rate, Mean Time to Repair, Mean Time Between Failures
Reliability Modeling, Fault Coverage
M-of-N Systems
Markov Models
Safety, Maintainability, Availability
Fault Tolerance in VLSI Circuits
Failure Models in VLSI
Redundancy Techniques in VLSI
Self-checking Logic
Reconfigurable Array Structures
Effect on Yield
Case Studies
FTSC, FTBBC
Space Shuttle
Tandem 16 NonStop System
Stratus/32 System
ESS
Students will write a term paper on research, a literature review, or a
design in the area of fault-tolerant computing. Topics will be chosen in
consultation with the instructor.