Fault-tolerant Definition & Meaning

Fault-tolerant Definition & Meaning

Fault tolerance is a required design specification for computer equipment used inonline transaction processingsystems, such as airline flight control and reservations systems. Fault-tolerant systems are also widely used in sectors such as distribution and logistics, electric power plants, heavy manufacturing,industrial control systemsand retailing. In a software implementation, the operating system provides an interface that allows a programmer tocheckpointcritical data at predetermined points within a transaction. In a hardware implementation , the programmer does not need to be aware of the fault-tolerant capabilities of the machine. Fault-tolerant servers use a minimal amount of system overhead to achieve high availability with an optimal level of performance.

fault tolerance definition

Recovery block technique can only be used where the task deadlines are more than task computation time. Three redundant copies of critical components are generated and all these three copies are run concurrently. Voting of result of all redundant copies are done and majority result is selected.

Fault Tolerance Is Also Mentioned In

Reporters take notice and write about your mistake all over the United States. Fixing any kind of IT problem requires investigation and savvy. Fault tolerance ensures people can keep working while you hunt down the source. From professional services to documentation, all via the latest industry blogs, we’ve got you covered.

However, if the consequences of a system failure are catastrophic, or the cost of making it sufficiently reliable is very high, a better solution may be to use some form of duplication. In any case, if the consequence of a system failure is so catastrophic, the system must be able to use reversion to fall back to a safe mode. This is similar to roll-back recovery but can be a human action if humans are present in the loop.

Words Near Fault Tolerance in the Dictionary

Conversely, a fault-tolerant cluster consists of multiple physical systems that share a single copy of a computer’s OS. Software commands issued by one system are also executed on the other system. Load balancing solutions remove single points of failure, enabling applications to run on multiple network nodes.

  • It will typically be part of the operating system’s interface, which enables programmers to check the performance of data throughout a transaction.
  • There are ways to make light bulbs with a lot of fault tolerance, but no one does.
  • Fault-tolerant software may be able to run on servers you already have in place that meet industry standards.
  • Fault tolerance is an essential concept in system design and plays a critical role in ensuring the reliability, availability, and performance of systems.
  • Although it’s possible for a system to 100% fault intolerant due to single points of failure , where one failure pulls down the entire process, other systems can have varying degrees of tolerance.

By doing so, this approach aims to do fault detection at the early development stage so that things don’t become complicated later. While this approach ensures that there is always a back available always, it demands tons of effort and resources. At times, it can be too time and cost-consuming as well as it’s not easy to create multiple (‘N’) versions of software. People sometimes confuse fault tolerance with high availability. A company’s high-availability score refers to how often the system stays up when compared to overall run times. To maintain high availability, a system switches to another system when something fails.

Benefits of Implementing Fault Tolerance System

Ivan is proficient in programming languages such as Python, Java, and C++, and has a deep understanding of security frameworks, technologies, and product management methodologies. Fault tolerance is the property of a system that enables it to continue operating properly after a failure occurred in the system. Another example of hardware Fault Tolerance systems is backup servers.

What is Fault Tolerance? – Definition from Techopedia – Techopedia

What is Fault Tolerance? – Definition from Techopedia.

Posted: Thu, 06 Dec 2018 08:00:00 GMT [source]

Fault tolerance is necessary in systems that are used to protect people’s safety , and in systems which security, data protection and integrity, and high value transactions depend on. Fault tolerance is particularly successful in computer applications. Tandem Computer has meaning of fault tolerance built such a machine for its entire business. It used a single-point tolerance to create their NonStop system with uptimes measured in years. is a leading authority on technology, delivering lab-based, independent reviews of the latest products and services.


Fault tolerance software may be part of the OS interface, allowing the programmer to check critical data at specific points during a transaction. Two replicated elements operate in lockstep as a pair, with a voting circuit that detects any mismatch between their operations and outputs a signal indicating that there is an error. A final circuit selects the output of the pair that does not proclaim that it is in error. Pair-and-spare requires four replicas rather than the three of TMR, but has been used commercially. An example of graceful degradation by design in an image with transparency. Each of the top two images is the result of viewing the composite image in a viewer that recognises transparency.

fault tolerance definition

Fault tolerance relies on power supply backups, as well as hardware or software that can detect failures and instantly switch to redundant components. Fault tolerance is an essential concept in system design and plays a critical role in ensuring the reliability, availability, and performance of systems. By understanding and implementing fault tolerance principles, organizations can build mission-critical systems that can withstand failures and ensure the continuous delivery of services. In a high-availabilitycluster, sets of independent servers are loosely coupled together to guarantee system-wide sharing of critical data and resources. The clusters monitor each other’s health and provide fault recovery to ensure applications remain available.

High Availability vs Fault Tolerance

The term “fault tolerance” applies equally to hardware as well as software. Fault-tolerant systems require organizations to have multiple versions of system components to ensure redundancy, extra equipment like backup generators, and additional hardware. An operating system that offers a solid definition for faults cannot be disrupted by a single point of failure. It ensures business continuity and the high availability of crucial applications and systems regardless of any failures. Hardware systems with the same or equivalent backup operating system.

fault tolerance definition

One approach is to run devices on an uninterruptible power supply . Another is to use backup power generators that ensure storage and hardware, heating, ventilation, and air conditioning continue to operate as normal if the primary power source fails. Software systems can be made fault-tolerant by backing them up with other software. A common example is backing up a database that contains customer data to ensure it can continuously replicate onto another machine. As a result, in the event that a primary database fails, normal operations will continue because they are automatically replicated and redirected onto the backup database.

Fault tolerance vs. high availability

A twin-engine airplane is a fault tolerant system – if one engine fails, the other one kicks in, allowing the plane to continue flying. A flat tire will cause the car to stop, but downtime is minimal because the tire can be easily replaced. By far the biggest disadvantage of fault tolerance is that it leads to the building of systems which are far more costly than fault intolerant systems. That is because, among other reasons, they usually require multiple versions of the same components to provide redundancy. There is often some confusion between the concepts of high availability vs fault tolerance. At the most basic level, high availability refers to systems that suffer minimal service interruptions, whilst systems with fault tolerance are designed never to experience service interruptions.

No Comments

Sorry, the comment form is closed at this time.