Engineering Clinic

Posts

Hardware Redundancy

Hardware redundancy is the use of additional hardware to compensate for failures. It can be applied in two ways. The first is fault detection, correction and masking: multiple hardware units are assigned to do the same task in parallel and their results are compared. If one or more units are faulty, we can expect this to show up as a disagreement in the results. The second is to replace the malfunctioning units.

Redundancy is expensive; duplicating or triplicating the hardware is justified only in the most critical applications. Two methods of hardware redundancy are given below: static pairing and N-modular redundancy (NMR).

Static Pairing

Processors are hardwired in pairs, and the entire pair is discarded if one of its processors fails. This is a very simple scheme. The two processors in a pair run identical software with identical inputs and should generate identical outputs. If the outputs are not identical, the pair is nonfunctional and the entire pair is discarded. This approach is depicted in the following figure, and it will w...
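The comparison step behind both schemes can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and because the excerpt is cut off before NMR is described, the voter assumes the usual strict-majority rule over N unit outputs.

```python
from collections import Counter


def pair_output(out_a, out_b):
    """Static pairing: return the shared output, or None if the pair disagrees.

    A None result means the whole pair should be treated as nonfunctional.
    """
    return out_a if out_a == out_b else None


def nmr_vote(outputs):
    """NMR (assumed majority voting): return the value produced by a strict
    majority of the units, or None if no value has a majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


print(pair_output(42, 42))    # pair agrees
print(pair_output(42, 41))    # pair disagrees, so it is discarded
print(nmr_vote([7, 7, 9]))    # one faulty unit out of three is masked
```

The pair can only detect a disagreement, whereas the voter can also mask it, which is why NMR needs at least three units.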

Fault and Error Containment

A fault in one part of the system can cause large voltage swings in other parts of the system, so it is necessary to prevent faults from spreading through the system. This is called containment. Containment is divided into fault containment zones and error containment zones.

Fault Containment Zone (FCZ)

A failure of some part of the computer outside an FCZ cannot cause any element inside that FCZ to fail. Hardware inside the FCZ must be isolated from the outside system. Each FCZ should have an independent power supply and its own clock (which may be synchronized with the other clocks). Typically, an FCZ consists of a whole computer, including processors, memory, I/O and control interfaces.

Error Containment Zone (ECZ)

An ECZ prevents errors from propagating across zone boundaries. This is achieved by voting on redundant outputs.

Redundancy can take four forms: hardware redundancy, software redundancy, time redundancy and information redundancy.

Introduction to Fault Tolerance

Fault Tolerance Techniques

Introduction

Hardware fault: occurs due to a physical defect in the system, such as a broken wire or a logic value stuck at 0 in a gate.

Software fault: occurs due to a bug introduced into the system, so that the software misbehaves for a given set of inputs.

Error: the manifestation of a fault. (A fault may occur at any time, but it becomes visible only when it manifests as an error.)

Fault latency: the time between the onset of a fault and its manifestation as an error.

Error Recovery

Forward error recovery: the error is masked without any computations having to be redone.

Backward error recovery: the system is rolled back to a moment in time before the error is believed to have occurred.

What Causes Failures?

There are three main causes of failures:

Errors in the specification or design

Mistakes in the specification and design are very difficult to guard against. Many hardware failures and all software failures occur due to such mistakes. It is difficult to ensure that the...
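Backward error recovery, as defined above, can be illustrated with a small checkpoint-and-rollback sketch in Python. The class and function names are hypothetical, and the sketch assumes that an error is detected by an exception being raised during a step; a real system would use its own detection mechanism.

```python
import copy


class CheckpointedState:
    """Holds a state plus the last checkpoint taken of it."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        # Save a copy of the current, believed-good state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Return to the moment before the error is believed to have occurred.
        self.state = copy.deepcopy(self._checkpoint)


def run_step(cs, step):
    """Run one step; checkpoint on success, roll back if an error is detected."""
    try:
        step(cs.state)
        cs.checkpoint()
        return True
    except Exception:
        cs.rollback()
        return False


def good_step(state):
    state["count"] += 1


def bad_step(state):
    state["count"] += 100          # partial, corrupting update
    raise RuntimeError("error detected")


cs = CheckpointedState({"count": 0})
run_step(cs, good_step)
run_step(cs, bad_step)
print(cs.state)  # {'count': 1}
```

Forward error recovery would instead mask the error without redoing any work, for example through a voter over redundant outputs.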

Database for Hard Real time Systems

Databases for real-time systems are meant for use in both hard and soft real-time systems. Because hard real-time systems have strict timing constraints, conventional disk-based databases are not suitable for them, whereas soft real-time systems can use disk-based systems with the FCFS, elevator or scan scheduling policies. Hard real-time systems need a solution that offers high performance and guaranteed response times. MDARTS (Multiprocessor Database Architecture for Real-Time Systems) is one such main-memory database, which uses VME-based processors.

Features

It is intended for hard real-time systems.
It is a main-memory database (the entire database resides in main memory).
It is an object-oriented database (C++ elements).
It supports explicit declaration of real-time constraints and semantic constraints within the application code.

Constraints Specifications

Access time ”write The above are the constraints which can be included in the application code directly without the recompilation of the MDAR...

Databases for Soft Real Time System

In disk-based scheduling, data is located by traversing the tracks and sectors of the disk. Tracks are concentric circles, and sectors are the divisions radiating from the center of the disk. Disk-based scheduling algorithms are therefore slower than memory-based scheduling. Under disk-based scheduling,

Ta = Tw + Tp + Tt

where
Ta is the access time,
Tw is the time spent in the queue,
Tp is the time to position the arm over the required track and sector,
Tt is the time taken to transfer the block of data.

Tp is on the order of a few milliseconds, whereas CPU time is only around 50 nanoseconds, so disk-based scheduling algorithms are not suitable for hard real-time systems. However, these algorithms can be made useful for soft real-time systems, based on the following algorithms.

First Come First Serve (FCFS)

The task or transaction that arrives first is scheduled first, followed by the next. The main problem here is that if the requests are huge then t...
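As a rough worked example of the access-time formula and of FCFS ordering, the Python sketch below adds the three time components and totals the head movement when requests are served strictly in arrival order. The function names, the millisecond figures and the track numbers are made-up illustrations, not values from the post.

```python
def access_time_ms(t_wait, t_position, t_transfer):
    """Ta = Tw + Tp + Tt, with every term given in milliseconds."""
    return t_wait + t_position + t_transfer


def fcfs_head_movement(start_track, requests):
    """Total number of tracks crossed when requests are served in arrival order."""
    total, position = 0, start_track
    for track in requests:
        total += abs(track - position)
        position = track
    return total


# Hypothetical figures: 2 ms queued, 8 ms positioning, 0.5 ms transfer.
print(access_time_ms(2.0, 8.0, 0.5))          # 10.5

# Head starts at track 50; requests arrive for tracks 10, 90, 30.
print(fcfs_head_movement(50, [10, 90, 30]))   # 40 + 80 + 60 = 180
```

Because FCFS ignores where the arm currently is, the head can sweep back and forth across the disk, which inflates the Tp term for every request.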

Concurrency Control Issues

Pessimistic Concurrency Control

Checking transactions for violations of serialization consistency before letting them execute is called pessimistic concurrency control.

Two-Phase Locking Scheme

Acquiring read/write locks is one phase.
Unlocking is the other phase.

The two phases do not interleave. Because of this, deadlock may occur. It can be detected by a deadlock detection algorithm, and if a deadlock is found, one of the transactions, the one with the nearer timestamp, is aborted.

Multiversion Scheme

There are three locks: read, write and certify.

Read lock: reads the needed data from the database.
Write lock: writes to the transaction's own private space.
Certify lock: applies the updates to the database; this is the commit stage.

General Locking Rules

Lock Already Set    Lock Requested
                    Read      Write     Certify
Read                Granted   Granted   Blocked
Write               Granted   Granted   Granted
Certify             Blocked   Granted   Blocked

Locking Rules for Priority Inversion

Lock Already Set by a Low Priority Transaction Lock Requested by a High Pr...
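The general locking rules listed above can be encoded as a lookup table. The Python sketch below is one possible encoding, with a hypothetical helper name; it covers only the general rules, since the priority-inversion rules are cut off in this excerpt.

```python
# Keyed as (lock_already_set, lock_requested); True means "Granted",
# False means "Blocked". Values are copied from the general locking rules.
GENERAL_RULES = {
    ("read", "read"): True,
    ("read", "write"): True,
    ("read", "certify"): False,
    ("write", "read"): True,
    ("write", "write"): True,
    ("write", "certify"): True,
    ("certify", "read"): False,
    ("certify", "write"): True,
    ("certify", "certify"): False,
}


def is_granted(locks_already_set, requested):
    """Grant the request only if it is compatible with every lock already set."""
    return all(GENERAL_RULES[(held, requested)] for held in locks_already_set)


print(is_granted(["read", "write"], "read"))   # True
print(is_granted(["read"], "certify"))         # False
```

Writes never block because each transaction writes to its own private space; conflicts only surface when a certify lock meets a read or another certify lock.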

Transaction Abortions

Transaction abortion is of two types: termination abortion and non-termination abortion.

Termination Abortion

A transaction aborted in this way will not be restarted. Example: an attempt to divide by zero.

Non-Termination Abortion

A transaction aborted in this way is restarted after the abort. Example: a data conflict due to a deadlock. If two transactions are involved in a deadlock, one of them is aborted and then restarted.
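The difference between the two kinds of abortion can be sketched as a small retry loop in Python. The exception classes, the `run_transaction` helper and the `max_restarts` limit are all hypothetical; the post does not say how many restarts are allowed, so the limit is an assumption added to keep the loop finite.

```python
class TerminationAbort(Exception):
    """The transaction is aborted and will not be restarted."""


class NonTerminationAbort(Exception):
    """The transaction is aborted (e.g. as a deadlock victim) and may restart."""


def run_transaction(txn, max_restarts=3):
    """Run txn(attempt), restarting it only after a non-termination abort."""
    for attempt in range(1, max_restarts + 1):
        try:
            return txn(attempt)
        except NonTerminationAbort:
            continue  # data conflict: restart the transaction
        except ZeroDivisionError as exc:
            # Restarting cannot fix this, so the abort is final.
            raise TerminationAbort("divide by zero") from exc
    raise TerminationAbort("restart limit reached")


def deadlock_victim(attempt):
    if attempt == 1:
        raise NonTerminationAbort("chosen as deadlock victim")
    return "committed"


def divides_by_zero(attempt):
    return 1 / 0


print(run_transaction(deadlock_victim))   # committed (on the second attempt)
try:
    run_transaction(divides_by_zero)
except TerminationAbort as exc:
    print("terminated:", exc)             # terminated: divide by zero
```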