




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Oct. 20121Part III Faults: Logical DeviationsAbout This PresentationThis presentation is intended to support the use of the textbook Dependable Computing: A Multilevel Approach (traditional print or on-line open publication, TBD). It is updated regularly by the author as part of his teaching of the
2、graduate course ECE 257A, Fault-Tolerant Computing, at Univ. of California, Santa Barbara. Instructors can use these slides freely in classroom teaching or for other educational purposes. Unauthorized uses, including distribution for profit, are strictly prohibited. Behrooz ParhamiEditionReleasedRev
3、isedRevisedRevisedRevisedFirstSep. 2006Oct. 2007Oct. 2009Oct. 2012Oct. 20122Part III Faults: Logical Deviations9 Fault TestingOct. 20123Part III Faults: Logical DeviationsThe good news is that the tests dont show any other problemsOct. 20124Part III Faults: Logical DeviationsOct. 20125Part III Fault
4、s: Logical Deviations9.1 Overview and Fault ModelsThe faulty stateand transitions into and out of itBurn-in testing Fault testingFault removalFault masking Error removalOct. 20126Part III Faults: Logical DeviationsA Taxonomy of Fault TestingFAULT TESTING(Engineering, Manufacturing, Maintenance)Corre
5、ctdesign?Correctimplementation?CorrectOperation?Oct. 20127Part III Faults: Logical DeviationsRequirements and Setup for TestingReference valueTest patternsourceCircuit under test(CUT)ComparatorPass/FailTestability requires controllability and observability (redundancy may reduce testability if we ar
6、e not careful; e.g., TMR)Reference value can come from a “gold” version or from a tableTest patterns may be randomly generated, come from a preset list, or be selected according to previous test outcomesTest results may be compressed into a “signature” before comparingTest application may be off-lin
7、e or on-line (concurrent)Easier to test if direct access to some inner points is possibleOct. 20128Part III Faults: Logical DeviationsImportance and Limitations of Testing“Trying to improve software quality by increasing the amount of testing is like trying to lose weight by weighing yourself more o
8、ften.”Steve C. McConnell“Program testing can be used to show the presence of bugs, but never to show their absence!” Edsger W. DijkstraTest coverage may be well below 100% (model inaccuracies and impossibility of dealing with all combinations of the modeled faults)Important to detect faults as early
9、 as possibleApproximate cost of catching a fault at various levels Component $1 Board $10 System $100 Field $1000Oct. 20129Part III Faults: Logical DeviationsFault Models at Different Abstraction LevelsFault model is an abstract specification of the types of deviations in logic values that one expec
10、ts in the circuit under testCan be specified at various levels: transistor, gate, function, systemTransistor-level faults Caused by defects, shorts/opens, electromigration, transients, . . . May lead to high current, incorrect output, intermediate voltage, . . . Modeled as stuck-on/off, bridging, de
11、lay, coupling, crosstalk faults Quickly become intractable because of the large model spaceFunction-level faults Selected in an ad hoc manner based on the function of a block (decoder, ALU, memory)System-level faults (malfunctions, in our terminology) Will discuss later in Part VOct. 201210Part III
12、Faults: Logical DeviationsGate- or Logic-Level Fault ModelsMost popular models (due to their accuracy and relative tractability)Line stuck faults Stuck-at-0 (s-a-0) Stuck-at-1 (s-a-1)Line open faults Often can be modeled as s-a-0 or s-a-1Delay faults (less tractable than the previous fault types) Si
13、gnals experience unusual delaysABCSKs-a-0Line bridging faults Unintended connection (wired OR/AND)Short(OR)OpenOther faults Coupling, crosstalkOct. 201211Part III Faults: Logical Deviations9.2 Path Sensitization and D-AlgorithmThe main idea behind test design: control the faulty point from inputs an
14、d propagate its behavior to some outputExample: s-a-0 faultTest must force the line to 1This method is formalized in the D-algorithmABCSKs-a-01/0111/001/0Two possible tests(A, B, C) = (0 1 1) or (1 0 1)Forward trace(sensitization)Backward traceD-calculus1/0 on the diagram above is represented as D0/
15、1 is represented as DEncounters difficulties with XOR gates (PODEM algorithm fixes this)Oct. 201212Part III Faults: Logical DeviationsSelection of a Minimal Test SetEach input pattern detects a subset of all possible faults of interest (according to our fault model)ABCSKE F G H J L N M P R Q Choosin
16、g a minimal test set is a covering problemABCPQs-a-0 s-a-1 s-a-0 s-a-1000- - x011x -x -101? ? ?111? ? ?Equivalent faults: e.g.,P s-a-0 L s-a-0 Q s-a-0Q s-a-1 R s-a-1 K s-a-1Oct. 201213Part III Faults: Logical DeviationsCapabilities and Complexity of D-AlgorithmReconvergent fan-out Consider the s inp
17、ut s-a-0PODEM solves the problem by setting y to 011D1 DD z s y xWorst-case complexity of D-algorithm is exponential in circuit size Must consider all path combinations XOR gates cause the behavior to approach the worst case Average case is much better; quadraticPODEM: Path-oriented decision making
18、Developed by Goel in 1981 Also exponential, but in the number of circuit inputs, not its sizeSimple path sensitization does not allow us to propagate the fault to the primary output zOct. 201214Part III Faults: Logical Deviations9.3 Boolean Difference MethodsK = f(A, B, C) = AB BC CAdK/dB= f(A, 0, C
19、) f(A, 1, C)= CA (A C) = A CABCSKE F G H J L N M P R Q s-a-0K = PC ABdK/dP= AB (C AB) = C(AB)Tests that detect P s-a-0 are solutions to the equation P dK/dP = 1(A B) C(AB) = 1 C = 1, A BTests that detect P s-a-1 are solutions to the equation P dK/dP = 1(A B) C(AB) = 1 C = 1, A = B = 0Oct. 201215Part
20、 III Faults: Logical Deviations9.4 The Complexity of Fault TestingThe satisfiability problem (SAT) Decision problem: Is a Boolean expression satisfiable? (i.e., can we assign values to the variables to make the result 1?)Theorem (Cook, 1971): SAT is NP-completeIn fact, even restricted versions of SA
21、T remain NP-completeTo prove the NP-completeness of test generation, we need to show that SAT (or some other NP-complete problem) can be converted to test generationAccording to the Boolean difference formulation, test generation can be converted to SAT (find the solutions to P dK/dP = 1)For a simpl
22、e alternate proof, see Fuji85, pp. 113-114Oct. 201216Part III Faults: Logical Deviations9.5 Testing of Units with MemoryThe presence of memory expands the number of required test casesTo test a sequential machine, we may need to apply different input sequences for each possible initial stateExponent
23、ially many possible input sequencesExponentially many possible machine statesOct. 201217Part III Faults: Logical DeviationsTesting of MemorySimple-minded approach: Write 000 . . . 00 and 111 . . . 11 into every memory word and read out to verify proper storage and retrievalProblems with the simple-m
24、inded approach: Does not test access/decoding mechanism How do you know the intended word was written into and read from? Many memory faults are pattern-sensitive, where cell operation is affected by the values stored in nearby cells Modern high-density memories experience dynamic faults that are ex
25、posed only for specific access sequencesMemory testing continues to be an active research areaBuilt-in self test is the only viable approach in the long termChallenge: Any run time testing consumes some memory bandwidthOct. 201218Part III Faults: Logical Deviations9.6 Off-Line vs. Concurrent Testing
26、This section will be forthcoming.Oct. 201219Part III Faults: Logical Deviations10 Fault MaskingOct. 201220Part III Faults: Logical DeviationsOct. 201221Part III Faults: Logical DeviationsOct. 201222Part III Faults: Logical Deviations10.1 Fault Avoidance vs. MaskingRepairDiscardAbortPreventRemoveExpo
27、seMaskAvoidTolerateFaultQuality AssuranceTestingDynamic RedundancyStatic Redundancy Full? Full? MonitorTest Yes Yes NoNoPerfectFixedRestoredUnaffectedInjuredScreenedFaulty-safeDegradedFaulty DetectMiss DetectMissC i r c u i t o r S y s t e m S t a t eReconfigureMaskConcealOct. 201223Part III Faults:
28、 Logical Deviations10.2 Interwoven Redundant Logic0 1 fault in b is critical b c a d f g eh z10000010 00 1 fault in c or d is not critical (it is masked)1 0 fault in a or h is not critical (it is masked)Even nonredundant circuits have some masking capabilityIs there a way to exploit the inherent mas
29、king capabilities of logic gates to achieve general fault masking?Oct. 201224Part III Faults: Logical DeviationsHow Interwoven Logic WorksLet x1, x2, x3, and x4 be 4 copies of the signal x1 0 b c a d f g eh z1 0a1a2b1b2a1a2b1b2a3a4b3b4a3a4b3b4e1e2e3e4f1f2f3f4e1e4f1f41 0 change is critical for AND, s
30、ubcritical for OR0 1 change is critical for OR, subcritical for ANDTo mask h critical faults: Number of gates multiplied by (h + 1)2 Gate inputs multiplied by h + 1For h = 1, the scheme is known as Quadded logicAlternating layers of ANDs and ORs can mask each others critical faults1111Oct. 201225Par
31、t III Faults: Logical DeviationsInterwoven Logic for NanoelectronicsHalf-adder implemented in quadded logicFrom: /iel5/54/32070/01492293.pdf IEEE D&TJuly-Aug. 2005pp. 328-339 b c a s b a c sOct. 201226Part III Faults: Logical DeviationsHighly Reliable Logic with “Crummy” RelaysMoore & Shannon, 1956a
32、: prob contact made | energized1 a: prob contact open | energizedc: prob contact made | not energized1 c: prob contact open | not energized“Make” contact(normally open)a c“Break” contact(normally closed)a a if a 0.62)prob connection made | not energized = 2c2 c4 (always 0.5, c 0 Rm 1/2V231 Voting un
33、itRRm1.00.01.00.50.00.5 TMR better Simplex betterRIFTMR/Simplex = (1 Rm)/(1 R) = 1/1 Rm(2Rm 1)R = 3Rm2 2Rm3 Rm? TMR:ltR00.51.0ln 20.0 TMR SimplexMTTF: TMR 5/(6l) Simplex 1/l156Oct. 201228Part III Faults: Logical DeviationsA TMR Application and Its Bit-Voting UnitSingle-event upset (SEU) = Soft error
34、Change of state caused by a high-energy particle strike SEU effect on DRAMs (from SANYO website)n+ diffusion layerpn junction fielddue to impactTMR flip-flop for SEU toleranceDQCDQCDQC01DataClockOutput0123MuxOct. 201229Part III Faults: Logical DeviationsExample: SEU Hardened Flip-FlopDGBABABASSSYYYA
35、NQBNQCNQYYYAAAABCYYYAAAABCABCABCABCABCABCABCYYYAFBBFBCFBA YFor list of flip-flop hardening methods and their comparison, see:/richcontent/fpga_content/pages/notes/seu_hardening.htm Oct. 201230Part III Faults: Logical DeviationsN-Modular Redundancy (NMR)Triple-modular redundancy (TMR) can be generali
36、zed to N unitsN-modular redundancy (NMR) uses N modules along with a voter, with N usually being oddExample: 5MROperates correctly as long as 3 of the 5 modules are healthyV231 Voting unit12345Voter complexity rises rapidly with increasing NEven values of N are also feasibleExample: 4MR, with 3-out-
37、of-4 votingVoter masks single faults; can be designed to detect double faultsOct. 201231Part III Faults: Logical Deviations10.4 Dynamic and Hybrid Redundancy1. Detect and replace Dynamic redundancy (cold/hot standby) Detection via - coding, watchdog timer, self-checking - duplication (pair-and-spare
38、s)2. Mask in place Static redundancy May revert to simplex instead of duplex Design challenges include - synchronization for voting - voting on imprecise results3. Mask, diagnose, and reconfigure Hybrid redundancy Fault masked at output, but diagnosed - e.g., via comparison with voter output Faulty
39、circuit is replaced by spare Becomes static upon spare exhaustionV231 Voting unitD21 Detector SpareVS2314Switch-voter SpareOct. 201232Part III Faults: Logical DeviationsComparing Replication SchemesAdvantagesDrawbacksLess powerCoverage factor (cold standby)Long lifeTolerance latency (just add spares
40、)Immediate maskingPower/area penalty High safetyVoting critical Immediate maskingPower/area penaltyLong life andSwitch-voting critical high safety V231 Voting unitD21 Detector SpareVS2314Switch-votingunit SpareOct. 201233Part III Faults: Logical DeviationsSwitch for Standby RedundancyStandby redunda
41、ncy requires an n-to-1 switch to select the output of the currently active moduleThe detectors use various info to deduce fault conditions- Error coding- Reasonableness checks- Watchdog timerD21 Detector SpareD21 SparesD3D n-to-1 switchOnce a fault has been detected, the switch reconfigures the syst
42、em by flagging the faulty unit and activating next spare in sequenceIf we use an n-to-2 switch and compare the two selected outputs, the configuration is known as “pair-and-spares”Oct. 201234Part III Faults: Logical DeviationsFault Detection in Standby RedundancyActivity monitoringD21 Detector Spare
43、Duplication and comparisonSelf-checking designOct. 201235Part III Faults: Logical DeviationsPreview of Self-Checking DesignCovered in Chapter 15FunctionunitStatusEncoded inputEncoded outputSelf-checking checkerFunctionunit 1Encoded inputSelf-checking checkerFunctionunit 2Encoded outputSelf-checking
44、checkerFunction unit designed such that internal faults manifest themselves as invalid outputsCan remove this checker if we do not expect both units to fail and Function unit 2 translates any noncodeword input into noncode outputOutput of multiple checkers may be combined in self-checking mannerOct.
45、 201236Part III Faults: Logical DeviationsSwitch for Hybrid RedundancyHybrid redundancy with n active and s spare modules requires an (n + s)-to-n switch to select the outputs of the active modulesSelf-purging redundancy is a variant of hybrid redundancy in which all modules are active at the outset
46、, but they are purged as they disagree with the majority outputVS2314Switch-voting SpareVoting unit in self-purging redundancy is a threshold voter that considers the inputs with weights of 1 (active) or 0 (purged)Switch built of iterative cells.Oct. 201237Part III Faults: Logical Deviations10.5 Tim
47、e RedundancyRetry upon a detected fault: particularly useful for transient faultsRecomputation not useful with permanent faultsCan make recomputation work by slightly changing the operands, but this is not always applicableCompute a (2b) instead of (2a) bCompute b + a or (a b) instead of a + bOct. 2
48、01238Part III Faults: Logical Deviations10.6 Variations and ComplicationsStatic redundancy makes fault testing more challengingFor static redundancy to be effective, we must ensure that initially all redundant components are fault-freeMode0123V231 Voting unitVoting unitControllable, but not observab
49、leOct. 201239Part III Faults: Logical DeviationsApplications of NMR and Hybrid RedundancyNASAs Space Shuttle (retired in 2012): Used 5-way redundancy in hardware Originally, 3 operational units + 2 spares (one warm, one cold) More recently, 4 operational + 1 spareAlso, uses 2 independently developed
50、 software systems (Design diversity)Japanese Shinkansen “Bullet” TrainTriple-duplex system (6-fold redundancy)Oct. 201240Part III Faults: Logical Deviations11 Design for TestabilityOct. 201241Part III Faults: Logical DeviationsSomeone in this house flunked his earth science test because someone else
51、 in this house told him that love makes the world go around!Oct. 201242Part III Faults: Logical DeviationsOct. 201243Part III Faults: Logical Deviations11.1 The Importance of TestabilityA small circuit with a limited number of inputs and outputs can be tested with a reasonable amount of effort and t
52、imeA complex unit, such as a microprocessor, cannot be tested solely based on its input/output behaviorHence, the need for provisions in the design to facilitate testingOct. 201244Part III Faults: Logical Deviations11.2 Testability ModelingTo allow detection of a fault in point A of a logic circuit,
53、 we need to: Be able to control that point from the primary inputs Be able to observe that point from the primary outputsThus, good testability requires good controllability and good observability for every node in the circuitCircuit under test (CUT)AOct. 201245Part III Faults: Logical DeviationsQua
54、ntifying ControllabilityN(0) = 7N(1) = 1CTF = 0.25N(0) = 1N(1) = 7CTF = 0.25Coutput = (Si Cinput i / k) CTF1.00.30.5 0.15A line with very low controllability is a good test point candidate 0Control pointControllability C of a line has a value between 0 and 1 Derive C values by proceeding from inputs
55、 (C = 1) to outputsk-input, 1-output componentsf-way fan-outCC / (1 + log2 f) for each of ffan-out linesControllability transfer factorCTF = 1 N(0) N(1)N(0) + N(1)N(0): # input patterns leading to 1 outputN(1): # input patterns leading to 0 outputOct. 201246Part III Faults: Logical DeviationsQuantif
56、ying Observability5 0.6A line with very low observability is a good test point candidate 1Observation pointk-input, 1-output componentsN(sp) = 1N(ip) = 3OTF = 0.25Oinput i = Ooutput OTFf-way fan-out1 Pj(1 Oj)Oj for line jObservability O of a line has a value between 0 and 1 Derive O value
57、s by proceeding from outputs (O = 1) to inputsObservability transfer factorOTF =N(sp)N(sp) + N(ip)N(sp): # ways of sensitizing a path to outputN(ip): # ways of inhibiting a path to outputN(sp) = 1N(ip) = 3OTF = 0.25Oct. 201247Part III Faults: Logical DeviationsQuantifying TestabilityTestability = Co
58、ntrollability Observability 1.00.30.5 0.15Controllabilities5 0.6Observabilities0.150.0450.075 0.09TestabilitiesOverall testability of a circuit = Average of line testabilities Oct. 201248Part III Faults: Logical Deviations11.3 Testpoint InsertionIncrease controllability and observability
59、via the insertion of degating mechanisms and control pointsDesign for dual-mode operation Normal mode Test modeDegateControl/ObserveABMuxesPartitioneddesignABNormal modeABTest modefor AOct. 201249Part III Faults: Logical Deviations11.4 Sequential Scan TechniquesIncrease controllability and observabi
60、lity via provision of mechanisms to set and observe internal flip-flopsScan design Shift desired states into FF Shift out FF states to observeCombinational logic.FFFF.Mode controlCombinational logic.FFFF.Partial scan design:Mitigates the excessive overhead of a full scan designOct. 201250Part III Fa
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025农村个人林地抵押借款合同
- 江苏省海安八校2025年初三第一次模拟(期末)物理试题含解析
- 民办安徽旅游职业学院《传统养生学》2023-2024学年第二学期期末试卷
- 湖北理工学院《电视专题与专栏》2023-2024学年第二学期期末试卷
- 九江职业大学《传媒专业英语》2023-2024学年第一学期期末试卷
- 武汉民政职业学院《林木生物信息学》2023-2024学年第二学期期末试卷
- 三门峡职业技术学院《影视音乐》2023-2024学年第一学期期末试卷
- 2025年标准租赁合同模板2
- 南京邮电大学《植景设计原理》2023-2024学年第二学期期末试卷
- 2025婚纱摄影工作室加盟合同范本
- TAVI(经皮导管主动脉瓣植入术)术后护理
- 6.3.1 平面向量基本定理 课件(共15张PPT)
- 建筑消防设施巡查记录
- 混凝土护栏检查记录表
- 厨房隔油池清理记录
- 常见生物相容性实验汇总
- DBJ04∕T 258-2016 建筑地基基础勘察设计规范
- 综合探究三 探寻丝绸之路(课堂运用)
- 职业危害防治实施管理台账
- 社会团体民办非清算审计报告模板
- 建筑工程质量检测收费项目及标准表67262
评论
0/150
提交评论