容错计算英文版课件:5 Defect Avoidance_第1页
容错计算英文版课件:5 Defect Avoidance_第2页
容错计算英文版课件:5 Defect Avoidance_第3页
容错计算英文版课件:5 Defect Avoidance_第4页
容错计算英文版课件:5 Defect Avoidance_第5页
已阅读5页,还剩67页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Oct. 20121Part II Defects: Physical ImperfectionsAbout This PresentationThis presentation is intended to support the use of the textbook Dependable Computing: A Multilevel Approach (traditional print or on-line open publication, TBD). It is updated regularly by the author as part of his teaching of

2、the graduate course ECE 257A, Fault-Tolerant Computing, at Univ. of California, Santa Barbara. Instructors can use these slides freely in classroom teaching or for other educational purposes. Unauthorized uses, including distribution for profit, are strictly prohibited. Behrooz ParhamiEditionRelease

3、dRevisedRevisedRevisedRevisedFirstSep. 2006Oct. 2007Oct. 2009Oct. 2012Oct. 20122Part II Defects: Physical Imperfections5 Defect AvoidanceOct. 20123Part II Defects: Physical ImperfectionsOct. 20124Part II Defects: Physical ImperfectionsOct. 20125Part II Defects: Physical Imperfections5.1 Types and Ca

4、uses of DefectsResistive open due to unfilled via R. Madge et al., IEEE D&T, 2003Particle embedded between layersOct. 20126Part II Defects: Physical ImperfectionsProcess and Operational VariationsEven if there isnt a complete short or open, resistance and capacitance variations can lead to troubleCh

5、ip temperature mapOct. 20127Part II Defects: Physical ImperfectionsAnalogy: Ideal vs. Real Clock SignalsReal clock signal is quite differentIdeal clock signal has sharp edges and an exact constant periodOct. 20128Part II Defects: Physical ImperfectionsDisk Memory DefectsThe tiniest particle or scrat

6、ch can wipe out many thousands of bitsOct. 20129Part II Defects: Physical Imperfections5.2 Yield and Its Associated CostsOct. 201210Part II Defects: Physical ImperfectionsThe dramatic decrease in yield with larger dies Effect of Die Size on YieldDie yield =def (Number of good dies) / (Total number o

7、f dies) Die yield = Wafer yield 1 + (Defect density Die area) / aa Die cost = (Cost of wafer) / (Total number of dies Die yield) = (Cost of wafer) (Die area / Wafer area) / (Die yield)The parameter a ranges from 3 to 4 for modern CMOS processesShown are some random defects; there are also bulk or cl

8、ustered defects that affect a large regionOct. 201211Part II Defects: Physical ImperfectionsEffects of Yield on Testing and Part ReliabilityAssume a die yield of 50% Out of 2,000,000 dies manufactured, 1,000,000 are defective To achieve the goal of 100 defects per million (DPM) in parts shipped, we

9、must catch 999,900 of the 1,000,000 defective partsTherefore, we need a test coverage of 99.99%Testing is imperfect: missed defects/faults (coverage), false positivesGoing from a coverage of 99.9% to 99.99% involves a significant investment in test development and application timesFalse positives ar

10、e not a source of difficulty in this contextDiscarding another 1-2% due to false positives in testing does not change the scale of the lossOct. 201212Part II Defects: Physical ImperfectionsOct. 2009Part II Defects: Physical ImperfectionsSlide 135.3 Defect ModelingDefect are of two main types:Not eve

11、ry spot defect leads to structural or parametric damageActual damage depends on location and size (relative to feature size)Global or gross-area defects are due to: Scratches (e.g., from wafer mishandling)Mask misalignmentover- and under-etchingLocal or spot defects are due to: Imperfect process (e.

12、g., extra or missing material)Effects of airborne particlesCan be eliminatedor minimizedHarderto deal withOct. 201213Part II Defects: Physical ImperfectionsExcess-Material and Pinhole DefectsExtra-material defects are modeled as circular areasPinhole defects are tiny breaches in the dielectric betwe

13、en conducting layersFrom: http:/www.see.ed.ac.uk/research/IMNS/papers/IEE_SMT95_Yield/IEEAbstract.html Oct. 201214Part II Defects: Physical ImperfectionsDefect Size DistributionSample random defect size distribution, assuming 0.3 defects per cm2From: /articles/10164/model-based-approach-allows-desig

14、n-for-yield.htmlDefect size (nm)f(x) =kxp for xmin x xmax 0otherwisex = Defect diameterf(x) = Defect densityk = Normalizing constantp is typically in 2.0, 3.5Oct. 201215Part II Defects: Physical Imperfections5.4 The Bathtub CurveMany components fail early on because of residual or latent defectsComp

15、onents may also wear out due to aging (less so for electronics)In between the two high-mortality regions lies the useful life periodTimeFailure rateInfant mortalityEnd-of-life wearoutUseful life (low, constant failure rate)MechanicalElectronicPrimarily due to latent defectslOct. 201216Part II Defect

16、s: Physical ImperfectionsSurvival Probability of Electronic ComponentsFrom: /hotwire/issue21/hottopics21.htm Infant mortalityTime in yearsPercent of parts still workingNo significantwear-outBathtub curveOct. 201217Part II Defects: Physical Imperfections5.5 Burn-in and Stress TestingFrom: /hotwire/is

17、sue21/hottopics21.htm Time in yearsPercent of parts still workingBurn-in and stress tests are done in accelerated formDifficult to perform on complex and delicate ICs without damaging good partsExpensive “ovens” are requiredOct. 201218Part II Defects: Physical ImperfectionsBurn-in Oven ExampleFrom:

18、http:/environmental_options.html Oct. 201219Part II Defects: Physical Imperfections5.6 Active Defect PreventionOther than initial or manufacturing imperfections, defects can develop over the course of a devices lifetimeRadiation-induced defects Defects due to shock and vibrationDefects due to mishan

19、dling (e.g., scratch or smudge on disk) . . . discussed in Chap. 7 dealing with shielding and hardeningDefects induced by harsh operating environmentTemperature controlLoad redistributionClock scalingOct. 201220Part II Defects: Physical Imperfections6 Defect CircumventionOct. 201221Part II Defects:

20、Physical Imperfections“This just in: the inhabitants of planet Earth are being recalled for the correction of a major defect.”Oct. 201222Part II Defects: Physical ImperfectionsOct. 201223Part II Defects: Physical ImperfectionsDefect Avoidance vs. CircumventionDefect AvoidanceDefect awareness in desi

21、gn, particularly floorplanning and routingExtensive quality control during the manufacturing processComprehensive screening, including burn-in and stress testsDefect Circumvention (Removal)Built-in dynamic redundancy on the die or waferIdentification of defective parts (visual inspection, testing, a

22、ssociation)Bypassing or reconfiguration via embedded switchesDefect Circumvention (Masking)Built-in static redundancy on the die or waferIdentification of defective parts (external test or self-test)Adjustment or tuning of redundant structuresOct. 201224Part II Defects: Physical Imperfections6.1 Det

23、ection of DefectsVisual or optical inspection:Focus on more problematic areas, such as edge of waferPhoto from: /article/327100-Defect_Detection_Drives_to_Greater_Depths.phpOct. 201225Part II Defects: Physical Imperfections6.2 Redundancy and ReconfigurationWorks best when the system on die has regul

24、ar, repetitive structure: Memory FPGA Multicore chip CMP (chip multiprocessor)Irregular (random) logic implies greater redundancy due to replication: Replicated structures must not be close to each other They should not be very far either (wiring/switching overhead) Oct. 201226Part II Defects: Physi

25、cal ImperfectionsAvoiding Bad Sectors on a DiskImage source: /img4A.jpgP-List: Permanent or primary defect tableG-List: Growth or post-use defect tableDoes not affect drive speedAffects drive performanceOct. 201227Part II Defects: Physical ImperfectionsPeripheral reconfiguration elements6.3 Defectiv

26、e Memory ArraysDefect circumvention (removal)Provide several extra (spare) rows and/or columnsRoute external connections to defect-free rows and columnsSpare rowsSpare rowsMemoryarrayMemoryarrayDefective rowDefectivecolumnDefect circumvention (masking)Error-correcting codeWith m rows and s spares, c

27、an model as m-out-of-(m + s) Somewhat more complex with both spare rows and columns(still combinational, though)Modeling with coded scheme to be discussed at the info levelMethods in use since the 1970s;e.g., IBMs defect-tolerant chipSpare columnsSpare columnsOct. 201228Part II Defects: Physical Imp

28、erfections6.4 Defects in Logic and FPGAsMoore and Shannons pioneering work:Building arbitrarily reliable relay circuits out of “crummy” relaysProb. that a relay device closes when it is supposed to be open = pProb. that a relay circuit closes when it is supposed to be open = h(p)If we can achieve h(

29、p) p, then repeated application of the composition scheme will lead to arbitrarily small h(h(h( . . . h(p)ph(p)h(p) p for p p for p 0.382h(p) = 4p2 4p3 + p4xxxxOct. 201229Part II Defects: Physical ImperfectionsDefect Circumvention in FPGAsDefect circumvention (removal)Provide several extra (spare) C

30、LBs, I/O blocks, and connectionsRoute external connections to available blocksDefect circumvention (masking)Not applicableOct. 201230Part II Defects: Physical ImperfectionsRouting Resources in FPGAsSimple 3 3 switch boxLimited configurabilityMore elaborate switch boxesHighly flexible connectionsLB o

31、r clusterVertical wiring channelsLB or clusterLB or clusterLB or clusterLB or clusterLB or clusterLB or clusterLB or clusterLB or clusterSwitch boxSwitch boxSwitch boxHorizontal wiring channelsSwitch boxProgrammableswitchWireDefect circumvention is quite natural because it relies on the same mechani

32、sms that are used for layout constraints (e.g., use only blocks in the upper left quadrant) or for blocks and interconnects that are no longer available due to prior assignmentOct. 201231Part II Defects: Physical ImperfectionsDefects in Multicore Chips or CMPsDefect circumvention (removal)Similar to

33、 FPGAs, except that processors are the replacement entitiesInterprocessor interconnection network is the main challengeWill discuss the switching and reconfiguration aspects in more detail when we get to the malfunction level in our multilevel modelOct. 201232Part II Defects: Physical Imperfections6

34、.5 Defective 1D and 2D ArraysMultiple resources on a chip not a challenge if they are independent in logic and I/O connectionsExample: To build an MPP out of 64-processor chips, one might place 72 processors on each chip to allow for up to 8 defective processorsGiven the probability of a processor (

35、including its external connections) being defective, the chip yield can be modeled as a 64-out-of-72 systemIn practice, we interconnect such processors on the chip to allow higher-bandwidth interprocessor communication and I/OOct. 201233Part II Defects: Physical ImperfectionsDefect Circumvention in

36、Regular ArraysExtensive research done on how to salvage a working array from one that has been damaged by defectsProposed methods differ in Types and placement of switches (e.g., 4-port, single/double-track) Types and placement of spares Algorithms for determining working configurations Ways of effe

37、cting reconfiguration Methods of assessing resilienceThe next few slides show some methods based on 4-port, 2-state switchesOct. 201234Part II Defects: Physical ImperfectionsDefect Circumvention in Linear ArraysA linear array with a spare processor and embedded switchingA linear array with a spare p

38、rocessor and reconfiguration switchesOct. 201235Part II Defects: Physical ImperfectionsDefect Circumvention in 2D ArraysTwo types of reconfiguration switching for 2D arraysAssumption: A defective unit can be bypassed in its row/column by means of a separate switching mechanism (not shown)Oct. 201236

39、Part II Defects: Physical ImperfectionsA Reconfiguration Scheme for 2D ArraysA 5 5 working array salvaged from a 6 6 redundant mesh through reconfiguration switchingSeven defective processors in a 5 5 array and their associated compensation pathsOct. 201237Part II Defects: Physical Imperfections6.6

40、Other Circumvention MethodsNanoelectronics with “crummy” components:Hybrid-technology FPGA, with CMOS logic elements and crossbar nanoswitches that are very compact, but highly unreliableAllows 8-fold increase in density, while providing reliable operation via defect circumventionImage source: W. Ro

41、binett et al., Communications of the ACM, Sep. 2007Oct. 201238Part II Defects: Physical ImperfectionsHighly Redundant Nanoelectronic MemoriesMemory with block-level redundancy:Based on hybrid semiconductor/nanodeviceimplementationError-correcting code applied for defect tolerance, as opposed to oper

42、ational or “soft” errorsImage source: Strukov/Likharev, Nanotechnology, Jan. 2005Oct. 201239Part II Defects: Physical Imperfections7 Shielding and HardeningOct. 201240Part II Defects: Physical ImperfectionsOct. 201241Part II Defects: Physical ImperfectionsOct. 201242Part II Defects: Physical Imperfe

43、ctions7.1 Interference and Cross-TalkSource: WikipediaElectromagnetic or radio-frequency interference (EMI, RFI) is a disturbance that affects an electrical circuit due to either electromagnetic conduction or electromagnetic radiation emitted from an external source. The disturbance may interrupt, o

44、bstruct, or otherwise degrade or limit the effective performance of the circuit.Crosstalk (XT) refers to any phenomenon by which a signal transmitted on one circuit or channel of a transmission system creates an undesired effect in another circuit or channel. Crosstalk is usually caused by undesired

45、 capacitive, inductive, or conductive coupling from one circuit, part of a circuit, or channel, to another.Interference can occur through the airor via shared power supplyOct. 201243Part II Defects: Physical ImperfectionsOn-Chip Cross-TalkShrinking feature sizes have made on-chip crosstalk a major p

46、roblemFrom: Duan09The interwire capacitance CI can easily exceed the load + parasitic capacitance CL for long buses, affecting power dissipation, speed, and signal integrityDenser layoutWires with taller cross sections (required for speed with scaling) make crosstalk problems worseOct. 201244Part II

47、 Defects: Physical ImperfectionsAggressorVictimCross-Talk Mitigation MethodsSpacing and staggering of wires that tend to produce heavier cross-talkFrom: Duan09Bus encoding: Details to be suppliedFor a discussion of crosstalk noise modeling and reduction, see:/dpan/2009Fall_EE382V/notes/lecture10_cro

48、sstalk.ppt/On-chip twisted pair Yu09Oct. 201245Part II Defects: Physical Imperfections7.2 Shielding via EnclosuresMaterials and techniques exist for shielding hardware from a variety of external influencesStatic-shield packageShielded cableRF-shielded packagingNASAs EAFTC computersOct. 201246Part II

49、 Defects: Physical Imperfections7.3 The Radiation ProblemElectromagnetic radiation: Ultraviolet (UV) radiation is nonpenetrating and thus easily stoppedX-ray and gamma radiations can be absorbed by atoms with heavy nuclei, such as leadNuclear reactors use a thick layer of suitably reinforced concret

50、eFrom: WikipediaParticle radiation: Alpha particles (helium nuclei), least penetrating, paper stops themBeta particles (electrons), more penetrating, stopped by aluminum sheetNeutron radiation, difficult to stop, requires bulky shieldingCosmic radiation, not a problem on earth, important for space e

51、lectronicsSecondary radiation: Interaction of primary radiation and shield materialOct. 201247Part II Defects: Physical ImperfectionsRadiation Effect on CMOS ICsOne-way mission to Mars: Exposes the electronics to about 1000 kilorad of radiation, which is near the limit of what is now tolerable by ad

52、vanced space electronicsImpact by high-energy particles, such as protons or heavy ionsFrom: http:/RHBD_primer.htmlRadiation ionizes the oxide, creating electrons and holes; the electrons then flow out, creating a positive charge which leads to current leak across the channelIt also decreases the thr

53、eshold voltage, which affects timing and other operational parametersOct. 201248Part II Defects: Physical ImperfectionsHeavy-Ion and Proton RadiationsFrom: http:/docs/Radcrs_Final.pdfOct. 201249Part II Defects: Physical ImperfectionsMore Details Regarding Radiation EffectsSource: “Single Event Upset

54、: An Embedded Tutorial,” by Wang and AgrawalOct. 201250Part II Defects: Physical ImperfectionsNegative Impacts of RadiationSingle-event latchup (SEL) or snapback: A heavy ion or a high-energy particle shorting the power source to substrate (high currents may result)Single-event upset (SEU): A single

55、 ion changing the state of a memory or register bit; multiple bits being affected is possible, but rareSingle-event transient (SET): The discharge of collected charge from an ionization event creating a spurious signalSingle-event induced burnout (SEB): A drain-source voltage exceeding the breakdown

56、 threshold of the parasitic structuresSingle-event gate rupture (SEGR): A heavy ion hitting the gate region, combined with applied high voltage, as in EEPROMs, creates breakdownOct. 201251Part II Defects: Physical Imperfections7.4 Radiation HardeningShielding the package or the chip itself: Radioact

57、ive-resistant packaging or use of more resilient material in the chips compositionUse of insulating or wide-band-gap substrate: Instead of common, and fairly inexpensive, semiconductor substrateReplace DRAM with the more rugged SRAM: Capacitor-based DRAM is particularly susceptible to upset eventsFa

58、ult- and error-level methods: Circuit duplication/triplication with comparison/voting, or coding, lead to area and power penaltiesSystem and application-level methods: On-line or periodic testing, liveness checks, frequent resets Oct. 201252Part II Defects: Physical ImperfectionsPackaging Solutions

59、to the Radiation ProblemShielding much less effective against proton radiationPackaging can be a partial solution to slow down the particlesFrom: http:/docs/Radcrs_Final.pdfOct. 201253Part II Defects: Physical Imperfections7.5 Vibrations, Shocks, and SpillsHundreds of patents on the topic, but very

60、little published materialPanasonic Toughbook(MIL-STD-810G)Shock-resistant or ruggedized computers are useful for military personnel, law enforcement, emergency response teams, and childrenRuggedized can mean:Shock- or drop-resistantHeat-resistantWater-resistant (e.g., for water rescue)Most common ac

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论