Safety engineering is an applied science closely related to systems engineering. It assures that a life-critical system behaves as needed even when components fail.

A life-critical system, or safety-critical system, is a system whose failure or malfunction may result in death or serious injury. An example of a life-critical system: the picture below illustrates what can happen when one fails. This particular crash took place at the Paris Air Show a few years ago. Our understanding is that a mechanical failure caused the crash. The pilot sustained only minor injuries.

Safety engineers distinguish different degrees of defective operation. A fault occurs when some piece of equipment does not operate as designed. A failure occurs only if a human being (other than a repair person) has to cope with the situation. A critical failure endangers one or a few people. A catastrophic failure endangers, harms or kills a significant number of people.

Safety engineers also identify different modes of safe operation. A “probabilistically safe” system has no single point of failure, and has enough redundant sensors, computers and effectors that it is very unlikely to cause harm (usually “very unlikely” means less than one human life lost in a billion hours of operation).
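As a rough illustration of why redundancy helps a system approach a target such as one loss per billion hours, the sketch below assumes that redundant channels fail independently with a fixed per-hour probability. The function names and the single-channel figure are hypothetical, and real systems suffer common-cause failures that make the independence assumption optimistic.

```python
def all_channels_fail(p_single: float, n_channels: int) -> float:
    """Probability that all n redundant channels fail in the same hour.

    Assumes the channels fail independently (an optimistic simplification).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_channels < 1:
        raise ValueError("need at least one channel")
    return p_single ** n_channels


def channels_needed(p_single: float, target: float) -> int:
    """Smallest number of independent channels whose joint failure
    probability does not exceed the target."""
    if not 0.0 <= p_single < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 <= p_single < 1 and 0 < target < 1")
    n = 1
    while all_channels_fail(p_single, n) > target:
        n += 1
    return n


TARGET_PER_HOUR = 1e-9   # from the text: one loss per billion hours
P_SINGLE = 1e-4          # hypothetical per-hour failure probability of one channel

print(channels_needed(P_SINGLE, TARGET_PER_HOUR))
```

Under these assumptions two channels are not enough and three are, which is one reason triple redundancy is a familiar pattern; the real figure depends on how correlated the failures are.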

An “inherently safe” system is a clever mechanical arrangement that cannot be made to cause harm. This is obviously the best arrangement, but it is not always possible; for example, “inherently safe” airplanes are not possible. A “fail-safe” system is one that cannot cause harm when it fails. A “fault-tolerant” system can continue to operate with faults, though its operation may be degraded in some fashion.

These terms combine to describe the safety needed by systems. For example, most biomedical equipment is only “critical,” and often another identical piece of equipment is nearby, so it can be merely “probabilistically fail-safe”. Train signals can cause “catastrophic” accidents (imagine chemical releases from tank cars) and are usually “inherently safe”.

Aircraft “failures” are “catastrophic” (at least for their passengers and crew), so aircraft are usually “probabilistically fault-tolerant”. Without any safety features, nuclear reactors might have “catastrophic failures”, so real nuclear reactors are required to be at least “probabilistically fail-safe”, and some pebble bed reactors are “inherently fault-tolerant”.

Ideally, safety engineers take an early design of a system, analyze it to find what faults can occur, and then propose changes to make the system safer. At an early design stage, a fail-safe system can often be made acceptably safe with a few sensors and some software to read them. Probabilistically fault-tolerant systems can often be made by using more, but smaller and less expensive, pieces of equipment.

Historically, many organizations viewed safety engineering as a process to produce documentation to gain regulatory approval, rather than a real asset to the engineering process. These same organizations have often made their views into a self-fulfilling prophecy by assigning less-able personnel to safety engineering. (A self-fulfilling prophecy is a prediction that, in being made, actually causes itself to become true.)

Far too often, rather than actually helping with the design, safety engineers are assigned to prove that an existing, completed design is safe. If a competent safety engineer then discovers significant safety problems late in the design process, correcting them can be very expensive. This project management error has wasted large sums of money in the development of commercial nuclear reactors.

Additionally, failure mitigation can go beyond design recommendations, particularly in the area of maintenance. There is an entire realm of safety and reliability engineering known as “reliability centered maintenance” (RCM), a discipline that is a direct result of analyzing potential failures within a system and determining maintenance actions that can mitigate the risk of failure.

This methodology is used extensively on aircraft, and involves understanding the failure modes of the serviceable, replaceable assemblies, in addition to the means to detect or predict an impending failure. Every automobile owner is familiar with this concept when they take their car in to have the oil changed or the brakes checked. Even filling up one's car with gas is a simple example of a failure mode (failure due to fuel starvation), a means of detection (the gas gauge), and a maintenance action.

For large-scale complex systems, hundreds if not thousands of maintenance actions can result from the failure analysis. These maintenance actions are based on conditions (e.g., a gauge reading or a leaky valve), on hard conditions (e.g., a component is known to fail after 100 hours of operation with 95% certainty), or require inspection to determine the maintenance action (e.g., metal fatigue). The reliability centered maintenance concept then analyzes each individual maintenance item for its risk contribution to safety, mission, operational readiness, or cost to repair if a failure does occur.
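The per-item analysis described above can be pictured as a small ranking exercise. The following is only a sketch: the `Trigger` categories mirror the three bases for maintenance actions named in the text, while the 0 to 10 ratings, the weights, and the example items are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    CONDITION = "condition"    # e.g. a gauge reading or a leaky valve
    HARD_TIME = "hard time"    # e.g. known to fail after a set number of hours
    INSPECTION = "inspection"  # e.g. checking for metal fatigue


@dataclass(frozen=True)
class MaintenanceItem:
    name: str
    trigger: Trigger
    # Hypothetical 0-10 ratings of what is at stake if the failure occurs.
    safety: float
    mission: float
    readiness: float
    repair_cost: float


# Assumed weights; a real RCM programme would derive these from its own criteria.
WEIGHTS = {"safety": 0.5, "mission": 0.2, "readiness": 0.2, "repair_cost": 0.1}


def risk_score(item: MaintenanceItem) -> float:
    """Weighted sum of the four risk contributions named in the text."""
    return sum(weight * getattr(item, field) for field, weight in WEIGHTS.items())


def rank_by_risk(items):
    """Return the items sorted from highest to lowest risk score."""
    return sorted(items, key=risk_score, reverse=True)


items = [
    MaintenanceItem("replace pump seal", Trigger.CONDITION, 3, 4, 5, 2),
    MaintenanceItem("swap actuator at 100 h", Trigger.HARD_TIME, 8, 6, 6, 4),
    MaintenanceItem("inspect spar for fatigue", Trigger.INSPECTION, 9, 7, 5, 9),
]
for item in rank_by_risk(items):
    print(f"{risk_score(item):5.2f}  {item.trigger.value:10}  {item.name}")
```

A weighted sum is only one possible way to combine the contributions; it is used here because it keeps the trade-off between safety and cost explicit.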

Then the sum total of all the maintenance actions is bundled into maintenance intervals so that maintenance is not occurring around the clock but, rather, at regular intervals. This bundling process introduces further complexity, as it might stretch some maintenance cycles, thereby increasing risk, but shorten others, thereby potentially reducing risk. The end result is a comprehensive maintenance schedule, purpose-built to reduce operational risk and ensure acceptable levels of operational readiness and availability.
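The bundling trade-off can be sketched in a few lines. This example assumes each action has an ideal interval in operating hours and that the organization offers a fixed set of scheduled visits; snapping each action to the nearest visit is an illustrative rule, not one prescribed by the text, and the action names and numbers are made up.

```python
def bundle(actions: dict, slots: list) -> dict:
    """Assign each action to the scheduled interval nearest its ideal interval.

    Ties go to the shorter interval, which is the more conservative choice.
    """
    schedule = {slot: [] for slot in sorted(slots)}
    for name, ideal in actions.items():
        nearest = min(schedule, key=lambda slot: abs(slot - ideal))
        schedule[nearest].append(name)
    return schedule


def stretched(actions: dict, schedule: dict) -> list:
    """Actions now performed less often than their ideal interval (added risk)."""
    return sorted(
        name
        for slot, names in schedule.items()
        for name in names
        if slot > actions[name]
    )


# Hypothetical ideal intervals, in operating hours.
actions = {"oil change": 120, "brake check": 450, "filter swap": 90, "belt check": 700}
schedule = bundle(actions, slots=[100, 500, 1000])

print(schedule)
print("stretched:", stretched(actions, schedule))
```

Listing the stretched actions separately makes the added risk visible, so that each one can be reviewed rather than accepted silently.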

The two most common fault modeling techniques are called failure modes and effects analysis and fault tree analysis. These techniques are just ways of finding problems and of making plans to cope with failures.

In the technique known as failure modes and effects analysis, an engineer starts with a block diagram of a system. The safety engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted, until the system is not known to have unacceptable problems. Of course, the engineers may make mistakes, so it is very helpful to have several engineers review the failure modes and effects analysis.
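The table at the heart of this technique can be represented as a list of rows, one per block failure. The sketch below is illustrative only: the 1 to 4 severity scale, the acceptance threshold, and the example rows are assumptions, not part of any particular standard.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    block: str          # block of the system diagram
    failure_mode: str   # how that block fails
    effect: str         # what happens as a result
    severity: int       # hypothetical scale: 1 negligible .. 4 catastrophic


def unacceptable(rows, max_severity=2):
    """Rows whose evaluated effect exceeds the (assumed) acceptable severity."""
    return [row for row in rows if row.severity > max_severity]


table = [
    FmeaRow("coolant pump", "stops", "core temperature rises", 4),
    FmeaRow("temperature sensor", "reads low", "alarm is delayed", 3),
    FmeaRow("panel lamp", "burns out", "indicator is dark", 1),
]

# The design is revised and the table re-evaluated until this list is empty.
for row in unacceptable(table):
    print(f"{row.block}: {row.failure_mode} -> {row.effect} (severity {row.severity})")
```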

In the technique known as fault tree analysis, an undesired effect is taken as the root of a tree of logic. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When fault trees are labelled with actual numbers for failure probabilities, computer programs can calculate failure probabilities from them; in practice, though, such numbers are often unavailable because of the expense of testing.
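To make the calculation concrete, here is a minimal sketch of evaluating a fault tree built from AND and OR gates. It assumes every basic event is independent, and the tuple representation and the probabilities are hypothetical; production tools handle repeated events and common-cause failures, which this sketch does not.

```python
from math import prod


def probability(node) -> float:
    """Probability of a fault-tree node, assuming independent basic events.

    A node is either a number (basic event probability) or a tuple
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [probability(child) for child in children]
    if gate == "AND":   # all children must occur
        return prod(probs)
    if gate == "OR":    # at least one child occurs
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical tree: the undesired effect occurs if both redundant pumps fail,
# or if the shared power bus fails.
top_event = ("OR", [("AND", [0.01, 0.02]), 0.001])

print(probability(top_event))  # approximately 0.0012
```

Note how the single shared bus dominates the result even though the two pumps are individually far less reliable, which is the kind of insight the tree is meant to expose.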

The classic program is the Idaho National Engineering and Environmental Laboratory's SAPHIRE, which is used by the U.S. government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and the International Space Station.

Unified Modeling Language (UML) activity diagrams have been used as graphical components in a fault tree analysis.

Usually a failure in safety-certified systems is acceptable if less than one life per 30 years of operation (10^9 seconds) is lost to mechanical failure. Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to this level.
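The equivalence between 30 years and roughly a billion seconds can be checked with a short calculation, assuming continuous operation and a 365.25-day year:

```latex
\[
30\ \text{yr} \times 365.25\ \tfrac{\text{d}}{\text{yr}}
\times 24\ \tfrac{\text{h}}{\text{d}}
\times 3600\ \tfrac{\text{s}}{\text{h}}
= 946{,}728{,}000\ \text{s}
\approx 9.5 \times 10^{8}\ \text{s}
\approx 10^{9}\ \text{s}
\]
```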

A NASA graph shows the relationship between the survival of a crew of astronauts and the amount of redundant equipment in their spacecraft (the MM, mission module).

Once a failure mode is identified, it can usually be prevented entirely by adding extra equipment to the system. For example, nuclear reactors emit dangerous radiation and contain nasty poisons, and nuclear reactions can cause so much heat that no substance can contain them. Therefore reactors have emergency core cooling systems to keep the temperature down, shielding to contain the radiation, and containments (usually several, nested) to prevent leakage.

Containment: a structure or system designed to prevent the accidental release of radioactive materials from a reactor.


