案例集合移动ngesop db2故障分析_第1页
案例集合移动ngesop db2故障分析_第2页
案例集合移动ngesop db2故障分析_第3页
案例集合移动ngesop db2故障分析_第4页
案例集合移动ngesop db2故障分析_第5页
已阅读5页,还剩10页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

TOC\o"1-2"\h\z\u问题综 原因分 告警日志中ORA-28分析 PMON进程异常分析 小 问题综2016年10月05日02:00~02:30节点2存在大量的cursor:pinSwaitonX等待,该部分等待会话由librarycachelock等待所阻塞,此期间数据库出现大量的ORA-28错07:00左右数据库1522和1526端口出现阻塞,客户端无法通过以上端口连接09:282PMONDEAD因而执行shutdownabort原因分cursor:pinSwaitonX从节点2的trace分析,发现cursor:pinSwaitonX是被librarycachelock等待所阻OracleOraclesessionidentified{instance:2(ngesopdb.ngesopdb2)osid:processid:2816,oracle@NGESOP-DB2sessionid:42sessionserial#:} withwait{p1:'idn'=0x921c4772p2:'value'=0x3cp3:'where'=0xtimeinwait:13.542162sectimeoutafter:neverwaitid:blocking:0 TARIFFITEMID,STANDARDTARIFF,PRICE,RECDATE,STARTDATE,ENDDATE,OLDENDDATE,OPRSOURCE,OLDPACKAGEPRODID) (:ORDERID,:SUBSPRODID,:REGION,:SUBSID,:OPERTYPE,:PRODID,:PACKAGEID,:PACKAGEPRODID,:TARIFFITEMID,shortstack:ksedsts()+360<-ksdxfstk()+44<-ksdxcb() <- <- wait*timebetweencurrentwaitandwait#1:0.000131event:'SQL*Netmessagetimewaited:0.001138'waitid: p1:'driverwaitingfor'cursor:pinSwaitonp2:*timebetweenwait#1and#2:0.000005 event:'SQL*Netmessageto timewaited:0.000001secwaitid:538 p1:'driverid'=0xp2:'#bytes'=0x1*timebetweenwait#2and#3:0.000515 event:'SQL*Netmessagefrom timewaited:0.001049secwaitid:537 p1:'driverid'=0xp2:'#bytes'=0x1}andisblocked=>Oraclesessionidentified{instance:2(ngesopdb.ngesopdb2)osid:processid:1885,oracle@NGESOP-DB2sessionid:15366sessionserial#:}whichiswaitingwhichiswaitingfor'librarycache{p1:'handle p2:'lockaddress'=0x p3:'100*mode+namespace'=0x8b2bctimeinwait:14.813948sectimeoutafter:neverwaitid:blocking:193 TARIFFITEMID,STANDARDTARIFF,PRICE,RECDATE,STARTDATE,ENDDATE,OLDENDDATE,OPRSOURCE,OLDPACKAGEPRODID) (:ORDERID,:SUBSPRODID,:REGION,:SUBSID,:OPERTYPE,:PRODID,:PACKAGEID,:PACKAGEPRODID,:TARIFFITEMID, wait*timebetweencurrentwaitandwait#1:0.000168 event:'cursor:mutexX'timewaited:0.000145secwaitid:520 p1:'idn'=0x921c4772p2:'value'=0x0p3:timebetweenwait#1and#2:0.001097 event:'librarycachelock'timewaited:0.020083secwaitid:519 p1:'handleaddress'=0x p2:'lockaddress'=0x p3:'100*mode+namespace'=0x520003timebetweenwait#2and#3:0.000030 event:'cursor:mutexX'timewaited:0.000016secwaitid:518 p1:'idn'=0x921c4772p2:'value'=0x22eap3:p3:}对ASH数据进行分析可以看出,两节点存在大量的librarycachelock等待,节点1其中一个执行SQL6qm21qztp23kp等待librarycachelock9079显示被等待rowcachelock5795579520791579520791正在执行相同的SQL02:20ASH信息作为采样截取如下:-2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-rowcache2016-10-1librarycache2016-10-1librarycache2016-10-librarycache2016-10-1librarycache2016-10-1 librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1dbfilescattered2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-rowcache2016-10-1librarycache2016-10-librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1dbscattered2016-10-1cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-2cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-1cache2016-10-rowcache2016-10-1librarycache2016-10-1librarycache2016-10-librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1librarycache2016-10-1dbfilescattered2016-10-1librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache2016-10-2librarycache1执行SQL9k1bjzuswn1xx20791ASH114dbfile -DB2(TNSV1-14rowcache -DB2(TNS10librarysqlplus@NGESOP-DB1V1-两条SQLSQL_IDSQL_IDcreatetabletmp_tongfuasselect/*+parallel(a,4)*//*query*/distincta.servnumber,did,a.recdatefromgdyy.cs_rec_orderaleftjoinmgsj2.esop_cm_subs_subscribera1ona.servnumber=a1.servnumberjoingdyy.cs_rec_productbona.orderid=b.orderidandb.enddate<>b.startdateand(b.enddate>=sysdateorb.enddateisnull)wherea.isrollback='0'andnotexists(select'x'fromgdyy.cs_rec_orderbackbwherea.orderid=b.oldorderid)anda.statusin('3','4')andb.opertype='N'andto_char(a.recdate)>='2015-01-0100:00:00'didin('prod.75300000620630','prod.75300000620639')SQL_IDBegingather_utili.gather_table_stats(‘GDYY’,6);会话20791来自于主机 SQLUSERID425MZ_ESOPLDQ_ESOPROS90791本机,正在进OracleOraclesessionidentified{instance:1(ngesopdb.ngesopdb1)osid:processid:1335,oracle@NGESOP-DB1(TNSV1-V3)sessionid:9079sessionserial#:}iswaitingfor'librarycachelock'withwait{p1:'handleaddress'=0x p2:'lockaddress'=0x p3:'100*mode+namespace'=0x8b2bctimeinwait:27min26sectimeoutafter:neverwaitid:blocking:0currentsql:BEGINGATHER_UTILI.gather_table_stats('GDYY',6); <- wait从yze的trace *timebetweencurrentwaitandwait#1:0.013143 event:'directpathreadtemp'timewaited:0.052656secwaitid:518579p1:'filenumber'=0x7e0p2:'firstdba'=0x50c80p3:'blockcnt'=0x10timebetweenwait#1and#2:0.013229 event:'directpathreadtemp'timewaited:0.011157secwaitid:518578 p1:'filenumber'=0x7e0p2:'firstdba'=0x2bf85dp3:'blockcnt'=0x1timebetweenwait#2and#3:0.003366 event:'directpathreadtemp'timewaited:0.039378secwaitid:518577p1:'filenumber'=0x7e0p2:'firstdba'=0x2325dp3:'blockcnt'=0x1}andisblocked=>Oraclesessionidentified{instance:1(ngesopdb.ngesopdb1)osid:processid:931,oracle@NGESOP-DB1sessionid:5795sessionserial#:}whichiswaitingfor'rowcachelock'withwait{p1:'cacheid'=0x8p2:'mode'=0x0p3:'request'=0x3timeinwait:1.465587secheur.timeinwait:31min56timeoutafter:1.534413secwaitid:677blocking:1currentsql:createtabletmp_tongfuasselect/*+parallel(a,4)*//*query*/fromgdyy.cs_rec_orderaleftjoinmgsj2.esop_cm_subs_subscribera1onjoingdyy.cs_rec_productbona.orderid=b.orderidandb.enddate<>b.startdateand(b.enddate>=sysdateorb.enddateis wait*timebetweencurrentwaitandwait#1:0.000053 event:'rowcachelock'timewaited:3.000017secwaitid:676 p1:'cacheid'=0x8p2:'mode'=0x0p3:timebetweenwait#1and#2:0.000039 event:'rowcachelock'timewaited:3.000019secwaitid:675 p1:'cacheid'=0x8p2:'mode'=0x0p3:'request'=0x3timebetweenwait#2and#3:0.000038 event:'rowcachelock'timewaited:3.000019secwaitid:674 p1:'cacheid'=0x8p2:'mode'=0x0p3:'request'=0x3}andisblocked=>Oraclesessionidentified{instance:1(ngesopdb.ngesopdb1)osid:processid:894,oracle@NGESOP-DB1sessionid:20791sessionserial#:}whichisnotina{lastwait:0.000000secagoblocking:2sessionscurrentsql:createtabletmp_tongfuasselect/*+parallel(a,4)*//*query*/fromgdyy.cs_rec_orderaleftjoinmgsj2.esop_cm_subs_subscribera1onjoingdyy.cs_rec_productbona.orderid=b.orderidandb.enddate<>b.startdateand(b.enddate>=sysdateorb.enddateis wait event:'dbfilescatteredread'timewaited:0.024455secwaitid:155883 p1:'file#'=0x130p2:'block#'=0x180980p3:'blocks'=0x80timebetweenwait#1and#2:0.000185 event:'gccrmultiblockrequest'timewaited:0.001022secwaitid:155882 p1:'file#'=0x130p2:'block#'=0x1809ffp3:'class#'=0x1timebetweenwait#2and#3:0.000596 event:'gccrmultiblockrequest'timewaited:0.000678secwaitid:155881 p1:'file#'=0x130p2:'block#'=0x18098ep3:p3:}通过节点1librarycachelock等待会话的信息和trace中记录的信息发现,librarycachelockGDYY.CS_REC_PRODUCTSQL9k1bjzuswn1xx中涉及的对象。综合以上信息,DG-DB2和DGNG3-DB2两台主机通过MZ_ESOPLDQ_ESOPROS用户同时发起了两条相同的SQL,一条正常运行,一条被阻塞,被阻塞的会话进而阻塞统计信息收话,导致统计信息收话长时间等待librarycachelock等待,并其他执行涉GDYY.CS_REC_PRODUCT表的SQL的会话解析过程中出现librarycachelock和cursorpinSwaitonX等待。ALERTORA-28,ORA-28是指会话WedWedOct0502:10:01createtabletp_bboss_resendasselect/*DM_GZ__*/*fromgdyy.om_wl_worklistAwhererecdate>trunc(sySdate-1)andstatus<>'4'ANDSEND_COUNT<>'0'andcreate_date<sysdate-5/24/60andworkitemidinDROPTABLEWedOct0502:10:04opiodrabortingprocessunknownospidWedOct0502:10:05opiodrabortingprocessunknownospidWedOct0502:10:06opiodrabortingprocessunknownospidWedOct0502:10:07opiodrabortingprocessunknownospidWedOct0502:10:08opiodrabortingprocessunknownospidWedOct0502:10:09opiodrabortingprocessunknownospidWedOct0502:10:17opiodrabortingprocessunknownospidWedOct0502:10:18opiodrabortingprocessunknownospidWedOct0502:10:19opiodrabortingprocessunknownospidWedOct0502:10:20opiodrabortingprocessunknownospid(WedOct0502:10:212016)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-opiodrabortingprocessunknownospid(853356)asaresultofORA-WedOct0502:10:22opiodrabortingprocessunknownospidWedOct0502:10:23opiodrabortingprocessunknownospidWedOct0502:10:24)asaresultofORA-)asaresultofORA-opiodrabortingprocessunknownospid(853784)asaresultofORA-28WedOct0502:10:252016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:262016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:272016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:282016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:292016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:302016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:312016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:322016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:332016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:10:342016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:032016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:042016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:052016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:062016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:072016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:082016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:092016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:102016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:112016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:122016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:132016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:142016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:152016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:162016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:172016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:182016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:192016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:202016opiodrabortingprocessunknownospid( )asaresultofORA-28WedOct0502:12:212016opiodropiodrabortingprocessunknownospidWedOct0502:12:22opiodrabortingprocessunknownospidWedOct0502:12:23opiodrabortingprocessunknownospid(WedOct0502:12:242016opiodrabortingprocessunknownospidWedOct0502:15:56opiodrabortingprocessunknownospidWedOct0502:15:57opiodrabortingprocessunknownospidWedOct0502:15:58opiodrabortingprocessunknownospidWedOct0502:15:59opiodrabortingprocessunknownospidWedOct0502:16:00opiodrabortingprocessunknownospidWedOct0502:16:01)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-)asaresultofORA-ORA-28[oracle@desktop9[oracle@desktop9~]$oerrora00028,00000,"yoursessionhasbeen//*Cause:Aprivilegeduserhaskilledyoursessionandyouareno loggedontothe//*Action:LoginagainifyouwishtocontinueORA-28指的是会话被杀除,说明从2:10左右开始,数据库存在大量的清理会话活动PMON从9点27分左右Alert日志中发现 2进程作系统进程号为 WedOct0509:27:102 )waitsforlatch'cachebufferschains'for85Errorsinfile/oracle/arch02/esopdb2_diag/diag/rdbms/ngesopdb/ngesopdb2/trace/ngesopdb2_lmhb_ ORA-29771:processUSER)2)formorethan70<----2 details USER(ospid: )isblocking2(ospid: )inawaitLMHB(ospid: )killsUSER(ospid: PleasecheckLMHBtracefileformoredetail.WedOct0509:28:222016Errorsinfile/oracle/arch02/esopdb2_diag/diag/rdbms/ngesopdb/ngesopdb2/trace/ngesopdb2_dia0_ ORA-32701:PossiblehangsuptohangID=0 details DIA0terminatingblocker sid:9610ser#:1299)ofhangwithID=requestedbymasterDIA0processoninstanceHangResolutionReason:Automatichangresolutionwasperformedto criticaldatabaseprocess.byterminatingsessionsid:9610DIA0successfullyterminatedsessionsid:9610ospid: withstatus31.WedOct0509:28:3120162 )waitsforlatch'cachebufferschains'for166 LMHBalreadyattemtpedtokillUSER(ospid: PleasecheckLMHB )tracefileformoreErrorsinfile/oracle/arch02/esopdb2_diag/diag/rdbms/ngesopdb/ngesopdb2/trace/ngesopdb2_lmhb_ ORA-29770:globalenqueue 2 )ishungformorethan70 ERROR:Someprocess(s)isnotmakingprogress.LMHB(ospid: )isterminatingtheinstance.PleasecheckLMHBtracefileformoredetails.PleasealsochecktheCPUload,I/OloadandothersystempropertiesforanomalousbehaviorERROR:Someprocess(s)isnotmakingprogress.LMHB(ospid: ):terminatingtheinstanceduetoerror29770WedOct0509:28:322016Systemstatedumprequestedby(instance=2,osid= (LMHB)),summary=[abnormalinstancetermination]. systemstatedumpWedOct0509:28:33ORA-1092:opitskabortingWedOct0509:28:34数据库认为PMONORA-1092:opitsk数据库认为PMONPerformingimplicitshutdownabortduetodeadPMONLicensehighwatermark=5812WedOct0509:28:35ORA-1092:opitskabortingprocessWedOct0509:28:352016ORA-1092:opitskabortingprocessWedOct0509:28:352016ORA-1092:opitskabortingprocessWedOct0509:28:352016ORA-1092:opitskabortingprocessWedOct0509:28:352016ORA-1092:opitskabortingprocessInstanceterminatedbyLMHB,pidUSER(ospid: ):terminatingtheinstanceInstanceterminatedbyUSER,pid=ORA-1092:opitskabortingprocessWedOct0509:30:302016StartingORACLEinstance从Lmhb的trace中也可以看到,2进程作系统进程号 的会话阻塞,调用pmon进行清理,但一直未成功,2进程因而长时间被阻塞***2016-10-052 )hasnotmovedfor26sec kjfmGCR_HBCheckAll:2 )hasstatus:waitingforevent'latch:cachebufferschains'for25secswith.<----:waitingforlatch'cachebuffers===[WaitChain2 )waitsforevent'latch:cachebuffersUSER )isnotin===[LatchChain2 )waitsforlatch'cachebuffersUSER )isnotwaitingforanyUSER isNOTalive.PostPMONforprocesscleanuppmon***2016-10-052(ospid: )hasnotmovedfor60sec( kjfmGCR_HBCheckAll:2(ospid: )hasstatus2:waitingforevent'latch:cachebufferschains'for146secswith.<----:waitingforlatch'cachebuffers===[WaitChain2 )waitsforevent'latch:cachebuffersUSER )isnotin===[LatchChain2 )waitsforlatch'cachebuffersUSER )isnotwaitingforanyUSER isNOTalive.PostPMONforprocesscleanup<--------pmonkjgcr_Main:KJGCR_ACTION-id从09:26:50的trace中发现操作系统进程号为 状态为DEAD,已36分51秒无活SO:SO:fc85ad8,type:2,owner:0x0,flag:INIT/-/-/0x00if:0x3c:fc85ad8,name=process,file=ksu.hLINE:12616ID:,(process)Oraclepid:1722,ser:3,callscur/top:flags:(0x0)flags2:(0x0),flags3:intrerror:0,callerror:0,sesserror:0,txnerrorintrqueue:emptyksudlpFALSEatlocation:0(postinfo)lastpostreceived:00lastpostreceived-location:kji.hLINE:3419ID:kjata:wakeupenqueueownerlastprocesstopostme: c024b816lastpostsent:00lastpostsent-location:kcb2.hLINE:4139ID:kcbzwwlastprocesspostedbyme: d060f82440(latchinfo)wait_event=0 cb7b10a0Childcachebufferschainslevel=1Locationfromwherelatchisheld:kcb2.hLINE:3406ID:kcbgtcr:fastpath(crContextsavedfromstate=busy(shared)state=busy(shared)[value=0x4000000000000001]waiters[orapid(secondssince:putonlist,post

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论