Deadlock detection in Oracle 11gR2.
In the alert.log of an 11.2.0.1 Database instance we see the following message:
2010-08-20 09:10:55.936000 +02:00
Global Enqueue Services Deadlock detected. More info in file
/usr/local/oracle/diag/rdbms/drp01/DRP011/trace/DRP011_lmd0_17765.trc.
2010-08-20 09:11:05.820000 +02:00
Global Enqueue Services Deadlock detected. More info in file
/usr/local/oracle/diag/rdbms/drp01/DRP011/trace/DRP011_lmd0_17765.trc.
If we open the reported LMD* trace file we can find the following table which provide a lot of information to focus on.
Global Wait-For-Graph(WFG) at ddTS[0.3cd0] :
BLOCKED 0x53833b040 5 wq 2 cvtops x1 TX 0x7c000f.0x42f [1D000-0001-0000006A] inst 1
BLOCKER 0x547d33988 5 wq 1 cvtops x28 TX 0x7c000f.0x42f[1D000-0003-00000571] inst 3
BLOCKED 0x53833b200 5 wq 2 cvtops x1 TX 0x2d0021.0x30b [1D000-0003-00000571] inst 3
BLOCKER 0x547daf928 5 wq 1 cvtops x28 TX 0x2d0021.0x30b[1D000-0001-0000006A] inst 1
So here we can see that a session on instance 1 with XID 1D000-0001-0000006A is waiting for 1D000-0001-0000006A on instance 3. And also instance 3 XID 1D000-0003-00000571 is waiting on 1D000-0001-0000006A and there is our deadlock.
We can also see that the mode used is 5, which is an exclusive lock.
cvt/held mode:
#define KJUSERNL 0 /* no permissions */ (Null)
#define KJUSERCR 1 /* concurrent read */ (Row-S (SS))
#define KJUSERCW 2 /* concurrent write */ (Row-X (SX))
#define KJUSERPR 3 /* protected read */ (Share)
#define KJUSERPW 4 /* protected write */ (S/Row-X (SSX))
#define KJUSEREX 5 /* exclusive access */ (Exclusive)
In the tracefile, but depend on the Oracle released used, we must also be able to find information about the SQL statement, user and programme being used. This will help the developers to find the possible cause in the code to fix the issue. To find for 11gR2 the Blocker from the above example search on the hexvalue 0x547d33988 and-or 0x547daf928 to get the additional information.
This will show the following information:
user session for deadlock lock 0x547daf928
sid: 4295 ser: 292 audsid: 273539 user: 99/TF_TFV flags: 0xE5
pid: 29 O/S info: user: oracle, term: UNKNOWN, ospid: 17853
image: oracle@nldbp01 (S000)
client details:
O/S info: user: applvora, term: pts/3, ospid: 14845
machine: nlxdbp01
program: sqlplus@nldbp01 (TNS V
application name: SQL*Plus, hash value=3669949024
current SQL:
UPDATE SALE SET AKTIE = 'D' , PRIORITEIT = APM.F_APM_GET_VALUE_NUM ('PRIORITEIT', 'HANDMATIGE_INVOER') , DATUM_TIJD_DELETED = SYSDATE WHERE DEAL_ID = :B1
This is indicating sqlplus is used, code executed the user and host.
This information is useful for the developers to validate the cause.
Deadlock detection in Oracle 10gR2.
Here is another example but know for Oracle 10g, you will notice the Wait Graph looks different as in Oracle 11gR2.
An example from 10.2.0.4 alert.log
Thu Nov 24 09:02:23 2006
Global Enqueue Services Deadlock detected. More info in file
/home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc.
Thu Nov 24 09:09:25 2006
Incremental checkpoint up to RBA [0x136.a5fb.0], current log tail at RBA [0x136.f4cd.0]
Thu Nov 24 09:36:09 2006
GES: Potential blocker (pid=25026) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:38:10 2006
GES: Potential blocker (pid=20638) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:39:45 2006
Incremental checkpoint up to RBA [0x136.53b1f.0], current log tail at RBA [0x136.75627.0]
Thu Nov 24 09:40:12 2006
GES: Potential blocker (pid=18854) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:42:14 2006
GES: Potential blocker (pid=16717) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:44:18 2006
If we open the reported LMD* trace file we see the following table which provide a lot of information to focus on.
Global Wait-For-Graph(WFG) at ddTS[0.f4] :
BLOCKED 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1
BLOCKER 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2
BLOCKED 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2
BLOCKER 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1
So here we see that a process on node 1 with xid 197000-0002-0000000A is waiting for
TM lock [0x9fbe][0x0],[TM]in mode 4 which is being held by a process with xid 28A000-0003-00000049 on node 2. But also on node 2 xid 28A000-0003-00000049 is waiting on TM lock
[0x9fbe][0x0],[TM]in mode 4 which is being held by a process on node 1 with xid 197000-0002-0000000A
Now we know this, we want to know what program, sql or pl/sql is causing the problems.
The object involved can be found to use the Hex value 0x9fbe which is in decimal 40894. Query dba_objects to find the object involved.
But still this doesn’t show the sql involved. Depending on the Oracle Release and Patchset this can be different to find. Latest release report a part of the SQL, but you can also search your udump and bdump directory on the nodes involved. Use the PID to search through the tracefiles. This will give you the sql used.
In case you can’t find the tracefile or the traces are removed, you can set the ora-60 event on every instance in the cluster.
event="60 trace name errorstack level 3;name systemstate level 10"
*LMD is a background process also called the Global enqueue services Daemon, taking care of the enqueues (locking).
|