Although Oracle is every time improving this process. DIAG background process is dumping more and more information to make analyzing each. But still you need to do some work. I hope this article can help you with that part.
If we look for an example we see from an alert.log:
Thu Nov 24 09:02:23 2006
Global Enqueue Services Deadlock detected. More info in file
/home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc.
Thu Nov 24 09:09:25 2006
Incremental checkpoint up to RBA [0x136.a5fb.0], current log tail at RBA [0x136.f4cd.0]
Thu Nov 24 09:36:09 2006
GES: Potential blocker (pid=25026) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:38:10 2006
GES: Potential blocker (pid=20638) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:39:45 2006
Incremental checkpoint up to RBA [0x136.53b1f.0], current log tail at RBA [0x136.75627.0]
Thu Nov 24 09:40:12 2006
GES: Potential blocker (pid=18854) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:42:14 2006
GES: Potential blocker (pid=16717) on resource TM-0x9fbe-0x0;
enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file
Thu Nov 24 09:44:18 2006
If we open the reported LMD* trace file we see the following table which provide a lot of information to focus on.
Global Wait-For-Graph(WFG) at ddTS[0.f4] :
BLOCKED 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1
BLOCKER 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2
BLOCKED 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2
BLOCKER 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1
These values are:
cvt/held mode:
#define KJUSERNL 0 /* no permissions */ (Null)
#define KJUSERCR 1 /* concurrent read */ (Row-S (SS))
#define KJUSERCW 2 /* concurrent write */ (Row-X (SX))
#define KJUSERPR 3 /* protected read */ (Share)
#define KJUSERPW 4 /* protected write */ (S/Row-X (SSX))
#define KJUSEREX 5 /* exclusive access */ (Exclusive)
So here we see that a process on node 1 with xid 197000-0002-0000000A is waiting for
TM lock [0x9fbe][0x0],[TM]in mode 4 which is being held by a process with xid 28A000-0003-00000049 on node 2.
But also on node 2 xid 28A000-0003-00000049 is waiting on TM lock
[0x9fbe][0x0],[TM]in mode 4 which is being held by a process on node 1 with xid 197000-0002-0000000A
Now we know this, we want to know what program, sql or pl/sql is causing the problems.
The object involved can be found to use the Hex value 0x9fbe which is in decimal 40894. Query dba_objects to find the object involved.
But still this doesn’t show the sql involved. Depending on the Oracle Release and Patchset this can be different to find. Latest release report a part of the SQL, but you can also search your udump and bdump directory on the nodes involved. Use the PID to search trough the tracefiles. This will give you the sql used.
In case you can’t find the tracefile or the traces are removed, you can set the ora-60 event on every instance in the cluster.
event="60 trace name errorstack level 3;name systemstate level 10"
*LMD is a background process also called the Global enqueue services Daemon, taking care of the enqueues (locking).