Company

Advanced Search
Home | General Database | Platform | Articles | Scripts | Online Documentation
Back

Articles

Date 2007-08-02 23:01:45
Component RAC
Title How to analyze global deadlocks ?
Version 10.2.0.1
Problem

Global deadlock can be compared with “normal” deadlock the difference is the fact that in RAC environment a deadlock is called a Global deadlock as the locking issue can be on a different instance in the cluster.

 

Deadlocks normally occur when two or more session are both holding and requesting the same resources. In a non-RAC environment an ora-60 is reported. In a RAC environment a message is reported in the alert.log.

 

Global Enqueue Services Deadlock detected.

 

The fact that this is reported in nice, but how to analyze the transactions involved so the problem area can be solved?

Solution

Although Oracle is every time improving this process. DIAG background process is dumping more and more information to make analyzing each. But still you need to do some work. I hope this article can help you with that part.

 

If we look for an example we see from an alert.log:

 

Thu Nov 24 09:02:23 2006

Global Enqueue Services Deadlock detected. More info in file

 /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc.

Thu Nov 24 09:09:25 2006

Incremental checkpoint up to RBA [0x136.a5fb.0], current log tail at RBA [0x136.f4cd.0]

Thu Nov 24 09:36:09 2006

GES: Potential blocker (pid=25026) on resource TM-0x9fbe-0x0;

 enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file

Thu Nov 24 09:38:10 2006

GES: Potential blocker (pid=20638) on resource TM-0x9fbe-0x0;

 enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file

Thu Nov 24 09:39:45 2006

Incremental checkpoint up to RBA [0x136.53b1f.0], current log tail at RBA [0x136.75627.0]

Thu Nov 24 09:40:12 2006

GES: Potential blocker (pid=18854) on resource TM-0x9fbe-0x0;

 enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file

Thu Nov 24 09:42:14 2006

GES: Potential blocker (pid=16717) on resource TM-0x9fbe-0x0;

 enqueue info in file /home/oracle/product/oracle/DB/admin/rt01cdb/bdump/cdb011_lmd0_16660.trc and DIAG trace file

Thu Nov 24 09:44:18 2006

 

If we open the reported LMD* trace file we see the following table which provide a lot of information to focus on.

 

Global Wait-For-Graph(WFG) at ddTS[0.f4] :

BLOCKED 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1

BLOCKER 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2

BLOCKED 0x5d442804 4 [0x9fbe][0x0],[TM] [28A000-0003-00000049] 2

BLOCKER 0x5d061dc0 4 [0x9fbe][0x0],[TM] [197000-0002-0000000A] 1

 

These values are:

 

cvt/held mode:

 

#define KJUSERNL 0          /* no permissions */   (Null)

#define KJUSERCR 1          /* concurrent read */  (Row-S (SS))

#define KJUSERCW 2         /* concurrent write */  (Row-X (SX))

#define KJUSERPR 3         /* protected read */    (Share)

#define KJUSERPW 4          /* protected write */  (S/Row-X (SSX))

#define KJUSEREX 5         /* exclusive access */  (Exclusive)

 

 

So here we see that a process on node 1 with xid 197000-0002-0000000A is waiting for

 

TM lock [0x9fbe][0x0],[TM]in mode 4 which is being held by a process with xid 28A000-0003-00000049 on node 2.

But also on node 2 xid 28A000-0003-00000049 is waiting on TM lock

[0x9fbe][0x0],[TM]in mode 4 which is being held by a process on node 1 with xid 197000-0002-0000000A

 

Now we know this, we want to know what program, sql or pl/sql is causing the problems.

The object involved can be found to use the Hex value 0x9fbe which is in decimal 40894. Query dba_objects to find the object involved.

 

But still this doesn’t show the sql involved.  Depending on the Oracle Release and Patchset this can be different to find. Latest release report a part of the SQL, but you can also search your udump and bdump directory on the nodes involved. Use the PID to search trough the tracefiles. This will give you the sql used.

  

In case you can’t find the tracefile or the traces are removed, you can set the ora-60 event on every instance in the cluster.

event="60 trace name errorstack level 3;name systemstate level 10"

 

*LMD is a background process also called the Global enqueue services Daemon, taking care of the enqueues (locking).


© RACHelp 2010 | About me | Disclaimers | Contact