Troubleshooting “DFS lock handle” wait event

a few days ago,My CRM DB face to a performance issue. DB is very slow. then I check DB wait event of active session ,Most sessions is waiting “DFS lock handel” , ‘DFS lock handle’ can cause massive performance issues in a busy RAC cluster. my db is 10R2 2NODE RAC ON AIX.

Riyaj Shamsudeen Said

DFS (stands for Distributed File System) is an ancient name, associated with cluster file system operations, in a Lock manager supplied by vendors in Oracle Parallel Server Environment (prior name for RAC). But, this wait event has morphed and is now associated with waits irrelevant to database files also. ”

How does it work?

I have no access to the code, so read this paragraph with caution, as I may have misunderstood my test results: A process trying to acquire a lock on a global GES resource sends a AST(Asynchronous Trap) or BAST (Blocking Asynchronous Trap) message to LCK process, constructing the message with (lock pointer, resource pointer, and resource name) information. If the resource is not available, then the LCK process sends a message to the lock holder for a lock downgrade.

Now, let’s imagine that we need to ask remote background process to do some operations, say, DBWR to do an object checkpoint in a remote instance. Further, the requestor must wait for the background process to complete the task. What better way is there to implement these waits, than using a lock? It seems that acquiring a lock on a specific resource in a specific mode, triggers a predefined action in that background process. For example, acquiring locks on a specific CI resource triggers the DBWR to object level checkpoint (details later).

While the background process is working, the requesting process is instrumented to wait on the wait event ‘DFS lock handle’, in essence, the requesting process is waiting for one or more background process to perform a specific action. BTW, in some cases, background processes also can wait for ‘DFS lock handle’. Excessive waits by the foreground or background process for the DFS lock handle wait event can lead to instance freeze or application freeze.

Every resource has a value block and can be used to send more information about the resource, such as object_id for a sequence, object_id for a table partition etc.

How to diag

During a period of wait, use the following SQL statement to retrieve the name of the lock and the mode of the lock request:

select chr(bitand(p1,-16777216)/16777215) || chr(bitand(p1, 16711680)/65535) “Lock”,
to_char(bitand(p1, 65536)) “Mode”,
p2, p3 , seconds_in_wait
from v$session_wait
where event = ‘DFS lock handle’

Then use the following to identify the requesting session and the blocking session:

select inst_id, sid, type, id1, id2, lmode, request, block
from gv$lock
where type=’CI’ and id1=9 and id2=5

In this example, use the values from the first query in the where clause. type=Lock, id1=p1 and id2=p2.

DFS lock handle P1 P2 P3 parameter

P1
1128857605 转换16进制,  如0x43490005 0x43=C 0x49=I 为CI MODE=5

P2
The following can be used as reference for understanding the id1 for CI call:

1 reuse (checkpoint and invalidate block range)
2 LGWR Checkpointing and Hot Backup
3 DBWR syncronization of SGA with control file
4 log file add/drop/rename notification
5 miscellaneous CTWR operations
7 Manged standby recovery related
8 alter rollback segment optimal
9 Signal Query Servers/coordinator
10 Create Remote parallel query Server
11 Set Global Partitions
12 Stop Disk Writes
13 Drop Sort Segments
14 Release unused space from Sort Segments
15 instance Recovery for Parallel operation Group
16 Validate parallel slave Lock Value
17 check transaction state objects
18 object reuse request
19 rolling release checks
20 propagate begin backup scn for a file
21 refresh top plan (for db scheduler
22 clear checkpoint progress record
23 drop temp file
24 QUiesce database Restricted
25 Update Dscn Tracking (ktcndt
26 Purge dictionary Object number Cache
27 set Database Force Logging mode
28 invalidate cached file address translations
29 Cursor Unauthorize Mode
30 process waiters after row cache requeue
31 Active Change Directory extent relocation
32 block change tracking state change
33 kgl mulitversion obsolete
34 set previous resetlogs data
35 set recovery destination pointer
36 fast object reuse request
38 ASM diskgroup discovery wait
39 ASM diskgroup release
40 ASM push DB updates
41 ASM add ACD chunk
42 ASM map resize message
43 ASM map lock message
44 ASM map unlock message (phase 1
45 ASM map unlock message (phase 2
46 ASM generate add disk redo marker
d47 ASM check of PST validity
48 ASM offline disk CIC
49 Logical Standby Sync Point SCN
50 update SQL Tuning Base existence bitvector
51 PQ induced Checkpointing
52 ASM F1X0 relocation
53 Scheduler autostart
54 KZS increment grant/revoke counter
55 ASM disk operation message
56 ASM I/O error emulation
57 DB Supp log cursor invalidation
58 Cache global range invalidation
59 Cache global object invalidation
60 ASM Pre-Existing Extent Lock wait
61 Perform a ksk action through DBWR
62 ASM diskgroup refresh wait
63 KCBO object checkpoint
64 KCBO object pq checkpoint
65 global health check event
66 Oracle Label Security refresh
67 thread internal enable
68 cross-instance registration
69 KGL purge unused subheaps
70 clear pin instance flag
71 Rolling operations CIC
73 readable stby: metadata flush
74 slave dskm message
75 update sysaux state sga variable
76 notify storage server reg/dereg
77 ASM ACD relocation xtnt info
78 Misc LogMiner CIC
101 ASM disk error emulation
102 Audit Management operations
103 invalidate sql profile cursors
104 Misc RMAN CIC
105 standby role change
106 Voting file refresh CIC
107 LGWR miscellaneous
108 Invalidate ASM spfile xmap CIC
109 load new system statistics
110 sync datafile move state
111 KSB Rolling Migration Test1
112 ASM map lock message for AVD
113 Oracle Label Security install
114 Oracle Label Security enforce
115 Oracle Database Vault config
117 Invalidate DB Props cache CIC
118 OBSOLETE
119 Propagate final ODPS
120 Post for cleanup of Audit
121 Assist offline from the DB side
122 Assist offline from the ASM side
123 Nuke ILM file stats
124 12.1 RM CIC in RMON

P3
The following can be used to understand id2:
1 used to pass in parameters
2 used to invoke the function in backgroud process
3 used to indicate the foreground has not returned
4 mounted excl, use to allocate mechanism
5 used to queue up interested clients

CASE 1
note: In this article I used Tanel Poder expert ‘s SQL scripts package

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for HPUX: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> @a
A-Script: Display active sessions...

  COUNT(*) SQL_ID        STATE   EVENT
---------- ------------- ------- ----------------------------------------------------------------
        88 9w5uw0jh30vtz WAITING DFS lock handle
        63 a22wxf47tpjb1 WAITING DFS lock handle
        49 c57w8qqawbhwy WAITING DFS lock handle
        34 1yhkd9t3w21ub WAITING DFS lock handle
        16 7wqbd0wkxb3p8 WAITING DFS lock handle
        15 b88q4nv18t8zx WAITING DFS lock handle
         9 3kzcw83n08t0g WAITING DFS lock handle
         8 g22mvfwwbty7f WAITING DFS lock handle
         7 2t3505ydschdc WAITING DFS lock handle
         7 8d9pjq8by13nb WAITING DFS lock handle
         7               WAITING log file sync
         5 5buc1nvhfbzfb WAITING DFS lock handle
         5 9hf9fgxvpxnhm WAITING DFS lock handle
         5 9k04jggqzshs7 ON CPU  On CPU / runqueue
         4 b88q4nv18t8zx WAITING enq: TX - row lock contention
         4 fsyu9fuanadtv WAITING DFS lock handle
         4 5n6vwvrgtqh86 WAITING DFS lock handle
		 
SQL> @xi 9w5uw0jh30vtz %
eXplain the execution plan for sqlid 9w5uw0jh30vtz child %...

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID  9w5uw0jh30vtz, child number 0

INSERT INTO "CO_ORDER_ITEM_TEMP_T" (             "ORDER_ITEM_ID" ,
"CUST_ORDER_ID" , "ORDER_ITEM_CD" ,   "CUST_WORKSHEET_ID" , "STATUS_CD"
, "STATUS_DATE" ,   "STATUS_CHANGE_REASON" , "PRIORITY" ,
"PRE_HANDLE_FLAG" ,   "HANDLE_TIME" , "ARCHIVE_DATE" , "FINISH_TIME" ,
 "ORDER_ITEM_OBJ_ID" , "INSTALL_TIME_SEGMENT_ID" ,
"BEGIN_TIME_SEGMENT_ID" , "SERVICE_OFFER_ID" , "REASON" ,
"EXT_ORDER_ITEM_ID" , "LAN_ID" , "ACCEPT_CITY" , "CITY_CODE" ,
"EFFECT_TIME" , "ACCEPT_REASON" , "BOOK_DATE" ,   "BOOKING_OPEN_DATE",
"ORDER_GROUP_ID" ,   "PARENT_GROUP_ID" ,
"STANDARD_OBJ_ID","ACCEPT_GROUP_ID",
"ORDER_ITEM_TYPE","TIME_NAME","EXT_ORDER_ITEM_GROUP_ID"           )
VALUES (             :1 ,           :2 ,           :3 ,           :4 ,
         :5 ,           :6 ,           :7 ,           :8 ,           :9
,           :10 ,           :11 ,           :12 ,           :13 ,
    :14 ,           :15 ,           :16 ,           :17 ,           :18
,           :19 ,           :20 ,           :2

SQL> @snapper ash 5 1 all
Sampling SID all with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)


----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
  9600% |    1 | 9w5uw0jh30vtz   | 0         | DFS lock handle                     | Other
  5300% |    1 | gn1vcj4rx7b1t   | 1         | ON CPU                              | ON CPU
  5200% |    1 | a22wxf47tpjb1   | 0         | DFS lock handle                     | Other
  5000% |    1 | c57w8qqawbhwy   | 0         | DFS lock handle                     | Other
  3000% |    1 | 9w5uw0jh30vtz   | 0         | gc buffer busy                      | Cluster
  3000% |    1 |                 |           | latch: library cache                | Concurrency
  2900% |    1 | 9k04jggqzshs7   | 0         | ON CPU                              | ON CPU
  2700% |    1 | 9hf9fgxvpxnhm   | 1         | DFS lock handle                     | Other
  2100% |    1 | 7wqbd0wkxb3p8   | 0         | DFS lock handle                     | Other
  1700% |    1 | b88q4nv18t8zx   | 2         | DFS lock handle                     | Other
  
SQL> select username,machine,program,p1,p2,p3,p1text,p2text,sql_id,event from v$session where event='DFS lock handle';
USERNAME        MACHINE      PROGRAM                   P1         P2         P3 P1TEXT       P2TEXT    SQL_ID        EVENT
--------------- ------------ ----------------- ---------- ---------- ---------- ------------ --------- ------------- --------------------
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       a22wxf47tpjb1 DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       9w5uw0jh30vtz DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       7wqbd0wkxb3p8 DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       a22wxf47tpjb1 DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       c57w8qqawbhwy DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       9hf9fgxvpxnhm DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       3kzcw83n08t0g DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       9w5uw0jh30vtz DFS lock handle
...
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       9w5uw0jh30vtz DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       9w5uw0jh30vtz DFS lock handle
MY_ANBOB_COM     HOSXXXXXX   JDBC Thin Client  1398145029    4252665          0 type|mode    id1       a22wxf47tpjb1 DFS lock handle

已选择472行。

# identify the type of lock associated with this event.

SQL>  select chr(bitand(&&p1,-16777216)/16777215) ||
  2  chr(bitand(&&p1,16711680)/65535) type,
  3  mod(&&p1, 16) md
  4  from dual
  5  ;
输入 p1 的值:  1398145029

TY         MD
-- ----------
SV          5

SQL> select * from v$lock_type where type='SV';
TYPE       NAME                  ID1_TAG         ID2_TAG      IS_  DESCRIPTION
---------- --------------------- ------------- ------------ ------ ------------------ 
SV         Sequence Ordering     object #       0            NO   Lock to ensure ordered sequenc

# identified by a resource name

SQL> @oid 4252665 

owner                     object_name                    object_type        SUBOBJECT_NAME                 CREATED           LAST_DDL_TIME     status    DATA_OBJECT_ID
------------------------- ------------------------------ ------------------ ------------------------------ ----------------- ----------------- --------- --------------
WWW_ANBOB_COM             SEQ_XX_VERSION_S               SEQUENCE                                          20141204 01:31:51 20141210 20:06:56 VALID

SQL> select * from dba_sequences where sequence_name='SEQ_XX_VERSION_S';

SEQUENCE_OWNER                 SEQUENCE_NAME                   MIN_VALUE  MAX_VALUE INCREMENT_BY C O CACHE_SIZE LAST_NUMBER
------------------------------ ------------------------------ ---------- ---------- ------------ - - ---------- -----------
WWW_ANBOB_COM                  SEQ_XX_VERSION_S                        1 1.0000E+14            1 N Y         20   168011961

NOTE:
the sequence “SEQ_XX_VERSION_S” cache size 20 and with ORDER attribute, if the sequences are accessed in different instances. To accommodate this requirement, RAC code must ensure that next value retrieved from the sequence will be in an ORDER in any instance and the co-ordination mechanism is by transferring couple of GES messages between the instances.

SQL> alter sequence WWW_ANBOB_COM .SEQ_XX_VERSION_S cache 1000 noorder;
序列已更改。

Tip:
then the solution probably would be that, ORDER attribute is not a suitable attribute for that heavily accessed sequence. My recommendation is always been (a) Use bigger cache >1000 for the sequences frequently accessed

SQL> @a
A-Script: Display active sessions...

  COUNT(*) SQL_ID        STATE   EVENT
---------- ------------- ------- ----------------------------------------------------------------
        12               WAITING log file sync
        11 5xhp3dsuv0zf8 WAITING gc buffer busy
         8 8h67q133fayyj WAITING gc buffer busy
         8 5sw4j8f1hk5xc WAITING gc buffer busy
         7 dm6azg0mj90gp WAITING gc buffer busy
		 ...

wait a moment, DFS lock handle wait event is gone. The database performance problems solved.

References Riyaj Shamsudeen ‘s article.