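In an Oracle 11g R2 RAC environment, file system usage was found to be very high because the ASM alert log was continuously printing "Attempting voting file refresh on diskgroup xx". The RBAL process appears to retry endlessly because of a PST problem, which leads to a memory overflow; it may eventually raise errors such as ORA-7445 and crash the ASM instance. This post briefly records how to handle it.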
ASM Alert log
Wed May 15 17:16:31 2024
NOTE: Attempting voting file refresh on diskgroup OCRVOTE
NOTE: Refresh completed on diskgroup OCRVOTE . Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCRVOTE
NOTE: Attempting voting file relocation on diskgroup OCRVOTE
NOTE: Successful voting file relocation on diskgroup OCRVOTE
NOTE: Attempting voting file refresh on diskgroup OCRVOTE
NOTE: Refresh completed on diskgroup OCRVOTE
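To gauge how fast these messages are piling up, the occurrences can be counted per diskgroup. The following is a minimal sketch, not an Oracle-supplied tool: the function name `count_vf_refresh` is made up for illustration, and the alert log path in the usage comment is an assumption that depends on your diagnostic destination.

```shell
#!/bin/sh
# Count "Attempting voting file refresh" messages per diskgroup in an ASM alert log.
# Hypothetical usage (the path is an assumption, adjust to your ADR home):
#   count_vf_refresh /grid/app/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
count_vf_refresh() {
    awk '{
        line = $0
        # A single line may hold several messages, so keep matching until none remain.
        while (match(line, /Attempting voting file refresh on diskgroup [A-Za-z0-9_]+/)) {
            msg = substr(line, RSTART, RLENGTH)
            sub(/.* /, "", msg)          # keep only the diskgroup name
            n[msg]++
            line = substr(line, RSTART + RLENGTH)
        }
    } END { for (dg in n) print dg, n[dg] }' "$1"
}
```

Running it twice a few minutes apart and comparing the counts gives a rough idea of the message rate.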
RBAL Trace file
2024-05-15 12:11:48.014: [ CSSCLNT]clsssVoteDiskFormat: call clsscfgfmtbegin with leasedata 0000000000000000, size 0
2024-05-15 12:11:48.014: [ CSSCLNT]clsssVoteDiskFormat: succ-ly format the Voting Disk
src:9ffffffffd55ff30:c000000061bd1950:3: /dev/rdisk/disk500:7cd1ebdbd0f94fa5bfa9cd93375b2f2c:
PST-old:9ffffffffd55fe60:c000000061bd1250:47: /dev/rdisk/disk505:4703259dd3fa4f9abfa41dbf5bae835c:
PST-old:9ffffffffd55ff30:c000000061bd1950:47: /dev/rdisk/disk500:7cd1ebdbd0f94fa5bfa9cd93375b2f2c:
Reg-old:9ffffffffd55fec8:c000000061bd15d0:7: /dev/rdisk/disk506:6eea4d1c63e74f53bfcb35f30531b5e5:
Check the memory of the RBAL process
# Linux
pmap -p xxx
# AIX
svmon -P xxx -O segment=category
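A single snapshot does not show growth, so it can help to sample the process's resident memory at intervals. The sketch below is Linux-only and reads `/proc/<pid>/status`; the function names `rss_kb` and `sample_rss` are hypothetical, and the RBAL PID has to be supplied by you.

```shell
#!/bin/sh
# Print the resident set size (in kB) of a process, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# sample_rss PID INTERVAL_SECONDS COUNT
# Print COUNT timestamped RSS samples, INTERVAL_SECONDS apart.
sample_rss() {
    pid=$1; interval=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(rss_kb "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_rss <rbal_pid> 60 10` would print ten samples one minute apart; a steadily rising value is consistent with the leak described here.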
The instance eventually crashes with ORA-07445 [lpmloadpkg()+160]
*** 2024-05-15 12:15:00.779
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x1A60] [PC:0x40000000112205E0, lpmloadpkg()+160] [flags: 0x0, count: 1] [CPU: 2]
Incident 200785 created, dump file: /grid/app/diag/asm/+asm/+ASM1/incident/incdir_200785/+ASM1_rbal_3847_i200785.trc
ORA-07445: exception encountered: core dump [lpmloadpkg()+160] [SIGSEGV] [ADDR:0x1A60] [PC:0x40000000112205E0] [Address not mapped to object] []
According to the MOS note "ASM Alert Logs Show Continuously: Attempting Voting File Relocation" (Doc ID 1457886.1), this behavior is related to bug 13609187 and bug 13904435.
Solution
1. Move the PST
alter diskgroup GRID drop disk [disk_name] rebalance power 0;
alter diskgroup GRID undrop disks;
— or —
2. Manually rebalance the diskgroup
If you are not comfortable with the steps above, you can first move the OCR and voting disks to another diskgroup, as follows.
-- Replace the OCR
[root@11g-node2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@11g-node2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3108
Available space (kbytes) : 259012
ID : 299475515
Device/File Name : +OCRVOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@11g-node2 ~]# su - grid
[grid@11g-node2 ~]$ asmcmd lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 4194304 51200 9588 0 9588 0 N DATA/
MOUNTED EXTERN N 512 4096 4194304 51200 43004 0 43004 0 N FRA/
MOUNTED NORMAL N 512 4096 4194304 15360 14320 5120 4600 0 Y OCRVOTE/
[grid@11g-node2 ~]$
[root@11g-node2 ~]# ocrconfig -add +data
[root@11g-node2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3108
Available space (kbytes) : 259012
ID : 299475515
Device/File Name : +OCRVOTE
Device/File integrity check succeeded
Device/File Name : +data
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@11g-node2 ~]# ocrconfig -delete +OCRVOTE
[root@11g-node2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3108
Available space (kbytes) : 259012
ID : 299475515
Device/File Name : +data
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
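When scripting this move, it can be useful to confirm which OCR locations are configured before and after each `ocrconfig` call. The following is a small sketch that only parses `ocrcheck` output in the format shown above; the function name `ocr_locations` is hypothetical.

```shell
#!/bin/sh
# Print the configured OCR locations, one per line, from ocrcheck output on stdin.
# Hypothetical usage:  ocrcheck | ocr_locations
ocr_locations() {
    awk -F: '/Device\/File Name/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim whitespace around the location
        print $2
    }'
}
```

After `ocrconfig -add +data` the function would be expected to print both `+OCRVOTE` and `+data`, and only `+data` once the old location has been deleted.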
-- Replace the voting disks
[root@11g-node2 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 23b9e942beda4f52bf4d4dded62fca50 (/dev/sdd) [OCRVOTE]
2. ONLINE e0451324eb724f65bfc1238c2ed85e5c (/dev/sdc) [OCRVOTE]
3. ONLINE bb094d8715424f50bfe2d34fba6f7f48 (/dev/sdb) [OCRVOTE]
Located 3 voting disk(s).
[root@11g-node2 ~]# crsctl replace votedisk +data
Successful addition of voting disk 9b91c6cec30a4fb5bf248d83adaf750e.
Successful deletion of voting disk 23b9e942beda4f52bf4d4dded62fca50.
Successful deletion of voting disk e0451324eb724f65bfc1238c2ed85e5c.
Successful deletion of voting disk bb094d8715424f50bfe2d34fba6f7f48.
Successfully replaced voting disk group with +data.
CRS-4266: Voting file(s) successfully replaced
[root@11g-node2 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 9b91c6cec30a4fb5bf248d83adaf750e (/dev/sdf) [DATA]
Located 1 voting disk(s).
[root@11g-node2 ~]# su - grid
[grid@11g-node2 ~]$ asmcmd
ASMCMD>
ASMCMD> lsdg -g
Inst_ID State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
2 MOUNTED EXTERN N 512 4096 4194304 51200 9288 0 9288 0 Y DATA/
1 MOUNTED EXTERN N 512 4096 4194304 51200 9288 0 9288 0 Y DATA/
2 MOUNTED EXTERN N 512 4096 4194304 51200 43004 0 43004 0 N FRA/
1 MOUNTED EXTERN N 512 4096 4194304 51200 43004 0 43004 0 N FRA/
2 MOUNTED NORMAL N 512 4096 4194304 15360 14416 5120 4648 0 N OCRVOTE/
1 MOUNTED NORMAL N 512 4096 4194304 15360 14416 5120 4648 0 N OCRVOTE/
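To confirm from a script that the voting files really moved, the `crsctl query css votedisk` output shown earlier can be summarized per diskgroup. This is a sketch under the assumption that the output keeps the column layout shown in this post; the function name `votedisk_summary` is made up for illustration.

```shell
#!/bin/sh
# Count ONLINE voting files per diskgroup from "crsctl query css votedisk" output on stdin.
# Hypothetical usage:  crsctl query css votedisk | votedisk_summary
votedisk_summary() {
    awk '$2 == "ONLINE" {
        dg = $NF                 # last column looks like [OCRVOTE]
        sub(/^\[/, "", dg)
        sub(/\]$/, "", dg)
        n[dg]++
    } END { for (dg in n) print dg, n[dg] }'
}
```

With the outputs in this post, the summary would read `OCRVOTE 3` before the replace and `DATA 1` after it, which matches the redundancy of each diskgroup (normal versus external).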
##### Manually rebalance the diskgroup
ASMCMD> rebal --power 5 OCRVOTE
Rebal on progress.
ASMCMD> lsop
Group_Name Dsk_Num State Power EST_WORK EST_RATE EST_TIME
OCRVOTE REBAL REAP 5 9
ASMCMD> lsop
Group_Name Dsk_Num State Power EST_WORK EST_RATE EST_TIME
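Rather than rerunning `lsop` by hand, the check can be wrapped in a loop that waits until no operation is listed. The helper below is only a sketch: `rebalance_running` is a hypothetical name, and it assumes, as in the output above, that `lsop` prints a single header line followed by one row per active operation.

```shell
#!/bin/sh
# Return 0 if "asmcmd lsop" output on stdin lists at least one operation, 1 otherwise.
# Assumes the first line is the column header.
rebalance_running() {
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit(found ? 0 : 1) }'
}

# Hypothetical usage, run as the grid user:
#   while asmcmd lsop | rebalance_running; do sleep 10; done
#   echo "no ASM operation in progress"
```

Only once the loop exits would it make sense to move the voting files and OCR back.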
# Move back to the original diskgroup OCRVOTE
[root@11g-node2 ~]# crsctl replace votedisk +OCRVOTE
Successful addition of voting disk 11e7fa5187684fcebfea09a9383fa244.
Successful addition of voting disk b021f532e0164ff3bf874b6a3147ff3b.
Successful addition of voting disk c7a28692b8934f1fbf4e86ec65341b3b.
Successful deletion of voting disk 9b91c6cec30a4fb5bf248d83adaf750e.
Successfully replaced voting disk group with +OCRVOTE.
CRS-4266: Voting file(s) successfully replaced
[root@11g-node2 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 11e7fa5187684fcebfea09a9383fa244 (/dev/sdd) [OCRVOTE]
2. ONLINE b021f532e0164ff3bf874b6a3147ff3b (/dev/sdc) [OCRVOTE]
3. ONLINE c7a28692b8934f1fbf4e86ec65341b3b (/dev/sdb) [OCRVOTE]
Located 3 voting disk(s).
[root@11g-node2 ~]# ocrconfig -add +OCRVOTE
[root@11g-node2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3108
Available space (kbytes) : 259012
ID : 299475515
Device/File Name : +data
Device/File integrity check succeeded
Device/File Name : +OCRVOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@11g-node2 ~]# ocrconfig -delete +data
[root@11g-node2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3108
Available space (kbytes) : 259012
ID : 299475515
Device/File Name : +OCRVOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
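Note that if you run the rebalance after the OCR and voting disks have been moved out of OCRVOTE, the alert log shows the following messages
SUCCESS: rebalance completed for group 3/0x6af84273 (OCRVOTE)
NOTE: Attempting voting file refresh on diskgroup OCRVOTE
NOTE: Refresh completed on diskgroup OCRVOTE. No voting file found.
These messages can be safely ignored. They appear mainly because the diskgroup holds no voting disks at that point, and the behavior is tracked as non-published bug 14279847.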
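Whether "No voting file found" is expected can be checked against the `Voting_files` column of `asmcmd lsdg`. The sketch below assumes the plain `lsdg` layout shown in this post (without `-g`), where `Voting_files` is the second-to-last column and the name is the last; the function name `dg_voting_flag` is hypothetical.

```shell
#!/bin/sh
# Print the Voting_files flag (Y or N) of one diskgroup from "asmcmd lsdg" output on stdin.
# Hypothetical usage:  asmcmd lsdg | dg_voting_flag OCRVOTE
dg_voting_flag() {
    awk -v dg="$1" 'NR > 1 {
        name = $NF
        sub(/\/$/, "", name)                     # lsdg prints names with a trailing slash
        if (toupper(name) == toupper(dg)) print $(NF - 1)
    }'
}
```

If the flag is `N` for the diskgroup being rebalanced, the message is consistent with the situation described above.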