
How to force-stop an Oracle ASM diskgroup REBALANCE (ORA-15067)

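A customer in the power industry recently reported slow I/O on an Oracle database. gv$asm_operation showed that several OFFLINE disks in DROPPING state were being rebalanced (power 1). During peak business hours disk busy was close to 100%, and because the disks are SATA HDDs, database performance suffered. The ASM diskgroup is larger than 100 TB, and the rebalance was estimated to take more than two days. Is there a way to stop the running rebalance?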

Enable the ASM trace event

alter system set events '15195 trace name context forever, level 7';

Check ASM operations

select * from gv$asm_operation;
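To judge how far the rebalance has progressed and how long it may still run, it can help to select only the progress columns. The query below is a minimal sketch that assumes the standard gv$asm_operation columns and is run from an ASM instance.

```sql
-- Run in an ASM instance; returns one row per active operation per instance.
-- sofar / est_work are allocation-unit counts; est_minutes is ASM's own estimate.
select inst_id, group_number, operation, state,
       power, actual, sofar, est_work, est_rate, est_minutes
  from gv$asm_operation
 order by inst_id, group_number;
```

The inst_id of the row that shows the running rebalance identifies the instance doing the work, which matters later when deciding which ASM instance to restart.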

asm_power_limit

-- on all ASM instances
SQL> alter diskgroup xxx rebalance power 0;

Note:
In this environment, however, setting power 0 on the diskgroup failed with: ORA-15067: command or option incompatible with diskgroup redundancy

Force stop

-- ASM instance
alter system set event='15195 trace name context forever, level 604' scope=spfile;
alter system set asm_power_limit = 0 scope=spfile;
-- Restart the ASM instance
select * from gv$asm_operation;  -- find the ASM instance that is running the rebalance, then restart that instance
shutdown abort
startup

Note:
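If the ASM instance that is running the rebalance is shut down normally, the rebalance power automatically returns to 1 after the restart and the rebalance moves to another instance. Event 15195 is set to prevent COD (Continuing Operations Directory) recovery. Enabling the event generates a large number of trace files, so remember to clean them up.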
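Because both settings above are made with scope=spfile, they only take effect after the restart. Before running shutdown abort, it may be worth confirming that they were actually recorded. This is a minimal sketch, assuming the ASM instances use an spfile; the commented-out reset at the end is an assumption about later cleanup, not a step from the original procedure.

```sql
-- Confirm the values stored in the spfile before restarting the ASM instance.
select sid, name, value
  from v$spparameter
 where name in ('asm_power_limit', 'event')
   and isspecified = 'TRUE';

-- After the restart, the rebalance should no longer be listed.
select inst_id, group_number, operation, state, power
  from gv$asm_operation;

-- Assumption: once the diskgroup issue is resolved, the event can be removed
-- from the spfile again to stop the extra trace output (next restart applies it).
-- alter system reset event scope=spfile sid='*';
```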

MOS note 13728745.8 states:

Set event 15195 level 604 in all ASM instances.  This will prevent COD recovery to run and may get the diskgroup mounted.

A friend has written about this in "Oracle ASM Virtually addressed metadata - Continuing Operations Directory":

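The Continuing Operations Directory (COD) is ASM file number 4. It records long-running operations so that, when one is interrupted unexpectedly, the COD can be used to either complete it or roll it back. If the ACD is the redo of an ASM instance, the COD is its undo. The operations tracked in the COD fall into two types: background operations and rollback operations.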

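A background operation is one initiated by an ASM background process. It is recorded in block 0 of the COD, which is therefore also called the COD Background Operations block. The classic example is a rebalance: if the ASM instance crashes before the rebalance finishes, or the diskgroup is dismounted unexpectedly, then once the diskgroup is mounted again the COD Background Operations block tells the ASM instance to continue the rebalance until it completes.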

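Once power is 0 on all ASM instances and event 15195 is set on the ASM instance that is running the rebalance, the rebalance does not resume after that instance is restarted.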

ORA-15067

But why does running alter diskgroup xxx rebalance power 0 directly raise ORA-15067? The diskgroup uses HIGH redundancy.

Check the ASM diskgroup

select failgroup,STATE,MODE_STATUS,header_status,count(*),sum(free_mb),sum(os_mb)  from v$asm_disk 
where group_number = (select group_number from v$asm_diskgroup where name='xxxxxxxxx')
group by failgroup,STATE,MODE_STATUS,header_status;

select path, header_status, os_mb,free_mb, failgroup from v$asm_disk
 where group_number = (select group_number from v$asm_diskgroup where name='xxxx') order by failgroup;

Note:
The diskgroup currently has 3 failgroups, and 3 OFFLINE ASM disks are in DROPPING state. When the disks are online and healthy, the operation is allowed.

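ORA-15067 is raised mainly because the number of failgroups in the diskgroup does not satisfy the requirement of its redundancy level. For this HIGH redundancy diskgroup ASM requires at least 5 failgroups, but only 3 are present, hence the error. The environment was reportedly planned with quorum disks originally, so my guess is that the 2 quorum disk failgroups of this diskgroup went missing for some other reason.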
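The failgroup count and the presence of quorum failgroups can be checked in a single query. The sketch below assumes the FAILGROUP_TYPE column of v$asm_disk (REGULAR or QUORUM, available from 11.2) and uses 'XXXX' as a placeholder for the diskgroup name.

```sql
-- One row per failgroup: redundancy type of the diskgroup, whether the
-- failgroup is REGULAR or QUORUM, and how many disks it contains.
select g.name,
       g.type           as redundancy,
       d.failgroup,
       d.failgroup_type,
       count(*)         as disks
  from v$asm_diskgroup g
  join v$asm_disk d
    on d.group_number = g.group_number
 where g.name = 'XXXX'
 group by g.name, g.type, d.failgroup, d.failgroup_type
 order by d.failgroup;
```

If no row shows failgroup_type = QUORUM, that would be consistent with the guess that the quorum failgroups are no longer part of the diskgroup.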
