Oracle Database environments, and RAC environments in particular, place very strict demands on the underlying infrastructure. CPU starvation, memory shortage, network problems, or slow I/O frequently lead to database hangs or split-brain evictions, and without OS-level metrics to support the analysis, the SA and the DBA can end up in an awkward blame game. Oracle took this to heart and introduced a tool, CHM, which is one reason the installation media keeps growing: these tools are bundled in. From 12c onward there is even a dedicated repository database (GIMR) to store the collected data, alongside tools such as SQL Developer, SQLcl, and oratop. This is one of the product's genuinely admirable sides.
ora.crf is the resource behind Cluster Health Monitor (CHM below), which automatically collects operating-system usage data: CPU, memory, swap, processes, I/O, network, and so on. It is still advisable to deploy OSW in production to keep a longer history; OSW shells out to OS commands, while CHM calls OS APIs, so CHM has lower overhead and better real-time behavior. Early versions sampled once per second; from 11.2.0.3 the interval reportedly became once every 5 seconds.
CHM is installed automatically with the following software:
- Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
- Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)
On earlier versions CHM must be installed separately. It can also be installed in non-RAC environments.
CHM consists of two main services:
1) System Monitor Service (osysmond): runs on every node. osysmond sends each node's resource usage to the Cluster Logger Service, which receives the data from all nodes and stores it in the CHM repository.

$ ps -ef|grep osysmond
root      7984     1  0 Jun05 ?   01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin

2) Cluster Logger Service (ologgerd): within a cluster, ologgerd runs as a master on one node, with a standby on another. If ologgerd can no longer run on its current node, it is started on the standby node.

Master node:
$ ps -ef|grep ologgerd
root      8257     1  0 Jun05 ?   00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2
Standby node:
$ ps -ef|grep ologgerd
root      8353     1  0 Jun05 ?   00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1
CHM diagnostic logs:

$GRID_HOME/log/*/crflogd/crflogd.log
$GRID_HOME/log/*/crfmond/crfmond.log
CHM repository:
Stores the collected data. By default it lives under $GI_HOME/crf and needs about 1 GB of disk space; each node generates roughly 0.5 GB per day. You can use oclumon to change its location and its allowed size (at most 3 days of data can be retained).
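Since the size oclumon reports is a retention window expressed in seconds, the 3-day maximum corresponds to 259200. A minimal conversion helper (the function name is just for illustration):

```shell
#!/bin/sh
# Convert a retention period in days to the repsize value oclumon uses,
# which is a window in seconds (86400 seconds per day).
days_to_repsize() {
    echo $(( $1 * 86400 ))
}
days_to_repsize 3   # prints 259200, the 3-day maximum
```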
There are two ways to obtain the data CHM has collected:
1. Use Grid_home/bin/diagcollection.pl:

$ Grid_home/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration
e.g.
$ diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime "06/15/201412:30:00" -incidentduration "00:05"
2. Use oclumon:

$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]
e.g.
$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt
Using oclumon to detect potential root causes for node evictions (CPU starvation):
$ oclumon dumpnodeview -n grac2 -last "00:15:00"
Stopping and disabling the ora.crf resource:

On each node, as root:
# $GI_HOME/bin/crsctl stop res ora.crf -init
# $GI_HOME/bin/crsctl modify res ora.crf -attr ENABLED=0 -init
Problem 1

Oversized CHM repository files fill up the GI_HOME filesystem, or the ologgerd process runs at nearly 100% of a CPU. Normally each node only generates about 0.5 GB per day.

  PID USER  PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
  824 root  RT  -5  368m  142m  58m R 99.6  0.1  1:01.77 ologgerd

# cd $GI_HOME/crf/db/
# ls -lstr *.bdb
In the case I handled this time, crfclust.bdb had grown to 37 GB.
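A runaway repository file like this is easy to spot with a small sketch along the following lines (the function name and the 1 GB default threshold are assumptions; find's -size suffixes here are the GNU/Linux ones):

```shell
#!/bin/sh
# List .bdb files above a size threshold under a CHM repository directory.
# The default threshold "+1G" flags anything over 1 GB.
list_big_bdb() {
    dir=$1
    threshold=${2:-+1G}
    find "$dir" -name '*.bdb' -size "$threshold" -exec ls -lh {} \; 2>/dev/null
}
# Usage: list_big_bdb "${GI_HOME:-/u01/app/11.2.0/grid}/crf/db"
```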
Cleanup methods:

A. Manual cleanup
1. Stop CRF, as root:
# $GI_HOME/bin/crsctl stop res ora.crf -init
2. Back up (or remove) the bdb files:
# cd $GI_HOME/crf/db/
# for f in *.bdb; do mv "$f" "$f.backup"; done
3. Start CRF, as root:
# $GI_HOME/bin/crsctl start res ora.crf -init

B. From 11.2.0.3, "oclumon manage -repos" can be used to control the size and release space.
Check the current settings:
$ oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2
Done
$ oclumon manage -get repsize
CHM Repository Size = 259200   <==== unit is seconds
Done
Sometimes this returns an absurdly large value, which indicates a problem, e.g.:
$ oclumon manage -get repsize
CHM Repository Size = 1094795585
Change the path:
$ oclumon manage -repos reploc /shared/oracle/chm
Change the size:
$ oclumon manage -repos resize 259200
# Restart crf on both nodes
$ crsctl stop res ora.crf -init
$ crsctl start res ora.crf -init
The .bdb files are reinitialized as well.

C. As a temporary workaround, you can kill the ologgerd process and clean out the directory; osysmond will respawn ologgerd and create new bdb files.

Related bugs:
Bug 20186278 - crfclust.bdb Becomes Huge Size Due to Sudden Retention Change
Bug 13950866 - Disk usage is 100% due to ora.crf resource
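The manual steps of method A can be wrapped in a small script. This is only a sketch with a dry-run default, since stopping ora.crf and moving repository files should be done deliberately; the GI_HOME default, the run() helper, and the db/*/ layout are assumptions to verify on your system:

```shell
#!/bin/sh
# Dry-run sketch of cleanup method A. Run as root with DRY_RUN=0 to execute.
GI_HOME=${GI_HOME:-/u01/app/11.2.0/grid}
DRY_RUN=${DRY_RUN:-1}
run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}
run "$GI_HOME/bin/crsctl" stop res ora.crf -init      # 1. stop CRF
for f in "$GI_HOME"/crf/db/*/*.bdb; do                # 2. back up bdb files
    [ -e "$f" ] && run mv "$f" "$f.backup"
done
run "$GI_HOME/bin/crsctl" start res ora.crf -init     # 3. start CRF
```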
Problem 2

ora.crf is ONLINE, but "oclumon manage -get reppath" fails:

$ oclumon manage -get reppath
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

Yet the status and target of resource ora.crf are ONLINE on all nodes:

# crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on dibarac01
BUG 17238613 – LNX64-11204-CHM:OLOGGERD WAS DISABLED BECAUSE BDB GROWN BEYOND DESIRED LIMITS
BUG 20439706 – DB_KEYEXIST: KEY/DATA PAIR ALREADY EXISTS ERROR IN CRFLOGD.LOG
BUG 18447164 – CRFCLUST.BDB GROW HUGE SIZE
BUG 19692024 – EXADATA: CRFCLUST.BDB IS GROWING TO 40 GB
BUG 20127477 – CRFCLUST.BDB HAS GROWN UNEXPECTEDLY
BUG 20316849 – HUGE REPSIZE RESULTING IN GI HOME DIRECTORY FILLING UP
BUG 20351845 – RETENTION FOR CHM DATA IS SET TO 34YRS

All of these are closed as duplicates of the following:
BUG 20186278 – TAG OCR: GET ID FAILED AND CHM DB SIZE 24 GB
Problem 3

diagsnap, introduced in 12.1, is also managed by CHM. There is a defect when osysmond issues pstack, which can likewise cause instance restarts or node evictions.
Workaround:

1. Disable osysmond from issuing pstack. As root:
   # crsctl stop res ora.crf -init
   Update PSTACK=DISABLE in $GRID_HOME/crf/admin/crf.ora
   # crsctl start res ora.crf -init
2. Disable diagsnap. As the GI user:
   $ $GI_HOME/bin/oclumon manage -disable diagsnap
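The crf.ora edit in step 1 can be scripted. Here is a minimal sketch, to be run as root while ora.crf is stopped; the helper name is hypothetical, and sed -i (in-place edit) assumes GNU sed as found on Linux:

```shell
#!/bin/sh
# Force PSTACK=DISABLE in crf.ora: replace the key if it is present,
# append it otherwise.
set_pstack_disable() {
    crf_ora=$1
    if grep -q '^PSTACK=' "$crf_ora"; then
        sed -i 's/^PSTACK=.*/PSTACK=DISABLE/' "$crf_ora"
    else
        printf 'PSTACK=DISABLE\n' >> "$crf_ora"
    fi
}
# Usage (as root): set_pstack_disable "$GRID_HOME/crf/admin/crf.ora"
```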
Problem 4

ora.crf fails to start with 'Secondary index corrupt: not consistent with primary' in $GRID_HOME/log/*/crflogd/crflogd.log.
Solution

Rebuild the BDB databases manually, using the same procedure as for Problem 1. Do note the term "secondary index", though: a second, or subordinate, index. Since when do indexes come in an order? This is an interesting point, because in Oracle 19c the secondary index takes on a much bigger role. The next post will cover the secondary index in 19c.
A shell script for analyzing CHM output (from the internet)
#!/bin/bash
# Description:
# Convert CHM files to more human readable format like vmstat, ....
# - move the MEM Low and CPU high message to the end of the line
# - diplay data in a tabular format
#
# Usage : ./print_sys.sh grac41_CHMOS
# grac41_CHMOS = oclumon output from : tfactl diagcollect
#
# Run a report for System Metrics from 16.01.00 - 16.01.59
# % ~/print_sys.sh grac41_CHMOS | egrep '#pcpus|cpuq:|03-22-14 10.00'
# Output
# pcpus: 2 #vcpus: 2 cpuht: N chipname: Intel(R) swaptotal: 5210108 physmemtotal: 4354292 #sysfdlimit: 6815744 #disks: 27 #nics: 6
# cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin: swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors:
# 03-22-14 10.00.03 2.60 6 86356 215692 1811240 16 1 11 6 0 17 1 41 7 378 15 19648 0
# 03-22-14 10.00.13 5.27 1 89492 224720 1785120 8444 8528 166 2764 3414 4437 3497 41 12 381 15 19680 0
# 03-22-14 10.00.18 5.87 1 96180 227256 1776196 7682 5508 534 2004 2400 3762 2524 47 10 388 15 19712 0
# ..
#
# ...
echo "-> File searched: " $1
# echo "-> Search Str 1 : " $2
# pcpus indicates a SYSTEM Metric report
search1="pcpus"
#
# remove any ; from each line - simplifies processing
cat $1 | sed 's/;/ /g' | sed 's/'\''//g' | awk 'BEGIN { cnt=0; }
/Node:/ { Node=$0; Nodet1=$4; Nodet2=$5; }
/'$search1'/ {
# printf("%s \n", $1 );
if ( $1=="#pcpus:" )
{
if ( cnt==0 )
{
cnt++;
# print header: number of CPUs and chip identity
printf ("%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s \n", \
$1, $2, $3, $4, $5, $6, $7, $8, $21, $22, $15, $16, $47, $48, $49, $50, $51, $52);
printf (" cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin:");
printf (" swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors: \n" );
}
cnt++;
for (i = 1; i <= NF; i++)
{
if ($i=="cpu:" )
{
# printf ("%s ", $(i+1) );
memlow = "";
cpu=$(i+1);
i++;
}
else if ($i=="cpuq:" )
{
# printf ("%s ", $(i+1) );
cpuq=$(i+1);
i++;
}
else if ($i=="physmemfree:" )
{
physmemfree=$(i+1);
i++;
}
else if ($i=="mcache:" )
{
mcache=$(i+1);
i++;
}
else if ($i=="swapfree:" )
{
swapfree=$(i+1);
i++;
}
else if ($i=="ior:" )
{
ior=$(i+1);
i++;
}
else if ($i=="iow:" )
{
iow=$(i+1);
i++;
}
else if ($i=="ios:" )
{
ios=$(i+1);
i++;
}
else if ($i=="swpin:" )
{
swpin=$(i+1);
i++;
}
else if ($i=="swpout:" )
{
swpout=$(i+1);
i++;
}
else if ($i=="pgin:" )
{
pgin=$(i+1);
i++;
}
else if ($i=="pgout:" )
{
pgout=$(i+1);
i++;
}
else if ($i=="netr:" )
{
netr=$(i+1);
i++;
}
else if ($i=="netw:" )
{
netw=$(i+1);
i++;
}
else if ($i=="procs:" )
{
procs=$(i+1);
i++;
}
else if ($i=="rtprocs:" )
{
rtprocs=$(i+1);
i++;
}
else if ($i=="#fds:" )
{
fds=$(i+1);
i++;
}
else if ($i=="nicErrors:" )
{
nicErrors=$(i+1);
i++;
}
else if ($i== "total-mem")
{
# Record detection for LOW memory indication
# Available memory (physmemfree 91516 KB + swapfree 185276 KB) on node grac41 is Too Low (< 10% of total-mem + total-swap)
# Search for total-mem and select i-2 field which is 10% is the above case
#
memlow = $(i-2);
# printf(" **** MEM low:  < %s *** " , $(i-2) );
}
}
printf ("%s %s %6s %3d %9s %9s %9s %5s %5s %5s %5s %5s %5s %5s %5d %5d %5d %5d %5d %5d ", \
Nodet1, Nodet2, cpu, cpuq, physmemfree, mcache, swapfree, ior, iow, ios, swpin, swpout, pgin, pgout, netr, netw, procs, rtprocs, fds, nicErrors );
if ( cpu > 90 )
{
# Record detection for HIGH CPU usage indication
printf (" CPU > 90%% ");
}
if ( memlow != "" )
{
printf(" MEMLOW < %s", memlow);
}
printf("\n");
# printf("%s \n", $1 );
# printf("%s %s %6s %3s %10s %10s %10s %5s %5s %5s %5s %5s %5s %5s %8s %8s %5s %5s %5s %5s \n", \
# Nodet1, Nodet2, $10, $12, $14, $18, $20, $24,$26,$28, $30, $32, $34 , $36, $38, $40, $42, $44, $46, $54 );
}
} '