Oracle Database environments, and RAC environments in particular, place very strict demands on the underlying infrastructure. CPU starvation, memory shortage, network problems, or slow I/O frequently lead to database hangs or split-brain evictions, and without OS-level metrics to support the analysis, the SA and the DBA can end up in an awkward blame game. Oracle took this to heart and introduced a tool, CHM, which is one reason the installation media keeps growing: these tools are bundled in. From 12c onward there is even a dedicated repository database (GIMR) to store the collected data, alongside tools such as SQL Developer, SQLcl, and oratop. This is one of the product's genuinely admirable sides.
ora.crf is the resource behind Cluster Health Monitor (CHM below), which automatically collects operating-system usage data: CPU, memory, swap, processes, I/O, network, and so on. It is still advisable to deploy OSW in production to keep a longer history; OSW shells out to OS commands, while CHM calls OS APIs, so CHM has lower overhead and better real-time behavior. Early versions sampled once per second; from 11.2.0.3 the interval reportedly became once every 5 seconds.
CHM is installed automatically with the following software:
- Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
- Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)
On earlier versions CHM must be installed separately. It can also be installed in non-RAC environments.
CHM consists of two main services:
1) System Monitor Service (osysmond): runs on every node. osysmond sends each node's resource usage to the Cluster Logger Service, which receives the data from all nodes and stores it in the CHM repository.

$ ps -ef|grep osysmond
root      7984     1  0 Jun05 ?   01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin

2) Cluster Logger Service (ologgerd): within a cluster, ologgerd runs as a master on one node, with a standby on another. If ologgerd can no longer run on its current node, it is started on the standby node.

Master node:
$ ps -ef|grep ologgerd
root      8257     1  0 Jun05 ?   00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2
Standby node:
$ ps -ef|grep ologgerd
root      8353     1  0 Jun05 ?   00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1
CHM diagnostic logs:

$GRID_HOME/log/*/crflogd/crflogd.log
$GRID_HOME/log/*/crfmond/crfmond.log
CHM repository:
Stores the collected data. By default it lives under $GI_HOME/crf and needs about 1 GB of disk space; each node generates roughly 0.5 GB per day. You can use oclumon to change its location and its allowed size (at most 3 days of data can be retained).
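Since the size oclumon reports is a retention window expressed in seconds, the 3-day maximum corresponds to 259200. A minimal conversion helper (the function name is just for illustration):

```shell
#!/bin/sh
# Convert a retention period in days to the repsize value oclumon uses,
# which is a window in seconds (86400 seconds per day).
days_to_repsize() {
    echo $(( $1 * 86400 ))
}
days_to_repsize 3   # prints 259200, the 3-day maximum
```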
There are two ways to obtain the data CHM has collected:
1. Use Grid_home/bin/diagcollection.pl:

$ Grid_home/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration
e.g.
$ diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime "06/15/201412:30:00" -incidentduration "00:05"
2. Use oclumon:

$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]
e.g.
$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt
Using oclumon to detect potential root causes for node evictions (CPU starvation):
$ oclumon dumpnodeview -n grac2 -last "00:15:00"
Stopping and disabling the ora.crf resource:

On each node, as root:
# $GI_HOME/bin/crsctl stop res ora.crf -init
# $GI_HOME/bin/crsctl modify res ora.crf -attr ENABLED=0 -init
Problem 1

Oversized CHM repository files fill up the GI_HOME filesystem, or the ologgerd process runs at nearly 100% of a CPU. Normally each node only generates about 0.5 GB per day.

  PID USER  PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
  824 root  RT  -5  368m  142m  58m R 99.6  0.1  1:01.77 ologgerd

# cd $GI_HOME/crf/db/
# ls -lstr *.bdb
In the case I handled this time, crfclust.bdb had grown to 37 GB.
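A runaway repository file like this is easy to spot with a small sketch along the following lines (the function name and the 1 GB default threshold are assumptions; find's -size suffixes here are the GNU/Linux ones):

```shell
#!/bin/sh
# List .bdb files above a size threshold under a CHM repository directory.
# The default threshold "+1G" flags anything over 1 GB.
list_big_bdb() {
    dir=$1
    threshold=${2:-+1G}
    find "$dir" -name '*.bdb' -size "$threshold" -exec ls -lh {} \; 2>/dev/null
}
# Usage: list_big_bdb "${GI_HOME:-/u01/app/11.2.0/grid}/crf/db"
```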
Cleanup methods:

A. Manual cleanup
1. Stop CRF, as root:
# $GI_HOME/bin/crsctl stop res ora.crf -init
2. Back up (or remove) the bdb files:
# cd $GI_HOME/crf/db/
# for f in *.bdb; do mv "$f" "$f.backup"; done
3. Start CRF, as root:
# $GI_HOME/bin/crsctl start res ora.crf -init

B. From 11.2.0.3, "oclumon manage -repos" can be used to control the size and release space.
Check the current settings:
$ oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2
Done
$ oclumon manage -get repsize
CHM Repository Size = 259200   <==== unit is seconds
Done
Sometimes this returns an absurdly large value, which indicates a problem, e.g.:
$ oclumon manage -get repsize
CHM Repository Size = 1094795585
Change the path:
$ oclumon manage -repos reploc /shared/oracle/chm
Change the size:
$ oclumon manage -repos resize 259200
# Restart crf on both nodes
$ crsctl stop res ora.crf -init
$ crsctl start res ora.crf -init
The .bdb files are reinitialized as well.

C. As a temporary workaround, you can kill the ologgerd process and clean out the directory; osysmond will respawn ologgerd and create new bdb files.

Related bugs:
Bug 20186278 - crfclust.bdb Becomes Huge Size Due to Sudden Retention Change
Bug 13950866 - Disk usage is 100% due to ora.crf resource
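The manual steps of method A can be wrapped in a small script. This is only a sketch with a dry-run default, since stopping ora.crf and moving repository files should be done deliberately; the GI_HOME default, the run() helper, and the db/*/ layout are assumptions to verify on your system:

```shell
#!/bin/sh
# Dry-run sketch of cleanup method A. Run as root with DRY_RUN=0 to execute.
GI_HOME=${GI_HOME:-/u01/app/11.2.0/grid}
DRY_RUN=${DRY_RUN:-1}
run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}
run "$GI_HOME/bin/crsctl" stop res ora.crf -init      # 1. stop CRF
for f in "$GI_HOME"/crf/db/*/*.bdb; do                # 2. back up bdb files
    [ -e "$f" ] && run mv "$f" "$f.backup"
done
run "$GI_HOME/bin/crsctl" start res ora.crf -init     # 3. start CRF
```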
Problem 2

ora.crf is ONLINE, but "oclumon manage -get reppath" fails:

$ oclumon manage -get reppath
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

Yet the status and target of resource ora.crf are ONLINE on all nodes:

# crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on dibarac01
BUG 17238613 – LNX64-11204-CHM:OLOGGERD WAS DISABLED BECAUSE BDB GROWN BEYOND DESIRED LIMITS
BUG 20439706 – DB_KEYEXIST: KEY/DATA PAIR ALREADY EXISTS ERROR IN CRFLOGD.LOG
BUG 18447164 – CRFCLUST.BDB GROW HUGE SIZE
BUG 19692024 – EXADATA: CRFCLUST.BDB IS GROWING TO 40 GB
BUG 20127477 – CRFCLUST.BDB HAS GROWN UNEXPECTEDLY
BUG 20316849 – HUGE REPSIZE RESULTING IN GI HOME DIRECTORY FILLING UP
BUG 20351845 – RETENTION FOR CHM DATA IS SET TO 34YRS

All of these are closed as duplicates of the following:
BUG 20186278 – TAG OCR: GET ID FAILED AND CHM DB SIZE 24 GB
Problem 3

diagsnap, introduced in 12.1, is also managed by CHM. There is a defect when osysmond issues pstack, which can likewise cause instance restarts or node evictions.
Workaround:

1. Disable osysmond from issuing pstack. As root:
   # crsctl stop res ora.crf -init
   Update PSTACK=DISABLE in $GRID_HOME/crf/admin/crf.ora
   # crsctl start res ora.crf -init
2. Disable diagsnap. As the GI user:
   $ $GI_HOME/bin/oclumon manage -disable diagsnap
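The crf.ora edit in step 1 can be scripted. Here is a minimal sketch, to be run as root while ora.crf is stopped; the helper name is hypothetical, and sed -i (in-place edit) assumes GNU sed as found on Linux:

```shell
#!/bin/sh
# Force PSTACK=DISABLE in crf.ora: replace the key if it is present,
# append it otherwise.
set_pstack_disable() {
    crf_ora=$1
    if grep -q '^PSTACK=' "$crf_ora"; then
        sed -i 's/^PSTACK=.*/PSTACK=DISABLE/' "$crf_ora"
    else
        printf 'PSTACK=DISABLE\n' >> "$crf_ora"
    fi
}
# Usage (as root): set_pstack_disable "$GRID_HOME/crf/admin/crf.ora"
```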
Problem 4

ora.crf fails to start with 'Secondary index corrupt: not consistent with primary' in $GRID_HOME/log/*/crflogd/crflogd.log.
Solution

Rebuild the BDB databases manually, using the same procedure as for Problem 1. Do note the term "secondary index", though: a second, or subordinate, index. Since when do indexes come in an order? This is an interesting point, because in Oracle 19c the secondary index takes on a much bigger role. The next post will cover the secondary index in 19c.
A shell script for analyzing CHM output (from the internet)
#!/bin/bash
# Description:
# Convert CHM files to more human readable format like vmstat, ....
# - move the MEM Low and CPU high message to the end of the line
# - diplay data in a tabular format
#
# Usage : ./print_sys.sh grac41_CHMOS
# grac41_CHMOS = oclumon output from : tfactl diagcollect
#
# Run a report for System Metrics from 16.01.00 - 16.01.59
# % ~/print_sys.sh grac41_CHMOS | egrep '#pcpus|cpuq:|03-22-14 10.00'
# Output
# pcpus: 2 #vcpus: 2 cpuht: N chipname: Intel(R) swaptotal: 5210108 physmemtotal: 4354292 #sysfdlimit: 6815744 #disks: 27 #nics: 6
# cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin: swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors:
# 03-22-14 10.00.03 2.60 6 86356 215692 1811240 16 1 11 6 0 17 1 41 7 378 15 19648 0
# 03-22-14 10.00.13 5.27 1 89492 224720 1785120 8444 8528 166 2764 3414 4437 3497 41 12 381 15 19680 0
# 03-22-14 10.00.18 5.87 1 96180 227256 1776196 7682 5508 534 2004 2400 3762 2524 47 10 388 15 19712 0
# ..
#
# ...
echo "-> File searched: " $1
# echo "-> Search Str 1 : " $2
# pcpus indicates a SYSTEM Metric report
search1="pcpus"
#
# remove any ; from each line - simplifies processing
cat $1 | sed 's/;/ /g' | sed 's/'\''//g' | awk 'BEGIN { cnt=0; }
/Node:/ { Node=$0; Nodet1=$4; Nodet2=$5; }
/'$search1'/ {
# printf("%s \n", $1 );
if ( $1=="#pcpus:" )
{
if ( cnt==0 )
{
cnt++;
# print header: number of CPUs and chip identity
printf ("%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s \n", \
$1, $2, $3, $4, $5, $6, $7, $8, $21, $22, $15, $16, $47, $48, $49, $50, $51, $52);
printf (" cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin:");
printf (" swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors: \n" );
}
cnt++;
for (i = 1; i <= NF; i++)
{
if ($i=="cpu:" )
{
# printf ("%s ", $(i+1) );
memlow = "";
cpu=$(i+1);
i++;
}
else if ($i=="cpuq:" )
{
# printf ("%s ", $(i+1) );
cpuq=$(i+1);
i++;
}
else if ($i=="physmemfree:" )
{
physmemfree=$(i+1);
i++;
}
else if ($i=="mcache:" )
{
mcache=$(i+1);
i++;
}
else if ($i=="swapfree:" )
{
swapfree=$(i+1);
i++;
}
else if ($i=="ior:" )
{
ior=$(i+1);
i++;
}
else if ($i=="iow:" )
{
iow=$(i+1);
i++;
}
else if ($i=="ios:" )
{
ios=$(i+1);
i++;
}
else if ($i=="swpin:" )
{
swpin=$(i+1);
i++;
}
else if ($i=="swpout:" )
{
swpout=$(i+1);
i++;
}
else if ($i=="pgin:" )
{
pgin=$(i+1);
i++;
}
else if ($i=="pgout:" )
{
pgout=$(i+1);
i++;
}
else if ($i=="netr:" )
{
netr=$(i+1);
i++;
}
else if ($i=="netw:" )
{
netw=$(i+1);
i++;
}
else if ($i=="procs:" )
{
procs=$(i+1);
i++;
}
else if ($i=="rtprocs:" )
{
rtprocs=$(i+1);
i++;
}
else if ($i=="#fds:" )
{
fds=$(i+1);
i++;
}
else if ($i=="nicErrors:" )
{
nicErrors=$(i+1);
i++;
}
else if ($i== "total-mem")
{
# Record detection for LOW memory indication
# Available memory (physmemfree 91516 KB + swapfree 185276 KB) on node grac41 is Too Low (< 10% of total-mem + total-swap)
# Search for total-mem and select i-2 field which is 10% is the above case
#
memlow = $(i-2);
# printf(" **** MEM low:  < %s *** " , $(i-2) );
}
}
printf ("%s %s %6s %3d %9s %9s %9s %5s %5s %5s %5s %5s %5s %5s %5d %5d %5d %5d %5d %5d ", \
Nodet1, Nodet2, cpu, cpuq, physmemfree, mcache, swapfree, ior, iow, ios, swpin, swpout, pgin, pgout, netr, netw, procs, rtprocs, fds, nicErrors );
if ( cpu > 90 )
{
# Record detection for HIGH CPU usage indication
printf (" CPU > 90%% ");
}
if ( memlow != "" )
{
printf(" MEMLOW < %s", memlow);
}
printf("\n");
# printf("%s \n", $1 );
# printf("%s %s %6s %3s %10s %10s %10s %5s %5s %5s %5s %5s %5s %5s %8s %8s %5s %5s %5s %5s \n", \
# Nodet1, Nodet2, $10, $12, $14, $18, $20, $24,$26,$28, $30, $32, $34 , $36, $38, $40, $42, $44, $46, $54 );
}
} '