Troubleshooting Slower IO Performance on Veritas for 11.2.0.4 compared 10gR2 on RAW device after RMAN migrate

简单的记录一个案例，一个数据库相同的存储从原来的oracle 10.2.0.4 使用RAW device，升级到oracle 11.2.0.4 同时使用Veritas 卷Vxfs 文件系统的RAC ，因为也使用了ODM（Oracle Disk Manager）, 理论上也是一种变象的RAW device, 支持异常IO，避免双重buffer. 用户最直观的感受是在RMAN 备份集做的迁移后，数据库性能比之前慢的很多倍，如一个6G的表count(*)原来不到1分钟到后来要10分钟以上，同样对相同大小的数据库的RMAN 备份也有原来的2个小时左右，到现在10几个小时，同时备份是CPU 使用率很高 ,vmstat 显示很高的cpu wait。

SQL ID: 8h17yarx41rhx Plan Hash: 2304749046
select count(*)
from
TAB_ANBOB

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 36.22 824.27 857822 857990 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 36.23 824.28 857822 857990 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=857990 pr=857822 pw=0 time=824274020 us)
42799324 42799324 42799324 TABLE ACCESS FULL TAB_ANBOB (cr=857990 pr=857822 pw=0 time=809921707 us cost=233014 size=0 card=42815870)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
library cache lock 1 0.00 0.00
row cache lock 2 0.00 0.00
library cache pin 1 0.00 0.00
SQL*Net message to client 2 0.00 0.00
Disk file operations I/O 8 0.00 0.00
db file sequential read 4 0.01 0.02
db file scattered read 6713 0.30 793.87
gc cr multi block request 8388 0.05 8.79
gc cr grant 2-way 3 0.00 0.00
latch: gc element 5 0.00 0.00
SQL*Net message from client 2 53.55 53.55

Note:
对count做了sql trace基本是I/O的分散读，性能确实是非常不理想。

IOStat by Function summary
Function Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Waits: Count Avg Tm(ms)
RMAN 775.7G 73.63 36.781M 699.4G 33.17 33.163M 1456 2.61
Buffer Cache Reads 1.7G 4.21 .083M 0M 0.00 0M 47.4K 10.20
Others 1.1G 3.35 .052M 277M 0.82 .013M 72.4K 0.07
DBWR 0M 0.00 0M 1.1G 2.99 .051M 0
LGWR 0M 0.00 0M 893M 21.15 .041M 0
Direct Writes 0M 0.00 0M 6M 0.00 0M 0
Direct Reads 1M 0.01 0M 0M 0.00 0M 0
TOTAL: 778.5G 81.20 36.915M 701.6G 58.13 33.269M 121.2K 4.06

Note:
RMAN备份同样也是每秒30M+左右的吞吐，但是和count一样，单次IO读也在3ms左右(SAS)，那说明存储响应时间还是不错。

性能差距这么大常见的几个原因：

1，备份如果是到磁带机可能是有原来的LAN-FREE变成了LAN网络，遇到了网络瓶颈。

2， VxFS 碎片太多。

Block size

cannot be changed once the file system has been created，Recommended Oracle file systems block size (assuming your Oracle database have block size equal or bigger than 8KB which is :

Recommended Oracle file systems block size (assuming your Oracle database have block size equal or bigger than 8KB which is :

File System	Block Size
Oracle software and dump/diagnostic directories	1KB
Redo log directory	512 bytes for Solaris, AIX, Windows, Linux and 1KB for HP-UX
Archived log directory	1KB
Control files directory	8KB (control files block size is 16KB starting with Oracle 10g)
Data, index, undo, system/sysaux and temporary directories	8KB

You can check control file block size with (Linux RedHat 5.5 and Oracle 11.2.0.3):

SQL> select cfbsz from x$kcccf;

     CFBSZ
----------
     16384
     16384

File extents

Same as Oracle table extent you can change default extent allocation policy and/or preallocate space to a file:

# getext system01.dbf

File system fragmentation

To display it issue:

# /opt/VRTS/bin/fsadm -t vxfs -D /oracle/data

# /opt/VRTS/bin/df -o s /ora_prisma/data/

# /opt/VRTS/bin/fsmap -A $datafile

3, 双重buffer

Sun Solaris using a Veritas File system
Oracle Import takes longer when using buffered VxFS than using unbuffered VxFS. Quick I/O allows regular files built on VxFS to
be accessed as a raw device, bypassing normal file system buffering and allowing direct I/O. when Quick I/O is NOT enabled, the convosync & options are used in “mount” command for altering the caching behavior of a vxfs file system; mount options “convosync=direct” and “=direct” can be used to avoid double-buffering. Data is synchronous to disk bypassing the page cache and meta-data updates are synchronous. These options are normally used for datafiles and redo logs. The __fdsync() system calls indicate that the problem is OS-related. Setting mincache=direct on veritas filesystem mount option solves performance problem.

mount -F vxfs -o mincache=direct /dev/vx/dsk/testdg/testvol /test

Mount options

Suggested mount options for Oracle databases:

File System	Normal Mount Options (VxFS)	Advanced Mount Options (VxFS)
Oracle software and dump/diagnostic directories	delaylog,datainlog,nolargefiles	delaylog,nodatainlog,nolargefiles
Redo log directory	delaylog,datainlog,largefiles	delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles
Archived log directory	delaylog,datainlog,nolargefiles	delaylog,nodatainlog,convosync=direct,mincache=direct,nolargefiles
Control files directory	delaylog,datainlog,nolargefiles	delaylog,datainlog,nolargefiles
Data, index, undo, system/sysaux and temporary directories	delaylog,datainlog,largefiles	delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles

4, Veritas library BUG

An upgrade in veritas library from 5.0 to 5.1 (which includes ODM), resolved the LGWR issue.The LGWR appears to be encountering regular problems where it hangs for 3 seconds at a time. During this time, an iostat -x on the redo devices shows zero IO being pushed through.

Intermittent Long ‘log file sync’ Waits on Veritas ODM, LGWR Posting Long Write Times, I/O Portion of Wait Minimal (文档 ID 1278149.1)

根本原因

回到这个案例，因为当前的SQL查询和rman 备份到本地vxfs都很慢，所以排除了第一种可能，后分析发现当前的VERITAS VXFS文件系统存在大量的碎片，使用fsmap 查看一个8G的数据文件，（和oracle table extent一样每一行表示一个碎片）有17万个碎片。当有巨量的碎片，执行每一个文件的操作时，需要从元数据中检索到每个扩展(extent)信息，如此巨大的meta data数据量操作，是一定会消耗大量的CPU资源和费时，这也是为什么rman备份时会有cpu wait.

经检查，这些数据文件之所以有巨量的扩展(extent)，原因是，产生这些Oracle数据文件时，采用了Oracle RMAN并行写入的方式，即多个数据文件在同时不停的追加，并没有事先为这些文件预留连续空间。因此，数据在写入下层介质时，不是连续的一个整体，频繁分配新的extent，导致产生巨量extent，最终引发文件读写性能问题。

解决方法

可以使用veritas的文件系统碎片整理，或数据文件重新在文件系统上重构。

碎片整理方法（注：可在线，参数是文件系统的根，时间比较久）：

nohup /opt/VRTS/bin/fsadm –EeDd /acct/oradata01/ &