Troubleshooting Oracle DB crashes caused by the Linux OOM killer (memory exhaustion)

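Over the past six months I have run into at least four cases where Oracle exhausted host memory, the Linux OOM killer killed Oracle processes, and the DB instance crashed. The usual cause is poor memory sizing: HugePages set too large or not configured at all, an oversized SGA, or backup/export jobs that take up too much cached memory. I previously wrote up "Troubleshooting Out-Of-Memory(OOM) killer db crash when memory exhausted"; this post only records the symptoms seen in these cases.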

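Common analysis steps:
Check the DB alert log
Check the OS log
Identify the process killed by the OOM killer
Use OSW to review vmstat, meminfo, ps, top
Check the top processes
Compare memory usage between snapshots
Pay attention to hugepages and PageTables
Use DASH to find the PGA usage trend
Check the PGA memory areas of the process
Use pmap to inspect process memory
lmhb trace (RAC)
Analyze the core dump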

Case 1

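Oracle 19c on Exadata: memory was exhausted and the instance crashed. The top memory consumers were the lms processes; the 5 lms processes held about 60 GB of memory (RSS), and their usage had been growing gradually.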

[weejar.weejar] ➤  head -n 2 ps.txt; cat ps.txt|egrep -v 'zzz|PPID' |sort -nrk 11|head -n 30
  zzz <06/21/2023 12:10:10> subcount: 36
F S RUSER       PID   PPID  C PSR PRI  NI ADDR   RSS    SZ WIDE-WCHAN-COLUMN STIME TT           TIME CMD
0 S oracle    61315      1 13  80  41   -    - 17621076 105778972 poll_schedule_tim Apr07 ? 10-01:39:20 ora_lms3_phnsf3
0 S oracle    61307      1 13  91  41   -    - 13360756 104579991 poll_schedule_tim Apr07 ? 9-22:57:59 ora_lms1_phnsf3
0 S oracle    61319      1 14  15  41   -    - 11640576 104920982 -          Apr07 ?        11-02:16:11 ora_lms4_phnsf3
0 S oracle    61303      1 14  10  41   -    - 10840900 104616644 -          Apr07 ?        10-21:27:22 ora_lms0_phnsf3
0 S oracle    61311      1 14  21  41   -    - 10789784 104596654 poll_schedule_tim Apr07 ? 11-03:38:17 ora_lms2_phnsf3
0 S root      18754      1  0  79  19   0    - 1375980 2841981 futex_wait_queue_ Apr07 ?    12:44:55 /u01/orgrid/oracle/product/194/jdk/jre/bin/java -server -Xms512m -Xmx1024m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /u01/orgrid/
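
To put a number on the "about 60 GB" figure, the RSS column of the lms rows can be summed. This is a minimal sketch that assumes the OSWatcher `ps` layout shown above, where the 11th whitespace-separated field is RSS in kB and the last field is the process name; the function name `lms_rss_kb` is made up for this illustration.

```python
# Sum the RSS column (kB) of ora_lms* rows taken from the OSWatcher ps snapshot above.
PS_LINES = """\
0 S oracle    61315      1 13  80  41   -    - 17621076 105778972 poll_schedule_tim Apr07 ? 10-01:39:20 ora_lms3_phnsf3
0 S oracle    61307      1 13  91  41   -    - 13360756 104579991 poll_schedule_tim Apr07 ? 9-22:57:59 ora_lms1_phnsf3
0 S oracle    61319      1 14  15  41   -    - 11640576 104920982 -          Apr07 ?        11-02:16:11 ora_lms4_phnsf3
0 S oracle    61303      1 14  10  41   -    - 10840900 104616644 -          Apr07 ?        10-21:27:22 ora_lms0_phnsf3
0 S oracle    61311      1 14  21  41   -    - 10789784 104596654 poll_schedule_tim Apr07 ? 11-03:38:17 ora_lms2_phnsf3
"""


def lms_rss_kb(ps_text):
    """Return the total RSS in kB of all rows whose command starts with ora_lms."""
    total = 0
    for line in ps_text.splitlines():
        fields = line.split()
        # Index 10 is the RSS column; the command name is the last field.
        if len(fields) >= 12 and fields[-1].startswith("ora_lms"):
            total += int(fields[10])
    return total


total_kb = lms_rss_kb(PS_LINES)
print(f"LMS RSS total: {total_kb} kB = {total_kb / 1024**2:.1f} GiB")
```

For this snapshot the sum is 64253092 kB, roughly 61 GiB, which matches the estimate in the text.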

# pmap -x [PID]

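The anonymous pages (AnonPages) were the area consuming the most memory, which matches the meminfo figures before and after the restart.

Note: right after the restart the process used only about 60 MB; before the restart it had reached 17 GB.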

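Case 2

Oracle 11g: the database host regularly runs local RMAN backups without specifying a maximum file size (filemax size) to split the backup set into pieces, which produced a large amount of cached memory; inactive file reached 100 GB.

In this situation you can split the backup into smaller files, shorten the time the files are held, tune the OS kernel parameters, or manually release the cache after the backup finishes.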

# sync; echo 3 > /proc/sys/vm/drop_caches
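
Before dropping caches by hand it helps to check how much file-backed page cache is actually held. The sketch below only reads `Active(file)` and `Inactive(file)` from meminfo-style text and prints a hint; it does not write to `/proc/sys/vm/drop_caches`. The sample values and the 50 GiB threshold are hypothetical, chosen to mirror the "inactive file around 100 GB" situation described above.

```python
# Hypothetical meminfo excerpt (values in kB); not taken from the Case 2 host.
SAMPLE_MEMINFO = """\
Active(file):     1048576 kB
Inactive(file): 104857600 kB
"""

THRESHOLD_GIB = 50  # hypothetical alert threshold


def file_cache_gib(meminfo_text):
    """Return Active(file) + Inactive(file) in GiB from meminfo-style text."""
    kb = 0
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() in ("Active(file)", "Inactive(file)"):
            kb += int(rest.split()[0])
    return kb / 1024**2


cache_gib = file_cache_gib(SAMPLE_MEMINFO)
if cache_gib > THRESHOLD_GIB:
    print(f"file cache is {cache_gib:.1f} GiB; consider releasing it after the backup")
else:
    print(f"file cache is {cache_gib:.1f} GiB; nothing to do")
```

On a live host the same function could be fed the contents of `/proc/meminfo` instead of the sample string.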

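Case 3

Oracle 11g on a two-node VCS HA cluster ran out of memory. The hugepage setting had been changed by someone, and after the most recent restart PageTables wasted a large amount of memory. The host also runs LogMiner, which parses archived logs in real time, so OS cache usage was high as well. The OS log on this Linux 6.9 host (kernel 2.6.32-696.el6) also reported hung-task errors.
Memory comparison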

[root@anbob2 oswmeminfo]# more anbob2_meminfo_23.07.11.0900.dat
zzz ***Tue Jul 11 09:00:08 CST 2023    zzz ***Tue Jul 11 13:08:51 CST 2023
MemTotal:       528940012 kB           MemTotal:       528940012 kB       
MemFree:        57401212 kB            MemFree:          667468 kB        <<<< -50g
Buffers:           79244 kB            Buffers:            5588 kB        
Cached:         349472200 kB           Cached:         221741568 kB       <<<<< -100g
SwapCached:        14168 kB            SwapCached:       192832 kB        
Active:         277106340 kB           Active:         201495284 kB       
Inactive:       92552196 kB            Inactive:       42237424 kB        <<<<  -50g
Active(anon):   197398472 kB           Active(anon):   201425664 kB       
Inactive(anon): 45652700 kB            Inactive(anon): 42184392 kB        
Active(file):   79707868 kB            Active(file):      69620 kB        <<<<  -70g
Inactive(file): 46899496 kB            Inactive(file):    53032 kB        <<<<  -40g
Unevictable:     2380312 kB            Unevictable:     2379472 kB        
Mlocked:          558700 kB            Mlocked:          568080 kB        
SwapTotal:      32767996 kB            SwapTotal:      32767996 kB        
SwapFree:       31869992 kB            SwapFree:       30463804 kB        
Dirty:          16517776 kB            Dirty:                48 kB        <<<<  -16g
Writeback:             0 kB            Writeback:             0 kB        
AnonPages:      22516868 kB            AnonPages:      24230932 kB        
Mapped:         156051936 kB           Mapped:         161001544 kB       
Shmem:          222796464 kB           Shmem:          221477168 kB       
Slab:            3461896 kB            Slab:            1969372 kB        
SReclaimable:    1878396 kB            SReclaimable:     876144 kB        
SUnreclaim:      1583500 kB            SUnreclaim:      1093228 kB        
KernelStack:      163600 kB            KernelStack:      163296 kB        
PageTables:     69364292 kB            PageTables:     252790084 kB       <<<< +200g
NFS_Unstable:          0 kB            NFS_Unstable:          0 kB        
Bounce:                0 kB            Bounce:                0 kB        
WritebackTmp:          0 kB            WritebackTmp:          0 kB        
CommitLimit:    286999024 kB           CommitLimit:    286999024 kB       
Committed_AS:   289703328 kB           Committed_AS:   291139212 kB       
VmallocTotal:   34359738367 kB         VmallocTotal:   34359738367 kB     
VmallocUsed:     1739320 kB            VmallocUsed:     1739320 kB        
VmallocChunk:   33956569304 kB         VmallocChunk:   33956569304 kB     
HardwareCorrupted:     0 kB            HardwareCorrupted:     0 kB        
AnonHugePages:   2150400 kB            AnonHugePages:   2072576 kB        
HugePages_Total:    9999               HugePages_Total:    9999           
HugePages_Free:      307               HugePages_Free:      307           
HugePages_Rsvd:      293               HugePages_Rsvd:      293           
HugePages_Surp:        0               HugePages_Surp:        0           
Hugepagesize:       2048 kB            Hugepagesize:       2048 kB        
DirectMap4k:       65536 kB            DirectMap4k:       65536 kB        
DirectMap2M:     1761280 kB            DirectMap2M:     1761280 kB        
                                       DirectMap1G:    534773760 kB      
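
The `<<<<` annotations in the comparison above are rounded by eye. A small sketch like the following computes the deltas from two meminfo snapshots; the two strings contain a few rows copied from the table above, and `parse_meminfo` is a helper name made up for this illustration.

```python
# A few rows copied from the 09:00:08 and 13:08:51 snapshots above (kB).
BEFORE = """\
MemFree:        57401212 kB
Cached:         349472200 kB
Inactive(file): 46899496 kB
PageTables:     69364292 kB
"""
AFTER = """\
MemFree:          667468 kB
Cached:         221741568 kB
Inactive(file):    53032 kB
PageTables:     252790084 kB
"""


def parse_meminfo(text):
    """Parse 'Key: value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


before, after = parse_meminfo(BEFORE), parse_meminfo(AFTER)
deltas = {k: (after[k] - before[k]) / 1024**2 for k in before if k in after}

# Largest changes first, in GiB.
for key, gib in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{key:<16}{gib:+9.1f} GiB")
```

With these rows PageTables grows by about 175 GiB while Cached shrinks by about 122 GiB, so the page tables are what squeezed out the file cache and free memory.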

Kernel parameters

[root@anbob2 oswmeminfo]# ll /etc/sysctl.conf
-rw-r--r-- 1 root root 1437 Sep  5  2018 /etc/sysctl.conf
 
[root@anbob2 oswmeminfo]# grep -i huge /etc/sysctl.conf 
vm.nr_hugepages = 153600
[root@anbob2 oswmeminfo]# sysctl -a|grep -i huge
vm.nr_hugepages = 9999                                     <<<<<<<<<<<<<<<<<< NODE2: not in effect
vm.nr_hugepages_mempolicy = 9999                      
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0


[root@anbob1 oswmeminfo]# grep -i huge /etc/sysctl.conf 
vm.nr_hugepages = 153600                                    <<<<<<<<<<<<<<<<<< NODE1 
[root@anbob1 oswmeminfo]# sysctl -a|grep -i huge
vm.nr_hugepages = 153600
vm.nr_hugepages_mempolicy = 153600
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
[root@anbob1 oswmeminfo]# ps -ef|grep smon
root     37937 17846  0 17:16 pts/1    00:00:00 grep smon
oracle   59711     1  0 Jul11 ?        00:02:11 ora_smon_IMSP
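
The check above (configured value in `/etc/sysctl.conf` versus the running value from `sysctl -a`) is easy to automate. A minimal sketch, using the NODE2 output above as sample text; the function name `nr_hugepages` is hypothetical.

```python
import re

# Sample text copied from the NODE2 output above.
SYSCTL_CONF = "vm.nr_hugepages = 153600\n"
SYSCTL_RUNTIME = """\
vm.nr_hugepages = 9999
vm.nr_hugepages_mempolicy = 9999
vm.hugetlb_shm_group = 0
"""


def nr_hugepages(text):
    """Return the vm.nr_hugepages value found in sysctl-style text, or None."""
    # The pattern requires '=' right after the name, so nr_hugepages_mempolicy is skipped.
    match = re.search(r"^\s*vm\.nr_hugepages\s*=\s*(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


configured = nr_hugepages(SYSCTL_CONF)
running = nr_hugepages(SYSCTL_RUNTIME)
if configured != running:
    print(f"MISMATCH: sysctl.conf has {configured}, running kernel has {running}")
```

Running such a comparison on every cluster node before an instance restart or a failover would have flagged NODE2 ahead of time.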

Time of the change

$ egrep "^zzz|HugePages_Total" oswmeminfo

HugePages_Total:   122625
zzz ***Sat Jul 8 10:12:30 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:12:41 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:12:51 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:01 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:12 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:22 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:32 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:42 CST 2023   <<<<<<<<<<<<<< HugePages_Total changed
HugePages_Total:    9999
zzz ***Sat Jul 8 10:13:52 CST 2023
HugePages_Total:    9999
zzz ***Sat Jul 8 10:14:03 CST 2023
HugePages_Total:    9999
zzz ***Sat Jul 8 10:14:13 CST 2023
HugePages_Total:    9999
zzz ***Sat Jul 8 10:14:23 CST 2023
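
Scanning a long OSW meminfo archive by eye for the moment `HugePages_Total` changes is tedious. The sketch below does the same thing as the `egrep` output above and reports only the transitions; it assumes the `zzz ***<timestamp>` marker format shown here, and `hugepage_changes` is a name invented for this example.

```python
# A short excerpt of the egrep output above.
OSW_EXCERPT = """\
zzz ***Sat Jul 8 10:13:32 CST 2023
HugePages_Total:   122625
zzz ***Sat Jul 8 10:13:42 CST 2023
HugePages_Total:    9999
zzz ***Sat Jul 8 10:13:52 CST 2023
HugePages_Total:    9999
"""


def hugepage_changes(osw_text):
    """Return (timestamp, old, new) for each snapshot where HugePages_Total changed."""
    changes = []
    stamp = None
    previous = None
    for line in osw_text.splitlines():
        if line.startswith("zzz ***"):
            stamp = line[len("zzz ***"):].strip()
        elif line.startswith("HugePages_Total:"):
            current = int(line.split(":")[1].split()[0])
            if previous is not None and current != previous:
                changes.append((stamp, previous, current))
            previous = current
    return changes


for stamp, old, new in hugepage_changes(OSW_EXCERPT):
    print(f"{stamp}: HugePages_Total {old} -> {new}")
```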

Instance startup log

2023-07-08 10:14:46.161000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB

Total Shared Global Region in Large Pages = 240 GB (100%)

Large Pages used by this instance: 122625 (240 GB)                               <<<<<<<<<<<<<<<<<<<<<<<<   200
Large Pages unused system wide = 30975 (60 GB)
Large Pages configured system wide = 153600 (300 GB)
Large Page size = 2048 KB
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the system is 4
Picked latch-free SCN scheme 3


2023-07-08 10:28:37.020000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB

Total Shared Global Region in Large Pages = 20 GB (8%)

Large Pages used by this instance: 9985 (20 GB)                                   <<<<<<<<<<<<<<<<<<<<<<<<   201
Large Pages unused system wide = 14 (28 MB)
Large Pages configured system wide = 9999 (20 GB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total System Global Area size is 240 GB. For optimal performance,
  prior to the next instance restart:
  1. Increase the number of unused large pages by
 at least 112626 (page size 2048 KB, total size 220 GB) system wide to
  get 100% of the System Global Area allocated with large pages
********************************************************************

Instance shutdown complete
2023-07-08 10:44:25.617000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB

Total Shared Global Region in Large Pages = 20 GB (8%)                     <<<<<<<<<<<<<<<<<<<<<<<<   201

Large Pages used by this instance: 9985 (20 GB)
Large Pages unused system wide = 14 (28 MB)
Large Pages configured system wide = 9999 (20 GB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total System Global Area size is 240 GB. For optimal performance,
  prior to the next instance restart:
  1. Increase the number of unused large pages by
 at least 112626 (page size 2048 KB, total size 220 GB) system wide to
  get 100% of the System Global Area allocated with large pages
  


2023-07-11 13:29:38.507000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB

Total Shared Global Region in Large Pages = 240 GB (100%)            <<<<<<<<<<<<<<<<<<<<<<<<   200

Large Pages used by this instance: 122625 (240 GB)                            
Large Pages unused system wide = 30975 (60 GB)
Large Pages configured system wide = 153600 (300 GB)
Large Page size = 2048 KB
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the

Note:
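On the node that hit OOM, the OS sysctl.conf file configures 153600 huge pages, but the value currently in effect in memory is 9999. The OS has not been rebooted for more than 900 days. After its most recent restart the DB instance could use only 20 GB of huge pages (8%), while after switching to the other node it uses huge pages for 100% of the SGA. Most likely someone used sysctl -w to change the in-memory parameter after the previous instance startup and shrank vm.nr_hugepages; the pages were only really released at the most recent DB instance restart. The exact time of that adjustment is unknown.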
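
A rough calculation shows why losing huge pages inflates PageTables so much. The sketch assumes x86_64 with 8-byte page-table entries and 4 KB base pages, and assumes that each Oracle process builds its own page tables for the part of the SGA it touches; the 220 GB and 240 GB figures come from the alert log above, and the PageTables value from the 13:08:51 meminfo snapshot.

```python
GIB = 1024**3
MIB = 1024**2
PTE_BYTES = 8            # size of one page-table entry on x86_64
SMALL_PAGE = 4 * 1024    # 4 KB base page
HUGE_PAGE = 2 * MIB      # 2048 KB huge page, as reported in the alert log


def pte_bytes(mapped_bytes, page_size):
    """Bytes of last-level page-table entries needed to map mapped_bytes."""
    return mapped_bytes // page_size * PTE_BYTES


# 220 GB of SGA fell back to 4 KB pages once only 9999 huge pages were available.
per_process_small = pte_bytes(220 * GIB, SMALL_PAGE)
# The whole 240 GB SGA mapped with 2 MB pages, for comparison.
per_process_huge = pte_bytes(240 * GIB, HUGE_PAGE)

print(f"4 KB pages: up to {per_process_small / MIB:.0f} MiB of page tables per process")
print(f"2 MB pages: about {per_process_huge / MIB:.2f} MiB per process")

# PageTables from the 13:08:51 snapshot, in kB.
observed_kb = 252790084
equivalent = observed_kb * 1024 / per_process_small
print(f"observed PageTables equals about {equivalent:.0f} processes fully touching the SGA")
```

Under these assumptions one process that touches the whole small-page part of the SGA needs up to 440 MiB of page tables, against under 1 MiB with huge pages, so a few hundred sessions are enough to reach the roughly 240 GiB of PageTables seen in meminfo.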

zzz ***Tue Jul 11 13:03:20 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  4 932884 1076188   4748 224322464    0    0  1026    75    0    0  1  0 98  0  0	
11  1 932884 1017352   4744 224328432    0    0  4452 11952 80532 49144 14  5 81  1  0	
10  1 932880 1016912   4740 224333232    0    0  4472 13153 84146 52376 13  3 83  1  0	
zzz ***Tue Jul 11 13:03:31 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
17  1 932880 849220   5612 224316912    0    0  1026    75    0    0  1  0 98  0  0	
14  0 932880 779512   5612 224323408    0    0  4364 12692 86111 55005 14  5 81  1  0	
19  1 932880 786308   5612 224328480    0    0  4148 12529 65594 47201 12  3 84  1  0	
zzz ***Tue Jul 11 13:03:41 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
249  3 932880 762192   3492 224164800    0    0  1026    75    0    0  1  0 98  0  0	
271  2 932880 666988   3744 224165296    0    0   428   144 122169 19786  3 77 19  2  0	
103  1 1128640 673972   5516 223528912 4720 197628 53808 848077 33375721 1517708  0 100  0  0  0	<<<<< sys cpu 100%


                                                                           <<<<<<<<<<<<<<< snap gap
zzz ***Tue Jul 11 13:08:51 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
79  0 2191116 676560   5716 221857248    0    0  1026    75    0    0  1  0 98  0  0	
246  3 2241572 678388   5772 221805008   12 49240   572 49585 126798 6238  1 98  1  0  0	
194  0 2277520 669800   5676 221775456    0 35952    84 45364 147335 5038  0 98  1  0  0	
zzz ***Tue Jul 11 13:09:11 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
69  0 3729636 793628   3980 220292544    0    0  1026    75    0    0  1  0 98  0  0	
67  0 3730048 769428   4068 220292656    0  412   252   777 108784 30835  1 66 33  0  0	
67  0 3730184 770468   4068 220292864  192  164   300   741 107760 32892  1 64 35  0  0	
zzz ***Tue Jul 11 13:09:29 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
255  4 3791552 663928   4728 220244272    0    0  1026    75    0    0  1  0 98  0  0	
263  0 3796552 659472   4728 220239904    0 4996  1180  5012 127310 2275  0 100  0  0  0	
261  0 3799912 661764   4728 220236400    0 3364     0  3364 132814 2372  0 100  0  0  0	
zzz ***Tue Jul 11 13:12:04 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
194  0 4598176 679972   3988 219445824    0    0  1026    75    0    0  1  0 98  0  0	
170  0 4627960 668200   4008 219415520   48 29792   200 29856 118756 10679  1 97  2  0  0	
127  0 4664620 672196   4008 219379904    0 36660   140 38193 129276 24727  2 95  3  0  0	
zzz ***Tue Jul 11 13:12:20 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
99  3 5653004 774452   3168 218391728    0    0  1026    75    0    0  1  0 98  0  0	
100  6 5770224 659328   3096 218280864  688 118168  1512 140937 152119 52093  6 60 33  1  0	
110  0 5884952 682720   3128 218162976  128 113932  2460 115510 142064 25756  2 90  8  1  0	
zzz ***Tue Jul 11 13:12:38 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
58  0 6345164 756684   2168 217698096    0    0  1026    75    0    0  1  0 98  0  0	
51  1 6363432 734976   2164 217677440    0 17248   680 17392 100229 12689  1 53 46  1  0	
50  1 6372108 739220   2164 217669120    0 8676   932  8713 95133 11688  0 51 48  1  0	
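
To pick out the bad intervals from a long OSW vmstat archive, the samples can be filtered on the `sy` column. A minimal sketch, assuming the 17-column vmstat layout shown above (`r` is the first column and `sy` the 14th); the rows are copied from the 13:03:41 snapshot and the 90% threshold is an arbitrary choice for this example. Note that the first data row of each snapshot is the since-boot average, which is why it repeats unchanged in every snapshot.

```python
# Rows copied from the 13:03:41 snapshot above.
VMSTAT_ROWS = """\
zzz ***Tue Jul 11 13:03:41 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
249  3 932880 762192   3492 224164800    0    0  1026    75    0    0  1  0 98  0  0
271  2 932880 666988   3744 224165296    0    0   428   144 122169 19786  3 77 19  2  0
103  1 1128640 673972   5516 223528912 4720 197628 53808 848077 33375721 1517708  0 100  0  0  0
"""


def high_sys_samples(vmstat_text, threshold=90):
    """Return (run_queue, sys_cpu) for data rows whose sy column is >= threshold."""
    hits = []
    for row in vmstat_text.splitlines():
        fields = row.split()
        # Skip the zzz marker and the two header lines.
        if len(fields) < 17 or not fields[0].isdigit():
            continue
        run_queue, sys_cpu = int(fields[0]), int(fields[13])
        if sys_cpu >= threshold:
            hits.append((run_queue, sys_cpu))
    return hits


print(high_sys_samples(VMSTAT_ROWS))
```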

# iostat
zzz ***Tue Jul 11 13:03:41 CST 2023
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00   99.74    0.04    0.00    0.11

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               5.30   719.39    8.10   19.62   182.40  2954.20   226.33     0.25    8.95    2.82   11.48   0.93   2.57
sdi               0.00     0.00    0.14    0.04     0.91     0.87    19.50     0.00    0.90    1.05    0.42   0.90   0.02
..
dm-0              0.00     0.00    9.30  568.98   165.81  2275.93     8.44    11.08   19.13    2.74   19.40   0.01   0.76
dm-1              0.00     0.00    4.15  171.11    16.59   684.44     8.00     2.85   16.18    2.63   16.51   0.16   2.81
..
VxVM28000         0.00     0.00    0.82    0.20     5.71     3.98    19.07     0.00    1.54    0.67    5.07   1.29   0.13
VxVM28001         0.00     0.00    0.01    0.07     0.03     0.68    17.57     0.00    0.39    0.00    0.43   0.39   0.00
VxVM28002         0.00     0.00    0.01    0.01     0.08     0.08    13.71     0.00    0.29    0.25    0.33   0.29   0.00
VxVM28003         0.00     0.00    0.01    0.02     0.08     0.17    14.40     0.00    0.10    0.00    0.17   0.10   0.00
VxVM28004         0.00     0.00    0.01    0.01     0.08     0.08    13.71     0.00    0.14    0.00    0.33   0.14   0.00

zzz ***Tue Jul 11 13:08:51 CST 2023
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.63    0.00   59.94    4.26    0.00   32.17

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             214.42 35776.92  212.50 14915.38  9988.46 203992.31    28.29    34.26    2.28    0.75    2.31   0.06  88.75
..
dm-0              0.00     0.00  154.81   20.19  8900.00    80.77   102.64     0.15    0.88    0.28    5.52   0.70  12.31
dm-1              0.00     0.00  272.12 50750.00  1088.46 203000.00     8.00   149.39    2.88    2.62    2.88   0.02  91.44   <<<<<<<<
..
sdaj              0.00     0.00    0.00    1.92     0.00    84.62    88.00     0.00    0.50    0.00    0.50   0.50   0.10
VxVM28000         0.00     0.00   17.31   27.88    76.92  3844.23   173.53     0.03    0.64    0.56    0.69   0.43   1.92
VxVM28001         0.00     0.00    0.00   16.35     0.00  3651.92   446.82     0.01    0.88    0.00    0.88   0.59   0.96
VxVM28002         0.00     0.00   12.50    0.96   100.00     7.69    16.00     0.01    0.57    0.62    0.00   0.57   0.77
VxVM28003         0.00     0.00    1.92    0.00    15.38     0.00    16.00     0.01    7.00    7.00    0.00   7.00   1.35
VxVM28004         0.00     0.00    1.92    2.88    15.38   146.15    67.20     0.00    0.60    0.50    0.67   0.40   0.19

[root@anbob2 oswmeminfo]# lsblk
NAME                     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdb                        8:16   0  500G  0 disk 
├─sdb3                     8:19   0  500G  0 part 
└─sdb8                     8:24   0  500G  0 part 
sdc                        8:32   0  500G  0 disk 
├─sdc3                     8:35   0  500G  0 part 
└─sdc8                     8:40   0  500G  0 part 
sdd                        8:48   0  500G  0 disk 
├─sdd3                     8:51   0  500G  0 part 
└─sdd8                     8:56   0  500G  0 part 
sde                        8:64   0  500G  0 disk 
├─sde3                     8:67   0  500G  0 part 
└─sde8                     8:72   0  500G  0 part 
sda                        8:0    0  3.7T  0 disk 
├─sda1                     8:1    0  524M  0 part /boot/efi
├─sda2                     8:2    0  500M  0 part /boot
└─sda3                     8:3    0  3.7T  0 part 
  ├─rootvg-rootlv (dm-0) 253:0    0    2T  0 lvm  /
  └─rootvg-swaplv (dm-1) 253:1    0 31.3G  0 lvm  [SWAP]
sdf                        8:80   0  500G  0 disk 


# mpstat 
zzz ***Tue Jul 11 13:03:41 CST 2023
Linux 2.6.32-696.el6.x86_64 (anbob2) 	07/11/23 	_x86_64_	(96 CPU)

13:03:41     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
13:03:42     all    2.86    0.00   76.85    1.68    0.00    0.05    0.00    0.00   18.56
13:03:42       0    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
13:03:42       1    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
13:03:42       2    3.03    0.00   22.22    5.05    0.00    0.00    0.00    0.00   69.70
13:03:42       3   23.00    0.00   49.00   25.00    0.00    0.00    0.00    0.00    3.00
13:03:42       4    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
13:03:42       5    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
13:03:42       6    1.00    0.00   17.00    0.00    0.00    0.00    0.00    0.00   82.00
13:03:42       7    0.99    0.00   93.07    0.00    0.00    0.00    0.00    0.00    5.94
13:03:42       8    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
13:03:42       9   44.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   54.00

Note:
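During this period OSW has a gap of several minutes: sys CPU was at 100% and the OS or file system hung. After the host recovered, swap was flushed to disk, and both before and after the gap the run queue (the r column in vmstat) was backed up, at times with more than 200 processes.

Comparison before and after the OSW gap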

zzz ***Tue Jul 11 13:03:41 CST 2023    zzz ***Tue Jul 11 13:08:26 CST 2023
MemTotal:       528940012 kB       MemTotal:       528940012 kB       
MemFree:          768460 kB        MemFree:          664400 kB        
Buffers:            3492 kB        Buffers:            4512 kB        
Cached:         224165076 kB       Cached:         223746948 kB       
SwapCached:        14440 kB        SwapCached:        42048 kB        
Active:         200373904 kB       Active:         201907664 kB       
Inactive:       45618352 kB        Inactive:       43839988 kB        
Active(anon):   199794668 kB       Active(anon):   201438972 kB       
Inactive(anon): 45041400 kB        Inactive(anon): 43372572 kB        
Active(file):     579236 kB        Active(file):     468692 kB        
Inactive(file):   576952 kB        Inactive(file):   467416 kB        
Unevictable:     2380312 kB        Unevictable:     2380312 kB        
Mlocked:          558700 kB        Mlocked:          558700 kB        
SwapTotal:      32767996 kB        SwapTotal:      32767996 kB        
SwapFree:       31835116 kB        SwapFree:       31647260 kB        
Dirty:            590572 kB        Dirty:            270588 kB        <<<<<<<<<<<<<<<<<<<<<<<<<<<
Writeback:             4 kB        Writeback:          4612 kB        
AnonPages:      24248696 kB        AnonPages:      24398188 kB        
Mapped:         161099688 kB       Mapped:         161100900 kB       
Shmem:          222851540 kB       Shmem:          222664684 kB       
Slab:            2050540 kB        Slab:            2036328 kB        
SReclaimable:     933644 kB        SReclaimable:     917188 kB        
SUnreclaim:      1116896 kB        SUnreclaim:      1119140 kB        
KernelStack:      163072 kB        KernelStack:      165104 kB        
PageTables:     250403824 kB       PageTables:     250721084 kB       
NFS_Unstable:          0 kB        NFS_Unstable:          0 kB        
Bounce:                0 kB        Bounce:                0 kB        
WritebackTmp:          0 kB        WritebackTmp:          0 kB        
CommitLimit:    286999024 kB       CommitLimit:    286999024 kB       
Committed_AS:   291078736 kB       Committed_AS:   291359788 kB       
VmallocTotal:   34359738367 kB     VmallocTotal:   34359738367 kB     
VmallocUsed:     1739320 kB        VmallocUsed:     1739320 kB        
VmallocChunk:   33956569304 kB     VmallocChunk:   33956569304 kB     
HardwareCorrupted:     0 kB        HardwareCorrupted:     0 kB        
AnonHugePages:   2131968 kB        AnonHugePages:   2131968 kB        
HugePages_Total:    9999           HugePages_Total:    9999           
HugePages_Free:      307           HugePages_Free:      307           
HugePages_Rsvd:      293           HugePages_Rsvd:      293           
HugePages_Surp:        0           HugePages_Surp:        0           
Hugepagesize:       2048 kB        Hugepagesize:       2048 kB        
DirectMap4k:       65536 kB        DirectMap4k:       65536 kB        
DirectMap2M:     1761280 kB        DirectMap2M:     1761280 kB        
DirectMap1G:    534773760 kB       DirectMap1G:    534773760 kB  


$ egrep "vm.swapp|vm.dirty_back|dirty_ratio|vm.dirty_expire|vm.dirty_write|vm.min_free|vm.vfs" sysctl_-a
vm.swappiness = 20
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.min_free_kbytes = 135168

Recommended settings to make the kernel flush and release cached pages more readily

#our problem is that things are not swapping, so let's come up from 0
vm.swappiness=10

#Maximum percentage of ((Cache + Free) - Mapped) memory that can be dirty before it is written to disk by the pdflush process
vm.dirty_background_ratio=3

#Maximum percentage of MemTotal that dirty pages can consume before all processes must write dirty buffers back to disk;
#when this value is reached, all I/O is blocked for new writes until dirty pages have been flushed
vm.dirty_ratio=15

#How long data can be in page cache before being expired (hundredths of a second)
vm.dirty_expire_centisecs=500

#How often pdflush is activated to clean dirty pages (hundredths of a second)
vm.dirty_writeback_centisecs=100
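
To review the proposal against what the host is running, the two sets of values can be diffed. This sketch hard-codes the current values from the `sysctl_-a` output and the proposed values listed above; `sysctl_diff` is a hypothetical helper that only prints the differences and changes nothing on the system.

```python
# Current values from the sysctl_-a output above.
CURRENT = {
    "vm.swappiness": 20,
    "vm.dirty_background_ratio": 10,
    "vm.dirty_ratio": 20,
    "vm.dirty_expire_centisecs": 3000,
    "vm.dirty_writeback_centisecs": 500,
}
# Proposed values from the recommendation above.
PROPOSED = {
    "vm.swappiness": 10,
    "vm.dirty_background_ratio": 3,
    "vm.dirty_ratio": 15,
    "vm.dirty_expire_centisecs": 500,
    "vm.dirty_writeback_centisecs": 100,
}


def sysctl_diff(current, proposed):
    """Return {name: (current_value, proposed_value)} for settings that differ."""
    return {
        name: (current.get(name), value)
        for name, value in proposed.items()
        if current.get(name) != value
    }


diff = sysctl_diff(CURRENT, PROPOSED)
for name, (old, new) in diff.items():
    if name.endswith("_centisecs"):
        # Centiseconds are hundredths of a second.
        print(f"{name}: {old} -> {new} ({old / 100:g}s -> {new / 100:g}s)")
    else:
        print(f"{name}: {old} -> {new}")
```

All five settings differ here; for example, dirty pages would expire after 5 seconds instead of 30, so writeback happens in smaller, more frequent batches.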

OS LOG

Jul 11 12:15:25 szimsdb2 kernel: INFO: task ps:54176 blocked for more than 120 seconds.
Jul 11 12:15:25 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 12:15:25 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 12:15:25 szimsdb2 kernel: ps            D 000000000000003b     0 54176  54159 0x00000080
Jul 11 12:15:25 szimsdb2 kernel: ffff886d0ae63c68 0000000000000086 0000000000000000 ffffffff8123b0b6
Jul 11 12:15:25 szimsdb2 kernel: ffff886d0ae63be8 ffff886d0ae63e08 0131d140e4e0f731 000000000ae63cd8
Jul 11 12:15:25 szimsdb2 kernel: ffffffff8100bc0e 000000150d00bb8b ffff886556cf65f8 ffff886d0ae63fd8
Jul 11 12:15:25 szimsdb2 kernel: Call Trace:
Jul 11 12:15:25 szimsdb2 kernel: [] ? security_task_to_inode+0x16/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] ? mutex_spin_on_owner+0x9b/0xc0
Jul 11 12:15:25 szimsdb2 kernel: [] __mutex_lock_slowpath+0x96/0x210
Jul 11 12:15:25 szimsdb2 kernel: [] mutex_lock+0x2b/0x50
Jul 11 12:15:25 szimsdb2 kernel: [] pipe_write+0x7e/0x6b0
Jul 11 12:15:25 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 12:15:25 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 12:15:25 szimsdb2 kernel: [] ? mntput_no_expire+0x30/0x110
Jul 11 12:15:25 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 12:15:25 szimsdb2 kernel: [] ? fget_light_pos+0x16/0x50
Jul 11 12:15:25 szimsdb2 kernel: [] sys_write+0x51/0xb0
Jul 11 12:15:25 szimsdb2 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jul 11 12:15:25 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:28 szimsdb2 kernel: INFO: task jbd2/dm-0-8:4344 blocked for more than 120 seconds.
Jul 11 13:07:35 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: jbd2/dm-0-8   D 0000000000000009     0  4344      2 0x00000000
Jul 11 13:07:38 szimsdb2 kernel: ffff88404e61fd20 0000000000000046 ffff88404e61fce8 ffff88404e61fce4
Jul 11 13:07:38 szimsdb2 kernel: ffff88404eec8000 ffff88207fe84a00 0131d4006805a0db ffff8820f0dd6ec0
Jul 11 13:07:38 szimsdb2 kernel: 00000000000003e7 000000150d2edc86 ffff88404f9b3068 ffff88404e61ffd8
Jul 11 13:07:38 szimsdb2 kernel: Call Trace:
Jul 11 13:07:38 szimsdb2 kernel: [] jbd2_journal_commit_transaction+0x19f/0x14f0 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? lock_timer_base+0x3c/0x70
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] kjournald2+0xb8/0x220 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] ? kjournald2+0x0/0x220 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] kthread+0x9e/0xc0
Jul 11 13:07:38 szimsdb2 kernel: [] child_rip+0xa/0x20
Jul 11 13:07:38 szimsdb2 kernel: [] ? kthread+0x0/0xc0
Jul 11 13:07:38 szimsdb2 kernel: [] ? child_rip+0x0/0x20
Jul 11 13:07:38 szimsdb2 kernel: INFO: task vxdclid:35809 blocked for more than 120 seconds.
Jul 11 13:07:38 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: vxdclid       D 0000000000000004     0 35809      1 0x00000080
Jul 11 13:07:38 szimsdb2 kernel: ffff88210340bbe8 0000000000000086 ffff88210340bb88 ffffffff8117fa68
Jul 11 13:07:38 szimsdb2 kernel: 0000000000000000 ffff8821350d0000 0004125000000000 ffff884050364160
Jul 11 13:07:38 szimsdb2 kernel: ffff88204e71aa80 0000000000000002 ffff8821242785f8 ffff88210340bfd8
Jul 11 13:07:38 szimsdb2 kernel: Call Trace:
Jul 11 13:07:38 szimsdb2 kernel: [] ? ____cache_alloc_node+0x108/0x160
Jul 11 13:07:38 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:07:38 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:07:38 szimsdb2 kernel: [] touch_atime+0x195/0x1a0
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_file_mmap+0x5d/0x60 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] mmap_region+0x400/0x5b0
Jul 11 13:07:38 szimsdb2 kernel: [] do_mmap_pgoff+0x335/0x380
Jul 11 13:07:38 szimsdb2 kernel: [] sys_mmap_pgoff+0x17a/0x340
Jul 11 13:07:38 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:07:38 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:38 szimsdb2 kernel: INFO: task MountAgent:17301 blocked for more than 120 seconds.
Jul 11 13:07:38 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: MountAgent    D 0000000000000023     0 17301      1 0x00000080
Jul 11 13:07:38 szimsdb2 kernel: ffff886d0ad6fdf8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:07:38 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad6fd78 ffff887c599bd080
Jul 11 13:07:38 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d351025f8 ffff886d0ad6ffd8
Jul 11 13:07:40 szimsdb2 kernel: Call Trace:
Jul 11 13:07:49 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:07:50 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:07:50 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:07:50 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:07:50 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:07:50 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:07:50 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:07:51 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:07:54 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:56 szimsdb2 kernel: INFO: task MountAgent:17303 blocked for more than 120 seconds.
Jul 11 13:08:00 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:01 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:01 szimsdb2 kernel: MountAgent    D 000000000000003f     0 17303      1 0x00000080
Jul 11 13:08:01 szimsdb2 kernel: ffff886d0ad73df8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:08:02 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad73d78 ffff887a64311b80
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d11cf7ad8 ffff886d0ad73fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task MountAgent:17304 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: MountAgent    D 000000000000005b     0 17304      1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ad77df8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:08:02 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad77d78 ffff886207b10bc0
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d11cf7068 ffff886d0ad77fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17262 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: HostMonitor   D 000000000000001b     0 17262      1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff880d51fdbb18 0000000000000082 ffff880d51fdbab8 ffffffff8117fa68
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 ffff8800693f7000 0004125000000000 ffff88204e71aa60
Jul 11 13:08:02 szimsdb2 kernel: ffff88804f230880 0000000000000002 ffff88204a13a5f8 ffff880d51fdbfd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? ____cache_alloc_node+0x108/0x160
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] touch_atime+0x195/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_read+0x380/0x700
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_read+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? invalidate_interrupt1+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_read+0xb5/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ? fget_light_pos+0x3f/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] sys_read+0x51/0xb0
Jul 11 13:08:02 szimsdb2 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17278 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: HostMonitor   D 0000000000000031     0 17278      1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ad2f998 0000000000000082 0000000000000282 0000000000000030
Jul 11 13:08:02 szimsdb2 kernel: ffff8820f0c16f28 ffff886080010e40 ffff886d0ad2f938 0000000000000002
Jul 11 13:08:02 szimsdb2 kernel: ffff882080021ba8 0000000000000003 ffff886d353c5068 ffff886d0ad2ffd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] ? __sb_start_write+0x80/0x120
Jul 11 13:08:02 szimsdb2 kernel: [] ? wake_bit_function+0x0/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] ? ext4_da_get_block_prep+0x0/0x380 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __block_page_mkwrite+0x3b/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_page_mkwrite+0x121/0x360 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cpumask_next_and+0x29/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] do_wp_page+0x640/0x920
Jul 11 13:08:02 szimsdb2 kernel: [] handle_pte_fault+0x2cd/0xb20
Jul 11 13:08:02 szimsdb2 kernel: [] ? try_to_wake_up+0x24e/0x3e0
Jul 11 13:08:02 szimsdb2 kernel: [] handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] __do_page_fault+0x141/0x500
Jul 11 13:08:02 szimsdb2 kernel: [] do_page_fault+0x3e/0xa0
Jul 11 13:08:02 szimsdb2 kernel: [] page_fault+0x25/0x30
Jul 11 13:08:02 szimsdb2 kernel: INFO: task hekad-daemon:8274 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: hekad-daemon  D 000000000000000d     0  8274      1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae17a88 0000000000000086 ffff88804f9c5660 ffff8880526c7900
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae179e8 ffffffff81014b39 ffff886d0ae17a38 ffffffff810b295f
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae17a08 0000000000000000 ffff88608022b068 ffff886d0ae17fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? read_tsc+0x9/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? ktime_get_ts+0xbf/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] __generic_file_aio_write+0x230/0x490
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_write+0x88/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_file_write+0x58/0x190 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? call_rcu+0xe/0x10
Jul 11 13:08:02 szimsdb2 kernel: [] ? d_free+0x3f/0x60
Jul 11 13:08:02 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_sync_write+0x0/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ? fget_light_pos+0x3f/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] sys_write+0x51/0xb0
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task rs:main Q:Reg:52457 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel:      Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: rs:main Q:Reg D 0000000000000005     0 52457      1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa88 0000000000000086 ffff8821350d0498 0000000000000000
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa08 ffff88405234ec00 ffff88404f8ffa58 ffffffffa019ee13
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa38 0000000300000001 ffff8824bd7d3ad8 ffff88404f8fffd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? ext4_mark_inode_dirty+0x83/0x1d0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] __generic_file_aio_write+0x230/0x490
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_write+0x88/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_file_write+0x58/0x190 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? perf_event_task_sched_out+0x2e/0x70
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ? fget_light_pos+0x3f/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] sys_write+0x51/0xb0
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 22 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 23 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 24 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 25 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 26 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 27 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 28 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 29 sec
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20058 Port h[GAB_USER_CLIENT (refcount 0)] process 16921: heartbeat failed, killing process
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20059 Port h[GAB_USER_CLIENT (refcount 0)] heartbeat interval 30000 msec. Statistics:
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 6000 msec: 590166154
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 12000 msec: 0
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 18000 msec: 0
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 18000 ~ 24000 msec: 0
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 24000 ~ 30000 msec: 0
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20088 System information:
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20089 	number of cpu:                 96
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20090 	physical memory:               528940012 K
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20091 	free memory:                   666056 K
Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20041 Port h: client process failure: killing process
Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Jul 11 13:08:05 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Jul 11 13:08:20 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Jul 11 13:08:26 szimsdb2 AgentFramework[17254]: VCS ERROR V-16-2-13027 Thread(140178848986880) Resource(vol_u01) - monitor procedure did not complete within the expected time.
Jul 11 13:08:26 szimsdb2 AgentFramework[17253]: VCS ERROR V-16-2-13027 Thread(139838914823936) Resource(VCShm) - monitor procedure did not complete within the expected time.
Jul 11 13:08:26 szimsdb2 abrt[29670]: Saved core dump of pid 16921 (/opt/VRTSvcs/bin/had) to /var/spool/abrt/ccpp-2023-07-11-13:08:26-16921 (24588288 bytes)
Jul 11 13:08:26 szimsdb2 abrtd: Directory 'ccpp-2023-07-11-13:08:26-16921' creation detected
Jul 11 13:08:27 szimsdb2 kernel: GAB WARNING V-15-1-20161 Port h client process killed, GAB will initiate regmon action syslog after 200 sec
Jul 11 13:08:27 szimsdb2 kernel: GAB INFO V-15-1-20032 Port h closed
Jul 11 13:08:27 szimsdb2 AgentFramework[17254]: VCS ERROR V-16-2-13120 Thread(140178960766752) Error receiving from the engine. Agent(Volume) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17253]: VCS ERROR V-16-2-13120 Thread(139839033902880) Error receiving from the engine. Agent(HostMonitor) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17247]: VCS ERROR V-16-2-13120 Thread(140346738489120) Error receiving from the engine. Agent(Mount) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17248]: VCS ERROR V-16-2-13120 Thread(140104745289504) Error receiving from the engine. Agent(NIC) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17252]: VCS ERROR V-16-2-13120 Thread(140492977714976) Error receiving from the engine. Agent(Oracle) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17250]: VCS ERROR V-16-2-13120 Thread(140163071121184) Error receiving from the engine. Agent(Netlsnr) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17246]: VCS ERROR V-16-2-13120 Thread(139647360829216) Error receiving from the engine. Agent(IP) is exiting.
Jul 11 13:08:27 szimsdb2 hashadow[16930]: VCS ERROR V-16-1-11103 VCS exited. It will restart
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSMountAgent'. Returning.
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSNetlsnrAgent'. Returning.
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSOracleAgent'. Returning.
Jul 11 13:08:33 szimsdb2 abrtd: Package 'VRTSvcs' isn't signed with proper key
Jul 11 13:08:33 szimsdb2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2023-07-11-13:08:26-16921' exited with 1
Jul 11 13:08:33 szimsdb2 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-11-13:08:26-16921'
Jul 11 13:08:37 szimsdb2 abrt[30169]: Saved core dump of pid 17243 (/opt/VRTSvcs/bin/Script51Agent) to /var/spool/abrt/ccpp-2023-07-11-13:08:31-17243 (15106048 bytes)
Jul 11 13:08:37 szimsdb2 kernel: AMF NOTICE V-292-1-68 The reaper 'DiskGroup' removed. Returning.
Jul 11 13:08:37 szimsdb2 abrtd: Directory 'ccpp-2023-07-11-13:08:31-17243' creation detected
Jul 11 13:08:37 szimsdb2 abrtd: Package 'VRTSvcs' isn't signed with proper key
Jul 11 13:08:37 szimsdb2 Had[30223]: VCS NOTICE V-16-1-53071 Diagnostics directory moved to /var/VRTSvcs/diag/had.1689052117, please check its contents and contact Veritas Technical Support
Jul 11 13:08:37 szimsdb2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2023-07-11-13:08:31-17243' exited with 1
Jul 11 13:08:37 szimsdb2 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-11-13:08:31-17243'
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10619 'HAD' starting on: szimsdb2
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10625 Local cluster configuration valid
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-11034 Registering for cluster membership
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-11035 Waiting for cluster membership
Jul 11 13:08:41 szimsdb2 kernel: GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen   4eb71e membership 0-1
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS INFO V-16-1-10077 Received new cluster membership
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10086 System  (Node '0') is in Regular Membership - Membership: 0x3
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10086 System szimsdb2 (Node '1') is in Regular Membership - Membership: 0x3
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10075 Building from remote system
Jul 11 13:08:42 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10066 Entering RUNNING state
Jul 11 13:08:42 szimsdb2 Had[30225]: VCS NOTICE V-16-1-50311 VCS Engine: running with security OFF
Jul 11 13:08:42 szimsdb2 AgentFramework[30289]: VCS NOTICE V-16-1-53071 Diagnostics directory moved to /var/VRTSvcs/diag//agents/DiskGroup.1689052122, please check its contents and contact Veritas Technical Support

Jul 11 13:11:23 szimsdb2 AgentFramework[30289]: VCS ERROR V-16-2-13027 Thread(139672226191104) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:11:23 szimsdb2 Had[30225]: VCS ERROR V-16-2-13027 (szimsdb2) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:11:23 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 93%
Jul 11 13:16:53 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 93%
Jul 11 13:20:55 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 91%
Jul 11 13:20:55 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 Swap usage on szimsdb2 is 96%
Jul 11 13:22:50 szimsdb2 AgentFramework[30289]: VCS ERROR V-16-2-13027 Thread(139672227243776) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:22:51 szimsdb2 Had[30225]: VCS ERROR V-16-2-13027 (szimsdb2) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:25:50 anbob2 kernel: AMF WARNING V-292-1-44 AMF can no longer monitor DGoffline events. Notifying reapers.
Jul 11 13:25:50 anbob2 kernel: AMF WARNING V-292-1-44 AMF can no longer monitor DGonline events. Notifying reapers.
Jul 11 13:25:50 anbob2 imfd[22155]: IMFD ERROR V-292-2-3030 Function:oimf_getnotification from library:libusnp_vxnotify.so failed with error:Failed to read event from vxnotify. Possibly vxnotify process got killed, errno = 0
Jul 11 13:26:01 anbob2 Had[30225]: VCS ERROR V-16-2-13051 (anbob2) Agent(NIC) is exiting because another agent with process-id(30277) is already running for this type
Jul 11 13:27:21 anbob2 kernel: oracle invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=0, oom_score_adj=0
Jul 11 13:27:21 anbob2 kernel: oracle cpuset=/ mems_allowed=0-3
Jul 11 13:27:21 anbob2 kernel: Pid: 6148, comm: oracle Tainted: P           -- ------------    2.6.32-696.el6.x86_64 #1
Jul 11 13:27:21 anbob2 kernel: Call Trace:
Jul 11 13:27:21 anbob2 kernel: [] ? dump_header+0x90/0x1b0
Jul 11 13:27:21 anbob2 kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Jul 11 13:27:21 anbob2 kernel: [] ? oom_kill_process+0x82/0x2a0
Jul 11 13:27:21 anbob2 kernel: [] ? select_bad_process+0xe1/0x120
Jul 11 13:27:21 anbob2 kernel: [] ? out_of_memory+0x220/0x3c0
Jul 11 13:27:21 anbob2 kernel: [] ? __alloc_pages_nodemask+0x93c/0x950
Jul 11 13:27:21 anbob2 kernel: [] ? alloc_pages_current+0xaa/0x110
Jul 11 13:27:21 anbob2 kernel: [] ? __page_cache_alloc+0x87/0x90
Jul 11 13:27:21 anbob2 kernel: [] ? find_or_create_page+0x4f/0xb0
Jul 11 13:27:21 anbob2 kernel: [] ? vx_page_alloc+0x1d1/0xd00 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_read_ahead_detect+0x221/0x610 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_do_getpage+0x505/0x2490 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? dev_hard_start_xmit+0x21c/0x490
Jul 11 13:27:21 anbob2 kernel: [] ? vx_rwsleep_rec_lock+0x7d/0x110 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_recsmp_trylock+0x1/0x20 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_iglock3+0xfb/0x110 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_getpage1+0x3f9/0x940 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? wake_bit_function+0x0/0x50
Jul 11 13:27:21 anbob2 kernel: [] ? vx_fault+0x2c1/0x6c0 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? autoremove_wake_function+0x16/0x40
Jul 11 13:27:21 anbob2 kernel: [] ? __wake_up_bit+0x31/0x40
Jul 11 13:27:21 anbob2 kernel: [] ? __do_fault+0x54/0x530
Jul 11 13:27:21 anbob2 kernel: [] ? handle_pte_fault+0xf7/0xb20
Jul 11 13:27:21 anbob2 kernel: [] ? sock_aio_read+0x1a1/0x1b0
Jul 11 13:27:21 anbob2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:27:21 anbob2 kernel: [] ? __do_page_fault+0x141/0x500
Jul 11 13:27:21 anbob2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:27:21 anbob2 kernel: [] ? do_page_fault+0x3e/0xa0
Jul 11 13:27:21 anbob2 kernel: [] ? page_fault+0x25/0x30
Jul 11 13:27:21 anbob2 kernel: Mem-Info:
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA per-cpu:
...
Jul 11 13:27:21 anbob2 kernel: CPU   95: hi:  186, btch:  31 usd:   0
Jul 11 13:27:21 anbob2 kernel: active_anon:51373947 inactive_anon:3603515 isolated_anon:0
Jul 11 13:27:21 anbob2 kernel: active_file:270 inactive_file:77 isolated_file:610
Jul 11 13:27:21 anbob2 kernel: unevictable:594868 dirty:0 writeback:0 unstable:0
Jul 11 13:27:21 anbob2 kernel: free:164590 slab_reclaimable:212653 slab_unreclaimable:274347
Jul 11 13:27:21 anbob2 kernel: mapped:39970849 shmem:47754858 pagetables:69126167 bounce:0
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA free:15744kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15192kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 1659 128919 128919
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA32 free:509480kB min:432kB low:540kB high:648kB active_anon:12kB inactive_anon:32kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):8kB present:1698848kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:19004kB slab_unreclaimable:5456kB kernel_stack:0kB pagetables:56524kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9 all_unreclaimable? no
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 127260 127260
Jul 11 13:27:21 anbob2 kernel: Node 0 Normal free:33252kB min:33284kB low:41604kB high:49924kB active_anon:41907544kB inactive_anon:3160236kB active_file:0kB inactive_file:0kB unevictable:48008kB isolated(anon):0kB isolated(file):1664kB present:130314240kB mlocked:29612kB dirty:0kB writeback:0kB mapped:33003860kB shmem:38857440kB slab_reclaimable:232004kB slab_unreclaimable:590868kB kernel_stack:35184kB pagetables:78692032kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:905 all_unreclaimable? no
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 1 Normal free:32980kB min:33812kB low:42264kB high:50716kB active_anon:47390260kB inactive_anon:3399420kB active_file:708kB inactive_file:76kB unevictable:237212kB isolated(anon):0kB isolated(file):0kB present:132382720kB mlocked:75736kB dirty:0kB writeback:0kB mapped:37530640kB shmem:43887520kB slab_reclaimable:204916kB slab_unreclaimable:155872kB kernel_stack:56608kB pagetables:63008228kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1089 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 2 Normal free:33780kB min:33812kB low:42264kB high:50716kB active_anon:59330284kB inactive_anon:2476360kB active_file:172kB inactive_file:356kB unevictable:25228kB isolated(anon):0kB isolated(file):768kB present:132382720kB mlocked:17052kB dirty:0kB writeback:0kB mapped:46825112kB shmem:54991776kB slab_reclaimable:196720kB slab_unreclaimable:158944kB kernel_stack:32512kB pagetables:68487804kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:874 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 3 Normal free:33124kB min:33812kB low:42264kB high:50716kB active_anon:56867688kB inactive_anon:5378012kB active_file:380kB inactive_file:84kB unevictable:2069024kB isolated(anon):0kB isolated(file):0kB present:132382720kB mlocked:445680kB dirty:0kB writeback:0kB mapped:42523776kB shmem:53282696kB slab_reclaimable:197968kB slab_unreclaimable:186248kB kernel_stack:39648kB pagetables:66260080kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:625 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15744kB
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA32: 522*4kB 132*8kB 56*16kB 137*32kB 263*64kB 153*128kB 89*256kB 67*512kB 20*1024kB 13*2048kB 88*4096kB = 509480kB
Jul 11 13:27:21 anbob2 kernel: Node 0 Normal: 5587*4kB 693*8kB 179*16kB 16*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 33252kB
Jul 11 13:27:21 anbob2 kernel: Node 1 Normal: 1567*4kB 505*8kB 325*16kB 186*32kB 138*64kB 5*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 32980kB
Jul 11 13:27:21 anbob2 kernel: Node 2 Normal: 7751*4kB 5*8kB 1*16kB 7*32kB 9*64kB 7*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 33780kB
Jul 11 13:27:21 anbob2 kernel: Node 3 Normal: 1233*4kB 406*8kB 599*16kB 322*32kB 61*64kB 7*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33124kB
Jul 11 13:27:21 anbob2 kernel: 49341655 total pagecache pages
Jul 11 13:27:21 anbob2 kernel: 1548810 pages in swap cache
Jul 11 13:27:21 anbob2 kernel: Swap cache stats: add 12245073, delete 10696263, find 4492224732/4492556451
Jul 11 13:27:21 anbob2 kernel: Free swap  = 0kB
Jul 11 13:27:21 anbob2 kernel: Total swap = 32767996kB
Jul 11 13:27:21 anbob2 kernel: 134152191 pages RAM
Jul 11 13:27:21 anbob2 kernel: 1917188 pages reserved
Jul 11 13:27:21 anbob2 kernel: 1755985666 pages shared
Jul 11 13:27:21 anbob2 kernel: 86997652 pages non-shared
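
The counters in the Mem-Info dump above are page counts, so they are easier to read once converted to GiB. The following is a minimal sketch, not part of the original analysis. It assumes 4 KiB base pages (the x86_64 default) and that the syslog prefix has already been stripped from each line. The names `parse_counters` and `pages_to_gib` are hypothetical helpers introduced only for illustration.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB base pages (x86_64 default)

# Lines copied from the Mem-Info dump above, syslog prefix stripped so that
# timestamps such as 13:27:21 are not mistaken for name:value pairs.
MEM_INFO = """\
active_anon:51373947 inactive_anon:3603515 isolated_anon:0
free:164590 slab_reclaimable:212653 slab_unreclaimable:274347
mapped:39970849 shmem:47754858 pagetables:69126167 bounce:0
134152191 pages RAM
"""


def parse_counters(text):
    """Return {name: pages} for every name:value pair, plus 'ram'."""
    counters = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)", text)}
    ram = re.search(r"(\d+) pages RAM", text)
    if ram:
        counters["ram"] = int(ram.group(1))
    return counters


def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_KIB / 1024 / 1024


counters = parse_counters(MEM_INFO)
for name in ("ram", "pagetables", "shmem", "active_anon", "free"):
    share = 100 * counters[name] / counters["ram"]
    print(f"{name:12s} {pages_to_gib(counters[name]):8.1f} GiB  {share:5.1f}% of RAM")
```

With the values from this dump, `pagetables` works out to roughly 263.7 GiB, about 51.5% of the approximately 511.75 GiB of RAM. Page tables of that size, together with a large `shmem` figure, are usually a sign that the SGA is not backed by HugePages while many server processes map it, which matches the hugepage/pagetables check listed in the analysis steps.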

Note:
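The log shows that processes first went into D state (uninterruptible sleep) and that jbd2 hung for more than 200 seconds, which very likely prevented writes to the ext4 file system. In this case there were not many D-state processes and they did not stay blocked for long. If a large number of processes are stuck in D state, it is worth analyzing the storage driver layer (see "Server hang due to 'block devices' with status 'pending syncing' on driver layer").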
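
To judge how widespread the D-state blocking was, the hung-task messages in the log can be counted per task name. This is a minimal sketch that assumes the RHEL 6 message format shown in the log above. `hung_tasks` is a hypothetical helper, and the sample lines are copied from this log; in practice the lines would be read from /var/log/messages.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task MountAgent:17303 blocked for more than 120 seconds."
# The task name itself may contain ':' (e.g. "rs:main Q:Reg"), so the name
# group is greedy and the PID is the last ':<digits>' before " blocked".
HUNG = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")

SAMPLE = """\
Jul 11 13:07:56 szimsdb2 kernel: INFO: task MountAgent:17303 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: INFO: task MountAgent:17304 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17262 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17278 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: INFO: task hekad-daemon:8274 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: INFO: task rs:main Q:Reg:52457 blocked for more than 120 seconds.
"""


def hung_tasks(lines):
    """Count hung-task reports per task name."""
    counts = Counter()
    for line in lines:
        m = HUNG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


counts = hung_tasks(SAMPLE.splitlines())
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")
```

A handful of reports spread over a few task names, as here, is consistent with short-lived blocking under memory pressure. A long and growing list would point more toward the storage or driver layer.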

— over —
