Troubleshooting a Linux 7 panic: system crash shows exception RIP: pagetypeinfo_showfree_print

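Recently the operating system on node 1 of an Oracle RAC on Linux 7 environment rebooted. Once again there were no error logs at the DB or CRS layer, but fortunately kdump was configured on the OS and a vmcore file had been generated. Analysis of the vmcore showed that a cat command triggered the panic: a CPU hit a hard lockup and the system crashed. The call stack shows exception RIP pagetypeinfo_showfree_print.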

Call stack from the vmcore

crash> bt
PID: 27901  TASK: ffff938a4d4f1fa0  CPU: 14   COMMAND: "cat"
 #0 [ffff9483bf488e48] crash_nmi_callback at ffffffffb8c551d7
 #1 [ffff9483bf488e58] nmi_handle at ffffffffb931d8cc
 #2 [ffff9483bf488eb0] do_nmi at ffffffffb931dba8
 #3 [ffff9483bf488ef0] end_repeat_nmi at ffffffffb931cd69
    [exception RIP: pagetypeinfo_showfree_print+104]
    RIP: ffffffffb8db7173  RSP: ffff938b9fcbfda0  RFLAGS: 00000006
    RAX: fffff0c9946d7020  RBX: ffff96073ffd5528  RCX: 0000000000000000
    RDX: 00000000001c7764  RSI: ffffffffb9676ab1  RDI: 0000000000000000
    RBP: ffff938b9fcbfdd0   R8: 000000000000000a   R9: 00000000fffffffe
    R10: 0000000000000000  R11: ffff938b9fcbfc36  R12: ffff942b97758240
    R13: ffffffffb942f730  R14: ffff96073ffd5000  R15: ffff96073ffd5180
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff938b9fcbfda0] pagetypeinfo_showfree_print at ffffffffb8db7173
 #5 [ffff938b9fcbfdd8] walk_zones_in_node at ffffffffb8db74df
 #6 [ffff938b9fcbfe20] pagetypeinfo_show at ffffffffb8db7a29
 #7 [ffff938b9fcbfe48] seq_read at ffffffffb8e45c3c
 #8 [ffff938b9fcbfeb8] proc_reg_read at ffffffffb8e95070
 #9 [ffff938b9fcbfed8] vfs_read at ffffffffb8e1f2af
#10 [ffff938b9fcbff08] sys_read at ffffffffb8e2017f
#11 [ffff938b9fcbff50] system_call_fastpath at ffffffffb932579b

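This environment runs Linux 7.7. The problem exists on both Oracle Linux and Red Hat Enterprise Linux; because the kernel branches differ, the bug is tracked under a different ID on each and is fixed by a different kernel upgrade.

On OEL it is Bug 32921246: [UEK-5-U5] Reading /proc/pagetypeinfo on large systems can cause lockup

On RHEL it is Bug 1757943: Hard lockup in free_one_page()->_raw_spin_lock() because sosreport command is reading from /proc/pagetypeinfo

Oracle has not published the root cause; Doc ID 3000138.1 only records the symptoms of the problem on UEK kernel 5. Red Hat documents the cause as a kernel bug: when cat /proc/pagetypeinfo reads the free page counts, the loop that walk_zones_in_node runs over each zone triggers a hard lockup.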

    /* Print out the free pages at each order for each migatetype */
    static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
    {
            int order;
            pg_data_t *pgdat = (pg_data_t *)arg;

            /* Print header */
            seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
            for (order = 0; order < MAX_ORDER; ++order)
                    seq_printf(m, "%6d ", order);
            seq_putc(m, '\n');

            walk_zones_in_node(m, pgdat, pagetypeinfo_showfree_print);  /* <----- */

            return 0;
    }

    /* Walk all the zones in a node and print using a callback */
    static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
                    void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
    {
    ...
    }
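
Both bug titles point at large systems, and a little arithmetic suggests why. As far as I can tell from the upstream source of kernels of this generation, pagetypeinfo_showfree_print counts free blocks by walking each free list entry by entry, and walk_zones_in_node holds the zone lock with interrupts disabled around that callback, so the time spent under the lock grows with the number of free blocks. The sketch below only does the arithmetic; the function name and the amounts of free memory are made-up illustrative values, not measurements from this system.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KB base pages, as on x86_64
GIB = 1024 ** 3


def free_list_entries(free_bytes, order, page_size=PAGE_SIZE):
    """Number of free blocks (list entries) if all free memory sat at one order."""
    block_bytes = page_size << order  # one block is 2**order contiguous pages
    return free_bytes // block_bytes


# Hypothetical amounts of free memory, at the two extreme orders.
for free_gib in (16, 256, 1024):
    for order in (0, 10):
        n = free_list_entries(free_gib * GIB, order)
        print(f"{free_gib:5d} GiB free, all at order {order:2d}: {n:>12,d} list entries")
```

The worst case is a large, heavily fragmented machine: the same amount of free memory produces 1024 times more list entries at order 0 than at order 10.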

What is /proc/pagetypeinfo

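The /proc/pagetypeinfo entry on Linux reports how free memory is distributed inside the page allocator: for every zone it lists the number of free blocks at each order, broken down by migrate type (Unmovable, Movable, Reclaimable, and so on). This helps system administrators and developers understand the system's memory usage patterns and tune performance. It is mostly used to analyze memory fragmentation, usually together with /proc/zoneinfo, /proc/buddyinfo and /proc/vmstat. In current Linux kernels the largest unit of memory management is the node; each node is divided into one or more zones, and each zone is further divided by migrate type. pagetypeinfo prints the detailed state of every migrate type in every zone, which is more detailed than the information in /proc/buddyinfo.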

The following explanation and sample output are from https://github.com/netdata/netdata/issues/6802

The Linux kernel splits its memory space into zones (e.g., for x86_64):

DMA: 0 to 16MB, for legacy reasons
DMA32: 16MB to 4GB, for 32-bit hardware
Normal: 4GB and above, the standard addressing.
Within each zone, the buddy allocator manages free memory in blocks of 2^order contiguous pages, for orders 0 through 10; the largest block is 2^10 pages (4MB with a 4KB page size).

When a block is released, the allocator tries to merge it with its buddy to form a higher-order block.
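
The merge step can be sketched in a few lines. In the buddy scheme, the buddy of a block at order n is found by flipping bit n of its page frame number (PFN), and two free buddies coalesce into one block at order n+1. The function names and the set of (pfn, order) tuples below are a toy model for illustration, not the kernel's data structures or APIs.

```python
MAX_ORDER = 11  # orders 0..10, matching the columns of /proc/pagetypeinfo


def buddy_pfn(pfn, order):
    """PFN of the buddy of the block starting at pfn with the given order."""
    return pfn ^ (1 << order)


def free_block(free, pfn, order=0):
    """Free a block and merge upward for as long as its buddy is also free.

    `free` is a set of (pfn, order) tuples describing the free blocks.
    Returns the (pfn, order) of the block that ends up in the set.
    """
    while order < MAX_ORDER - 1:
        buddy = buddy_pfn(pfn, order)
        if (buddy, order) not in free:
            break                    # buddy is in use: no merge possible
        free.remove((buddy, order))  # take the buddy off its free list
        pfn = min(pfn, buddy)        # merged block starts at the lower PFN
        order += 1
    free.add((pfn, order))
    return pfn, order


free = set()
for pfn in (0, 2, 3, 1):             # free four adjacent pages in this order
    print(f"free page {pfn} -> block {free_block(free, pfn)}")
print("free blocks:", sorted(free))
```

Freeing pages 0, 2, 3 and 1 in that sequence ends with a single order-2 block, because the last free lets two order-1 blocks merge as well.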

If nearly all free blocks are low-order, it often denotes memory fragmentation. Most of the time, this is due to kernel caches that use unmovable pages. You can drop the largest of them (inodes and dentries, along with the page cache) by issuing "echo 3 > /proc/sys/vm/drop_caches".

A "clean" (i.e. non-fragmented) machine will have free blocks at the high orders (8, 9, 10):

odin [00:13:12][0][~] cat /proc/pagetypeinfo 
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    0, zone      DMA, type    Unmovable      0      0      0      0      2      1      1      0      1      0      0 
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3 
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type    Unmovable      1      0      0      0      0      0      1      1      1      1      0 
Node    0, zone    DMA32, type      Movable      2      1      2      0      1      3      2      1      2      1    593 
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type    Unmovable    870    530    391    157    103     41      9      2      1      0      0 
Node    0, zone   Normal, type      Movable   5886   9235   5728   4072   1561    324    115     41     12      4  13018 
Node    0, zone   Normal, type  Reclaimable      3      4      8     11      2      3      1      1      1      0      0 
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

A more fragmented server, by contrast, will have mostly low-order free blocks:
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    0, zone      DMA, type    Unmovable      0      1      1      0      2      1      1      0      1      0      0 
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3 
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type    Unmovable    159      6      2      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type  Reclaimable      9   8271   6716      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Movable    589   8078   3128      9      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type    Unmovable   1373   1465   1173      2      0      0      0      0      0      0      0 
Node    0, zone   Normal, type  Reclaimable     14      5     13      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Movable  16256  80265 156907    529     67      0      0      0      0      0      0 
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
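
The difference between the two outputs can be put into numbers. The sketch below parses saved /proc/pagetypeinfo text and reports, per zone, how many free blocks there are and what share of the free pages sits in blocks of order 8 or higher. It works on a captured string instead of opening the file, since reading /proc/pagetypeinfo is exactly what triggers the lockup on affected kernels. The function name and the threshold of order 8 are choices made for this illustration; the two sample rows are the Normal zone Movable lines copied from the outputs above.

```python
import re
from collections import defaultdict

# Matches the per-migrate-type rows, e.g. "Node 0, zone Normal, type Movable 5886 ..."
ROW = re.compile(r"^Node\s+(\d+), zone\s+([^,\s]+), type\s+(\S+)\s+(.*)$")


def summarize(text, high_order=8):
    """Per (node, zone): (free blocks, free pages, share of pages at >= high_order)."""
    blocks = defaultdict(int)
    pages = defaultdict(int)
    high = defaultdict(int)
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # headers and the "Number of blocks" section do not match
        fields = m.group(4).split()
        if not fields or not all(f.isdigit() for f in fields):
            continue  # skip rows whose counts are not plain integers
        key = (int(m.group(1)), m.group(2))
        for order, count in enumerate(map(int, fields)):
            blocks[key] += count
            pages[key] += count << order  # a block at this order is 2**order pages
            if order >= high_order:
                high[key] += count << order
    return {k: (blocks[k], pages[k], high[k] / pages[k] if pages[k] else 0.0)
            for k in blocks}


clean = "Node    0, zone   Normal, type      Movable   5886   9235   5728   4072   1561    324    115     41     12      4  13018"
fragmented = "Node    0, zone   Normal, type      Movable  16256  80265 156907    529     67      0      0      0      0      0      0"

for name, row in (("clean", clean), ("fragmented", fragmented)):
    for (node, zone), (nblocks, npages, share) in summarize(row).items():
        print(f"{name}: node {node} zone {zone}: {nblocks} free blocks, "
              f"{npages} free pages, {share:.1%} at order >= 8")
```

The free block count is also the number of list entries behind each row, so the fragmented sample is the more expensive one to report even though it holds far fewer free pages.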

For more details, the author of the netdata issue points to their memory management presentation.

Solution

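Upgrade to a kernel that contains the fix for your distribution.

Red Hat Enterprise Linux 7 (7.5, 7.6, 7.7)

The fix is tracked under Bug 1757943; see the Red Hat solution listed under References for the kernel to upgrade to.

Oracle Linux UEK

This bug is fixed in "V4.14.35-2047.505.1" and above.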
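
To check whether a UEK5 host is already at or beyond that version, the running kernel release (as printed by uname -r) can be compared field by field. The sketch below assumes the release string starts with numeric fields separated by dots and dashes, as in 4.14.35-2047.505.1.el7uek.x86_64. The helper names and the sample release strings are illustrative only, and the comparison is meaningful only within the 4.14.35 UEK5 series, not for RHEL kernels.

```python
import re

FIXED = "4.14.35-2047.505.1"  # fixed UEK5 version quoted above


def numeric_prefix(release):
    """Leading numeric fields of a kernel release string, as a tuple of ints."""
    m = re.match(r"\d+(?:[.-]\d+)*", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(x) for x in re.split(r"[.-]", m.group(0)))


def has_fix(release, fixed=FIXED):
    """True if the release is at or above the fixed version (UEK5 series only)."""
    return numeric_prefix(release) >= numeric_prefix(fixed)


# Sample release strings for illustration, not taken from the affected system.
for rel in ("4.14.35-1902.300.11.el7uek.x86_64",
            "4.14.35-2047.505.1.el7uek.x86_64"):
    print(rel, "->", "has the fix" if has_fix(rel) else "needs an upgrade")
```

On a live host the string could come from platform.release() in the Python standard library instead of the hard-coded samples.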

References
Oracle Linux: CPU Hard Lockup Detected in get_page_from_freelist() Call of UEK5 Kernel (Doc ID 3000138.1)
https://access.redhat.com/solutions/4588841
