my33_mysqld killed after memory ran out

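Monitoring raised an alert that one node of the MGR cluster had failed. By the time I looked, LVS had already switched over to the new MGR write node. Investigating the cause: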

/var/log/messages

Mar 27 16:51:05 db10 kernel: crond invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Mar 27 16:51:05 db10 kernel: crond cpuset=/ mems_allowed=0-1
Mar 27 16:51:05 db10 kernel: CPU: 35 PID: 12090 Comm: crond Tainted: G           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
Mar 27 16:51:05 db10 kernel: Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS 4.1.16 06/21/2018
Mar 27 16:51:05 db10 kernel: Call Trace:
Mar 27 16:51:05 db10 kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Mar 27 16:51:05 db10 kernel: [<ffffffff816a9b90>] dump_header+0x90/0x229
Mar 27 16:51:05 db10 kernel: [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0
Mar 27 16:51:05 db10 kernel: [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0
Mar 27 16:51:05 db10 kernel: [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0
Mar 27 16:51:05 db10 kernel: [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120
Mar 27 16:51:05 db10 kernel: [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0
Mar 27 16:51:05 db10 kernel: [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0
Mar 27 16:51:05 db10 kernel: [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724
Mar 27 16:51:05 db10 kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Mar 27 16:51:05 db10 kernel: [<ffffffff8108859d>] copy_process+0x1dd/0x1970
Mar 27 16:51:05 db10 kernel: [<ffffffff81121930>] ? audit_filter_rules.isra.8+0x280/0xf90
Mar 27 16:51:05 db10 kernel: [<ffffffff81089ee1>] do_fork+0x91/0x320
Mar 27 16:51:05 db10 kernel: [<ffffffff8108a1f6>] SyS_clone+0x16/0x20
Mar 27 16:51:05 db10 kernel: [<ffffffff816c0ad4>] stub_clone+0x44/0x70
Mar 27 16:51:05 db10 kernel: [<ffffffff816c0715>] ? system_call_fastpath+0x1c/0x21
Mar 27 16:51:05 db10 kernel: Mem-Info:
Mar 27 16:51:05 db10 kernel: active_anon:32289123 inactive_anon:180550 isolated_anon:0#012 active_file:960 inactive_file:195 isolated_file:0#012 unevictable:0 dirty:48 writeback:0 unstable:0#012 slab_reclaimable:59079 slab_unreclaimable:32778#012 mapped:13096 shmem:534843 pagetables:66034 bounce:0#012 free:96590 free_pcp:105 free_cma:0
Mar 27 16:51:05 db10 kernel: Node 0 DMA free:13540kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 1680 64143 64143
Mar 27 16:51:05 db10 kernel: Node 0 DMA32 free:250600kB min:1176kB low:1468kB high:1764kB active_anon:1442100kB inactive_anon:464kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1934208kB managed:1722948kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:1740kB slab_reclaimable:11840kB slab_unreclaimable:7640kB kernel_stack:368kB pagetables:1132kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 62462 62462
Mar 27 16:51:05 db10 kernel: Node 0 Normal free:54592kB min:43744kB low:54680kB high:65616kB active_anon:62871276kB inactive_anon:371740kB active_file:12kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65011712kB managed:63961888kB mlocked:0kB dirty:0kB writeback:0kB mapped:1028kB shmem:1190332kB slab_reclaimable:124084kB slab_unreclaimable:45492kB kernel_stack:4768kB pagetables:92984kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 0 0
Mar 27 16:51:05 db10 kernel: Node 1 Normal free:68040kB min:45176kB low:56468kB high:67764kB active_anon:64843172kB inactive_anon:349996kB active_file:0kB inactive_file:160kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:66056756kB mlocked:0kB dirty:192kB writeback:0kB mapped:50080kB shmem:947300kB slab_reclaimable:100392kB slab_unreclaimable:77980kB kernel_stack:28736kB pagetables:170020kB unstable:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:55 all_unreclaimable? no
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 0 0
Mar 27 16:51:05 db10 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13540kB
Mar 27 16:51:05 db10 kernel: Node 0 DMA32: 264*4kB (UEM) 403*8kB (UEM) 475*16kB (UEM) 342*32kB (UEM) 391*64kB (UEM) 300*128kB (UEM) 208*256kB (UEM) 107*512kB (UEM) 45*1024kB (EM) 5*2048kB (E) 0*4096kB = 250600kB
Mar 27 16:51:05 db10 kernel: Node 0 Normal: 13593*4kB (UEM) 22*8kB (UM) 9*16kB (M) 2*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54756kB
Mar 27 16:51:05 db10 kernel: Node 1 Normal: 16649*4kB (UEM) 8*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66660kB
Mar 27 16:51:05 db10 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 27 16:51:05 db10 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 27 16:51:05 db10 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 27 16:51:05 db10 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 27 16:51:05 db10 kernel: 535067 total pagecache pages
Mar 27 16:51:05 db10 kernel: 0 pages in swap cache
Mar 27 16:51:05 db10 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 27 16:51:05 db10 kernel: Free swap  = 0kB
Mar 27 16:51:05 db10 kernel: Total swap = 0kB
Mar 27 16:51:05 db10 kernel: 33517692 pages RAM
Mar 27 16:51:05 db10 kernel: 0 pages HighMem/MovableOnly
Mar 27 16:51:05 db10 kernel: 578319 pages reserved
Mar 27 16:51:05 db10 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Mar 27 16:51:05 db10 kernel: [ 6050]     0  6050    35461    19476      75        0             0 systemd-journal
Mar 27 16:51:05 db10 kernel: [ 6075]     0  6075    30235       80      28        0             0 lvmetad
Mar 27 16:51:05 db10 kernel: [ 6094]     0  6094    10898      172      24        0         -1000 systemd-udevd
Mar 27 16:51:05 db10 kernel: [11985]     0 11985     4845      104      15        0             0 irqbalance
Mar 27 16:51:05 db10 kernel: [11988]   995 11988    25173       71      20        0             0 chronyd
Mar 27 16:51:06 db10 kernel: [11989]    81 11989     6709      161      21        0          -900 dbus-daemon
Mar 27 16:51:06 db10 kernel: [12004]     0 12004    31998      151      22        0             0 smartd
Mar 27 16:51:06 db10 kernel: [12006]   996 12006     2144       37      10        0             0 lsmd
Mar 27 16:51:06 db10 kernel: [12009]     0 12009   186971     9901     237        0             0 rsyslogd
Mar 27 16:51:06 db10 kernel: [12016]     0 12016     1105       39       8        0             0 rngd
Mar 27 16:51:06 db10 kernel: [12034]     0 12034     6620       99      19        0             0 systemd-logind
Mar 27 16:51:06 db10 kernel: [12068]     0 12068     5955       48      17        0             0 atd
Mar 27 16:51:06 db10 kernel: [12090]     0 12090    31058      165      19        0             0 crond
Mar 27 16:51:06 db10 kernel: [12242]     0 12242     1055       19       7        0             0 supervise
Mar 27 16:51:06 db10 kernel: [12243]     0 12243    28807       54      14        0             0 run
Mar 27 16:51:06 db10 kernel: [12260]     0 12260   139002     3217      93        0             0 tuned
Mar 27 16:51:06 db10 kernel: [12273]     0 12273    27021      242      54        0         -1000 sshd
Mar 27 16:51:06 db10 kernel: [12316]     0 12316    27523       33      10        0             0 agetty
Mar 27 16:51:06 db10 kernel: [12319]     0 12319    20378      199      38        0             0 hooagentd
Mar 27 16:51:06 db10 kernel: [12324]     0 12324    80468      586      57        0             0 hooagent
Mar 27 16:51:06 db10 kernel: [12804]     0 12804    22895      259      43        0             0 master
Mar 27 16:51:06 db10 kernel: [12831]    89 12831    22965      281      45        0             0 qmgr
Mar 27 16:51:06 db10 kernel: [13103]     0 13103   828994     4025     115        0             0 wonder-agent
Mar 27 16:51:06 db10 kernel: [20985]     0 20985   175106     1241      72        0         -1000 logmon
Mar 27 16:51:06 db10 kernel: [18570] 42583 18570    32515      159      19        0             0 screen
Mar 27 16:51:06 db10 kernel: [18571] 42583 18571    29229      485      15        0             0 bash
Mar 27 16:51:06 db10 kernel: [22385] 42583 22385    32515      153      19        0             0 screen
Mar 27 16:51:06 db10 kernel: [22386] 42583 22386    29230      485      16        0             0 bash
Mar 27 16:51:06 db10 kernel: [22416] 42583 22416    32515      154      20        0             0 screen
Mar 27 16:51:06 db10 kernel: [22417] 42583 22417    29230      485      13        0             0 bash
Mar 27 16:51:06 db10 kernel: [12032]     0 12032    28326      102      13        0             0 mysqld_safe
Mar 27 16:51:06 db10 kernel: [13363] 33173 13363 74431932 31903076   64367        0             0 mysqld
Mar 27 16:51:06 db10 kernel: [33949]     0 33949    14918     7466      33        0             0 mysqld_exporter
Mar 27 16:51:06 db10 kernel: [ 6287]     0  6287   663221     5068     121        0             0 bbmon
Mar 27 16:51:06 db10 kernel: [ 6621]    89  6621    22921      255      46        0             0 pickup
Mar 27 16:51:06 db10 kernel: [ 6957]    89  6957    22922      256      44        0             0 trivial-rewrite
Mar 27 16:51:06 db10 kernel: [ 7033]     0  7033    45072      238      45        0             0 crond
Mar 27 16:51:06 db10 kernel: [ 7045]     0  7045    28274       48      13        0             0 sh
Mar 27 16:51:06 db10 kernel: [ 7054]     0  7054   372238     1382      69        0             0 dbvip
Mar 27 16:51:06 db10 kernel: [ 7421]     0  7421    47770     1426      49        0             0 python
Mar 27 16:51:06 db10 kernel: [ 7422]     0  7422     4935      159      12        0             0 msval
Mar 27 16:51:06 db10 kernel: Out of memory: Kill process 5396 (mysqld) score 970 or sacrifice child
Mar 27 16:51:06 db10 kernel: Killed process 13363 (mysqld) total-vm:297727728kB, anon-rss:127612364kB, file-rss:0kB, shmem-rss:0kB

The direct cause is that the mysqld process below was killed:

Mar 27 16:51:06 db10 kernel: Killed process 13363 (mysqld) total-vm:297727728kB, anon-rss:127612364kB, file-rss:0kB, shmem-rss:0kB

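Looking further up, the process table shows how much memory mysqld held. The total_vm and rss columns are counted in 4 kB pages, so mysqld had about 284 GiB of virtual memory and about 122 GiB resident (consistent with anon-rss:127612364kB in the kill message), on a host with 128 GiB of physical memory: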

Mar 27 16:51:06 db10 kernel: [13363] 33173 13363 74431932 31903076   64367        0             0 mysqld
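The total_vm and rss columns of the OOM killer's process table are page counts, not kB. A minimal sketch of the conversion, assuming the 4 kB page size that is the x86_64 default; the helper names `parse_row` and `pages_to_gib` and the regular expression are illustrative and not part of any tool:

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (x86_64 default)

# Process-table row copied from the log above.
ROW = ("Mar 27 16:51:06 db10 kernel: [13363] 33173 13363 74431932 31903076"
       "   64367        0             0 mysqld")

# Columns after the bracketed pid:
# uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)


def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count to GiB."""
    return pages * page_kb / (1024 * 1024)


def parse_row(line):
    """Extract name, total_vm and rss (both converted to kB) from one row."""
    m = ROW_RE.search(line)
    if m is None:
        raise ValueError("not an OOM process-table row")
    return {
        "name": m.group("name"),
        "total_vm_kb": int(m.group("total_vm")) * PAGE_KB,
        "rss_kb": int(m.group("rss")) * PAGE_KB,
    }


info = parse_row(ROW)
print(info["name"], info["total_vm_kb"], "kB virtual")
print(round(pages_to_gib(31903076), 1), "GiB resident")
```

The converted total_vm equals the total-vm:297727728kB figure in the "Killed process" line, which is a quick way to confirm that the table is in pages.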

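Further up, the log also covers node0, node1, hugepages_total and swap, which mainly concern NUMA and huge pages; I will set those two topics aside for now. Since mysqld grew until the OOM killer stepped in, I first lower innodb_buffer_pool_size from its current 80 GiB to 64 GiB as a safeguard against a repeat, and then continue the investigation at leisure.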

mysql> show variables like '%pool_size%';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 85899345920 |
+-------------------------+-------------+
1 row in set (0.00 sec)

mysql> select 64*1024*1024*1024;
+-------------------+
| 64*1024*1024*1024 |
+-------------------+
|       68719476736 |
+-------------------+
1 row in set (0.00 sec)

mysql> 
mysql> 
mysql> set global innodb_buffer_pool_size=68719476736;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like '%pool_size%';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 68719476736 |
+-------------------------+-------------+
1 row in set (0.00 sec)

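Note: the configuration file must be changed as well, otherwise the old value comes back on the next restart. After the resize, memory is returned to the OS gradually; memory that is still in use is of course not released.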
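To keep the new value across restarts, the same setting goes into the [mysqld] section of the configuration file. A minimal fragment, assuming it is added to the my.cnf that this instance actually reads (its path is not given here):

```ini
[mysqld]
# 64 GiB, the same value that was set online with SET GLOBAL
innodb_buffer_pool_size = 68719476736
```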


Reposted from www.cnblogs.com/perfei/p/10609556.html