How to fix the "hung_task_timeout_secs" and "blocked for more than 120 seconds" problem

Author: Skate
Time: 2015/03/04

Symptom: the system hangs; it still responds to ping, but ssh does not respond.

Check the message log:
[1379100.801689] [<ffffffff81536f95>] page_fault+0x25/0x30
[1379100.801693] INFO: task java:710923 blocked for more than 120 seconds.
[1379100.801766] Not tainted 2.6.32-042stab104.1 #1
[1379100.801835] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1379100.801963] java D ffff8800372d7200 0 710923 709954 67084186 0x00000000
[1379100.801968] ffff880e57e71cf0 0000000000000082 ffffea00021a8fc0 ffff880e57e71c68
[1379100.801972] ffffffff81155c60 ffff8800372d7200 ffffea00021a8fc0 ffff88100c409638
[1379100.801976] 00000007fa23bffc ffff880e57e71c78 ffffffff81155cd1 ffff880e57e71ca8
[1379100.801980] Call Trace:
[1379100.801984] [<ffffffff81155c60>] ? __lru_cache_add+0x40/0x90
[1379100.801988] [<ffffffff81155cd1>] ? lru_cache_add_lru+0x21/0x40
[1379100.801992] [<ffffffff81172c9c>] ? handle_pte_fault+0x65c/0x1040
[1379100.801996] [<ffffffff81536705>] rwsem_down_failed_common+0x95/0x1d0
[1379100.802000] [<ffffffff81536896>] rwsem_down_read_failed+0x26/0x30
[1379100.802004] [<ffffffff812a6a34>] call_rwsem_down_read_failed+0x14/0x30
[1379100.802008] [<ffffffff81535d94>] ? down_read+0x24/0x30
[1379100.802011] [<ffffffff8104dffe>] __do_page_fault+0x18e/0x480
[1379100.802015] [<ffffffff8106f0c8>] ? finish_task_switch+0xc8/0x120
[1379100.802019] [<ffffffff81539c2e>] do_page_fault+0x3e/0xa0
[1379100.802022] [<ffffffff81536f95>] page_fault+0x25/0x30


The load on the host machine reached about 460.
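To see which tasks trigger reports like the one above, the kernel log can be reduced to one line per report. The following is only a minimal sketch: the function name hung_tasks is made up for illustration, and it assumes the message format shown in the log above.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "task pid seconds" for each
# hung-task report, for example:
#   INFO: task java:710923 blocked for more than 120 seconds.
# hung_tasks is a hypothetical helper name, not an existing tool.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use on a live system (reading the kernel log may require root):
#   dmesg | hung_tasks | sort | uniq -c | sort -rn
```

Counting the reports per task, as in the commented usage line, shows whether one process or many are stuck.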

By default Linux lets up to 40% of the available memory fill with dirty file
system cache pages (the exact default of vm.dirty_ratio depends on the kernel
version). Once this mark is reached, the kernel flushes all outstanding data to
disk, and all subsequent IO becomes synchronous. The kernel reports any task
that stays blocked for longer than hung_task_timeout_secs, which is 120 seconds
by default. In this case the IO subsystem is not fast enough to flush the data
within 120 seconds. As the IO subsystem responds slowly and more requests
arrive, system memory fills up, resulting in the error above and stalling the
serving of HTTP requests.
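Before changing anything, it helps to look at the thresholds currently in effect and at how much dirty data is waiting to be written. The sketch below reads the values straight from /proc. The function names and the PROC_ROOT variable are assumptions introduced here only so the example can be tried against a test directory.

```shell
#!/bin/sh
# Print the writeback thresholds and the hung-task timeout currently in effect.
# PROC_ROOT is a hypothetical override (default: /proc) used for testing only.
show_writeback_settings() {
    root="${PROC_ROOT:-/proc}"
    for key in vm/dirty_ratio vm/dirty_background_ratio kernel/hung_task_timeout_secs; do
        name="$(echo "$key" | tr / .)"
        if [ -r "$root/sys/$key" ]; then
            printf '%s = %s\n' "$name" "$(cat "$root/sys/$key")"
        else
            # The hung-task detector may not be built into every kernel.
            printf '%s = (not available)\n' "$name"
        fi
    done
}

# Show the amount of dirty data and data under writeback, as reported in kB.
show_dirty_memory() {
    grep -E '^(Dirty|Writeback):' "${PROC_ROOT:-/proc}/meminfo"
}

# Usage on a live system:
#   show_writeback_settings
#   show_dirty_memory
```

A Dirty value that stays large while the load climbs suggests that the disks cannot keep up with the write rate.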


Solution:

1. Adjusting the parameters vm.dirty_ratio and vm.dirty_background_ratio can avoid this problem

# sysctl -w vm.dirty_ratio=10
# sysctl -w vm.dirty_background_ratio=5


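The sysctl -w commands take effect immediately, but the values are lost on reboot.

To make the change permanent, add the settings to /etc/sysctl.conf:
# vi /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

Then load the file so the settings apply without waiting for a reboot:
# sysctl -p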
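After changing the values, it can be worth confirming that the running kernel actually uses them. This is a rough sketch, not part of the original procedure: check_sysctl and the PROC_ROOT override are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Compare a live sysctl value with the value expected from the configuration.
# Usage: check_sysctl vm.dirty_ratio 10
# PROC_ROOT is a hypothetical override (default: /proc) used for testing only.
check_sysctl() {
    key="$1"
    expected="$2"
    file="${PROC_ROOT:-/proc}/sys/$(echo "$key" | tr . /)"
    if [ ! -r "$file" ]; then
        echo "$key: cannot read $file"
        return 2
    fi
    actual="$(cat "$file")"
    if [ "$actual" = "$expected" ]; then
        echo "$key: OK ($actual)"
    else
        echo "$key: expected $expected, found $actual"
        return 1
    fi
}

# Usage on a live system:
#   check_sysctl vm.dirty_ratio 10
#   check_sysctl vm.dirty_background_ratio 5
```

The function returns a non-zero status on a mismatch, so it can also be used in a post-boot check script.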

2. Find the processes that consume the most resources, then optimize them
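One possible way to find such processes is to look for tasks in uninterruptible sleep (state D, as shown for java in the log above) and for the heaviest consumers. The sketch below is only an illustration: blocked_tasks is a made-up helper name, and the ps options in the comments assume the procps version of ps.

```shell
#!/bin/sh
# Filter "ps"-style output whose first column is the process state and keep
# only tasks in uninterruptible sleep (state D), which usually wait on IO.
# blocked_tasks is a hypothetical helper name.
blocked_tasks() {
    awk '$1 ~ /^D/ { print }'
}

# Typical use on a live system (procps ps assumed):
#   ps -eo stat,pid,comm | blocked_tasks            # tasks stuck in state D
#   ps -eo pcpu,pmem,pid,comm --sort=-pcpu | head   # heaviest CPU consumers
#   ps -eo pmem,pcpu,pid,comm --sort=-pmem | head   # heaviest memory consumers
```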


Reference: http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/


-------end-------

           


Reposted from blog.csdn.net/ffuygggh/article/details/86744310