gpt4 book ai didi

linux - 信息 : task v8:SweeperThrea: blocked for more than 120 seconds

转载 作者:太空宇宙 更新时间:2023-11-04 04:02:14 25 4
gpt4 key购买 nike

当在容器中运行 Node 一个简单的 Node 进程时,我在内核日志中看到这一点,并且该进程已失效:

Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745710] INFO: task v8:SweeperThrea:2569 blocked for more than 120 seconds.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745723] v8:SweeperThrea D 0000000000000000 0 2569 2470 0x00000002
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745727] ffff8801d228fca8 0000000000000246 ffff8801d0fb1740 ffff8801d228ffd8
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745730] ffff8801d228ffd8 ffff8801d228ffd8 ffffffff81c15440 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745737] ffff8801d0fb1740 ffff8801d02f8878 0000000000000002 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745741] Call Trace:
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745745] [<ffffffff816ca029>] schedule+0x29/0x70
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745748] [<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745752] [<ffffffff8105e91c>] do_exit+0x85c/0x9d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745755] [<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745759] [<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745764] [<ffffffff8101439f>] do_signal+0x3f/0x8d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745767] [<ffffffff810f3427>] ? call_rcu_sched+0x17/0x20
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745771] [<ffffffff8108429f>] ? __put_cred+0x3f/0x50
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745775] [<ffffffff810843b9>] ? abort_creds+0x29/0x30
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745779] [<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745781] [<ffffffff816d3a9a>] int_signal+0x12/0x17

更多信息:docker 正在对进程执行 SIGTERM 然后 SIGKILL。

docker -d -D 日志:

[debug] server.go:924 Calling POST /containers/create
2014/03/21 19:04:11 POST /v1.10/containers/create
[/var/lib/docker|90a3fa34] +job create()
[/var/lib/docker|90a3fa34] -job create() = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/start
2014/03/21 19:04:11 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/start
[/var/lib/docker|90a3fa34] +job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] +job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] -job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[/var/lib/docker|90a3fa34] -job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/stop
2014/03/21 19:04:16 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/stop?t=10
[/var/lib/docker|90a3fa34] +job stop(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
2014/03/21 19:04:26 Container 074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766 failed to exit within 10 seconds of SIGTERM - using the force
2014/03/21 19:04:36 Container SIGKILL failed to exit within 10 seconds of lxc-kill 074dda12cb03 - trying direct SIGKILL

pstree:

docker(2387)─node(2532)─{node}(2569)

ps 辅助

F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S 0 2387 2386 0 80 0 - 153471 futex_ pts/0 00:00:07 docker
4 Z 0 2532 2387 0 80 0 - 0 exit ? 00:00:00 node <defunct>

/proc/2532/fd 中没有文件/proc/2532//task/2569/fd/中没有文件

来自/proc/2532/stack的堆栈

[<ffffffff8105e763>] do_exit+0x6a3/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8105eb87>] sys_exit_group+0x17/0x20
[<ffffffff816d37dd>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff

来自/proc/2532/task/2569/stack的堆栈

[<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
[<ffffffff8105e91c>] do_exit+0x85c/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
[<ffffffff8101439f>] do_signal+0x3f/0x8d0
[<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
[<ffffffff816d3a9a>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

重现脚本:

CNT=0
while true
do
echo $CNT
DOCK=$(sudo docker run -d -t anandkumarpatel/zombie_bug ./node index.js)
sleep 60 && sudo docker stop $DOCK > out.log &
sleep 1
CNT=$(($CNT+1))
if [[ "$CNT" == "50" ]]; then
exit
fi
done

可以在 Pastebin 上找到失败的终止过程中 docker deamon 的 strace:

http://pastebin.com/HxDwiRBW

我的系统信息和版本。我正在使用自定义构建 docker,但它是 0.9.0 版本的 fork 版本,带有 2 个小补丁。但这也将在干净的 0.9.0 版本上重现。

$  lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.04
Release: 13.04
Codename: raring

$ sudo docker version
Client version: 0.9.0
Go version (client): go1.2.1
Git commit (client): 70f72ea
Server version: 0.9.0
Git commit (server): 70f72ea
Go version (server): go1.2.1
Last stable version: 0.9.0

$ uname -a
Linux ip-10-0-2-233 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

但是,这并不适用于每个系统!由于某种原因,它仅在我们的生产服务器和另一台服务器上重现。它们都有相似的配置,但 ubuntu 版本相同。

如果有更多信息需要收集,请告诉我,我会获取。我在测试系统上的重现率接近 100%,因此我可以收集所需的一切。

相关: https://github.com/dotcloud/docker/issues/4811
Docker container refuses to get killed after run command turns into a zombie
更改为较新的内核不会重现此问题:不重现:linux-image-3.8.0-35-genericlinux-image-3.8.0-19-generic 的重现

将进行搜索以查看此问题何时得到修复,以查看是否有助于找到根本原因。

最佳答案

更改为最新内核修复了该问题

发现确切的内核差异:重现:linux-image-3.8.0-31-generic无法重现:linux-image-3.8.0-32-generic

我认为这就是解决办法:

+++ linux-3.8.0/kernel/pid_namespace.c
@@ -181,6 +181,7 @@
int nr;
int rc;
struct task_struct *task, *me = current;
+ int init_pids = thread_group_leader(me) ? 1 : 2;

/* Don't allow any more processes into the pid namespace */
disable_pid_allocation(pid_ns);
@@ -230,7 +231,7 @@
*/
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (pid_ns->nr_hashed == 1)
+ if (pid_ns->nr_hashed == init_pids)
break;
schedule();
}

来自这里: https://groups.google.com/forum/#!msg/fa.linux.kernel/u4b3n4oYDQ4/GuLrXfDIYggJ

将升级我们所有重现此问题的服务器,看看它是否仍然发生。

关于linux - 信息 : task v8:SweeperThrea:<pid> blocked for more than 120 seconds,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22588254/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com