
linux - Kernel hang in the tty subsystem

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 00:55:23

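I am running into problems with the tty subsystem on a RHEL machine. From what I can see in the logs, a kernel oops is generated every time a new console (pts or tty) is spawned. It looks to me like some kind of race condition. Here is the stack trace: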

kernel:  INFO: task sshd:6338 blocked for more than 120 seconds.
kernel: Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: sshd D 0000000000000000 0 6338 6195 0x00000080
kernel: ffff88035be8d728 0000000000000082 0000000000000000 0000000000000000
kernel: ffff88035be8d7f8 ffffffff8105ca34 00009488ef033e83 ffff88035be8d708
kernel: ffff88035be8d880 0000000109b91c98 ffff881eea341098 ffff88035be8dfd8
kernel: Call Trace:
kernel: [<ffffffff8105ca34>] ? find_busiest_group+0x244/0x9e0
kernel: [<ffffffff8152a8c5>] schedule_timeout+0x215/0x2e0
kernel: [<ffffffff8152a543>] wait_for_common+0x123/0x180
kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
kernel: [<ffffffff8152a65d>] wait_for_completion+0x1d/0x20
kernel: [<ffffffff81098bf7>] flush_work+0x77/0xc0
kernel: [<ffffffff81098460>] ? wq_barrier_func+0x0/0x20
kernel: [<ffffffff81098e14>] flush_delayed_work+0x54/0x70
kernel: [<ffffffff813392f5>] tty_flush_to_ldisc+0x15/0x20
kernel: [<ffffffff81333cc7>] n_tty_poll+0x67/0x1d0
kernel: [<ffffffff8132f80a>] tty_poll+0x8a/0xa0
kernel: [<ffffffff811a6895>] do_select+0x3c5/0x7c0
kernel: [<ffffffff8149cf18>] ? ip_finish_output+0x148/0x310
kernel: [<ffffffff811a59f0>] ? __pollwait+0x0/0xf0
kernel: [<ffffffff811a5ae0>] ? pollwake+0x0/0x60
kernel: [<ffffffff811a5ae0>] ? pollwake+0x0/0x60
kernel: [<ffffffff811a5ae0>] ? pollwake+0x0/0x60
kernel: [<ffffffff811a5ae0>] ? pollwake+0x0/0x60
kernel: [<ffffffff8152d04b>] ? _spin_unlock_bh+0x1b/0x20
kernel: [<ffffffff8144b835>] ? release_sock+0xe5/0x110
kernel: [<ffffffff814a52cc>] ? tcp_sendmsg+0x73c/0xa20
kernel: [<ffffffff8144a72b>] ? sock_aio_write+0x19b/0x1c0
kernel: [<ffffffff8133158d>] ? tty_wakeup+0x3d/0x80
kernel: [<ffffffff811a6e1a>] core_sys_select+0x18a/0x2c0
kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff811a71a7>] sys_select+0x47/0x110
kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

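So, looking at the last 2 function calls, it seems the task is put to sleep for a while via schedule_timeout(), after which find_busiest_group tries to balance the load generated by this task. Is that correct, or am I missing something here?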
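One way to check this reading is to separate the frames the kernel marks with `?` from the rest. In these dumps the trace is printed innermost call first, and a `?` in front of a symbol means the address was found on the stack but could not be confirmed as part of the current call chain, so it is often a leftover from an earlier call. The sketch below is a hypothetical helper (`split_frames` and `FRAME_RE` are names introduced here, not part of any tool) that assumes the line layout shown in the trace above.

```python
import re

# Matches one frame of the call trace, for example:
#   kernel: [<ffffffff8152a8c5>] schedule_timeout+0x215/0x2e0
#   kernel: [<ffffffff8105ca34>] ? find_busiest_group+0x244/0x9e0
# Group 1 is the optional "? " marker, group 2 is the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def split_frames(trace_text):
    """Return (confirmed, unconfirmed) symbol lists, both in trace order."""
    confirmed, unconfirmed = [], []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a frame line (header, register dump, ...)
        target = unconfirmed if match.group(1) else confirmed
        target.append(match.group(2))
    return confirmed, unconfirmed


# First frames of the trace above, copied verbatim.
SAMPLE = """\
kernel: Call Trace:
kernel: [<ffffffff8105ca34>] ? find_busiest_group+0x244/0x9e0
kernel: [<ffffffff8152a8c5>] schedule_timeout+0x215/0x2e0
kernel: [<ffffffff8152a543>] wait_for_common+0x123/0x180
kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
kernel: [<ffffffff8152a65d>] wait_for_completion+0x1d/0x20
kernel: [<ffffffff81098bf7>] flush_work+0x77/0xc0
"""

if __name__ == "__main__":
    confirmed, unconfirmed = split_frames(SAMPLE)
    print("call chain, innermost first:", " <- ".join(confirmed))
    print("unconfirmed entries:", ", ".join(unconfirmed))
```

Read this way, the confirmed chain runs from sys_select down through tty_poll, n_tty_poll, tty_flush_to_ldisc and flush_work to wait_for_completion and schedule_timeout, that is, the task is sleeping while it waits for a work item to finish. find_busiest_group carries a `?`, so it is probably not something that ran after schedule_timeout().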

Thanks.

Best answer

In case anyone is interested, I opened a case with Red Hat, and the problem appears to be related to an HP firmware bug in the array controller (hpsa). More details: https://access.redhat.com/solutions/1179703

**> I checked Bugzilla and found that the bug is probably related to the Smart Array firmware revision.

The bug was reproduced on a system with the following combination of controller and hpsa module versions (new hpsa module version, old firmware version):

  • kmod-hpsa: 3.4.4-1-RH1
  • SA Firmware: 3.22
  • Controller: P220i (103c:323b 103c:3355 rev 01)

A system running the new hpsa module version and the new firmware version did not reproduce this bug.

  • kmod-hpsa: 3.4.4-1-RH1
  • SA Firmware: 3.42
  • Controller: P220i (103c:323b 103c:3355 rev 01)

A system running the old hpsa module version and the old firmware version also did not reproduce this bug.

  • kmod-hpsa: 3.4.0-1-RH1
  • SA Firmware: 3.22
  • Controller: P220i (103c:323b 103c:3355 rev 01)

In our case the controller firmware version is 3.22 and we were using new hpsa module version 3.4.4-1-RH2.

$ cat /proc/scsi/scsi | grep -A 5 P220i
  Vendor: HP Model: P220i Rev: 3.22
  Type: RAID ANSI SCSI revision: 05

Now I see that with the old kernel we are using the old version of the hpsa module (3.4.0-1-RH1). With this combination the system should not encounter this bug.

$ modinfo hpsa
filename: /lib/modules/2.6.32-431.23.3.el6.x86_64/kernel/drivers/scsi/hpsa.ko
license: GPL
version: 3.4.0-1-RH1
description: Driver for HP Smart Array Controller version 3.4.0-1-RH1
author: Hewlett-Packard Company**
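The three combinations in the quote can be turned into a small lookup for checking a host. This is only a sketch: the function names (`parse_modinfo_version`, `classify`) are hypothetical, the tables contain nothing beyond the three pairs reported above, and any other pair is deliberately classified as "unknown" rather than guessed from the "new module, old firmware" pattern.

```python
# (hpsa module version, Smart Array firmware revision) pairs taken
# verbatim from the quoted Red Hat note. Nothing else is assumed.
REPRODUCED = {("3.4.4-1-RH1", "3.22")}
NOT_REPRODUCED = {("3.4.4-1-RH1", "3.42"), ("3.4.0-1-RH1", "3.22")}


def parse_modinfo_version(modinfo_text):
    """Return the value of the 'version:' field of `modinfo hpsa` output.

    Only a line whose key is exactly 'version' counts, so 'description:'
    and 'srcversion:' lines are ignored. Returns None if no such line.
    """
    for line in modinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "version":
            return value.strip()
    return None


def classify(hpsa_version, firmware_rev):
    """Classify a module/firmware pair against the reported combinations."""
    pair = (hpsa_version, firmware_rev)
    if pair in REPRODUCED:
        return "reproduced"
    if pair in NOT_REPRODUCED:
        return "not reproduced"
    return "unknown"


# modinfo output as quoted above.
SAMPLE_MODINFO = """\
filename: /lib/modules/2.6.32-431.23.3.el6.x86_64/kernel/drivers/scsi/hpsa.ko
license: GPL
version: 3.4.0-1-RH1
description: Driver for HP Smart Array Controller version 3.4.0-1-RH1
author: Hewlett-Packard Company
"""

if __name__ == "__main__":
    version = parse_modinfo_version(SAMPLE_MODINFO)
    # Firmware revision 3.22 is the "Rev:" value from /proc/scsi/scsi above.
    print(version, "+ firmware 3.22:", classify(version, "3.22"))
```

Note that the 3.4.4-1-RH2 module mentioned in the quote is not one of the three tested pairs, so an exact-match lookup like this one reports it as "unknown" with firmware 3.22.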

This is what the Red Hat engineer said.

Regarding "linux - Kernel hang in the tty subsystem", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26628274/
