
linux - The only thread servicing an io_service keeps waiting even though asynchronous I/O operations are pending

Reposted. Author: IT王子. Updated: 2023-10-29 00:51:41

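Boost's ASIO dispatcher seems to have a serious problem, and I can't find a workaround. The symptom is that the only thread waiting to dispatch is left in pthread_cond_wait, even though there are I/O operations pending that require it to block in epoll_wait.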

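The easiest way I have found to reproduce the problem is to have one thread call poll_one in a loop until it returns zero. This can leave the thread calling run stuck in pthread_cond_wait while the thread calling poll_one breaks out of the loop. Presumably the io_service expects that thread to come back and block in epoll_wait, but it is under no obligation to do so, and that expectation appears to be fatal.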

Is there a requirement that threads be statically associated with an io_service?

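Here is a backtrace showing the deadlock. This is the only thread servicing this io_service, because the other threads have moved on. There are definitely socket operations pending: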

#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
#2 boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
#3 boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146

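I believe the bug is as follows. Suppose the thread servicing the I/O queue is the one blocking on the socket-readiness check, and it calls a dispatch function. If any other threads are blocked on the io_service, it must signal one of them. Currently it signals only when there are handlers ready to run at that moment, which leaves no thread checking for socket readiness.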

Best answer

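This is a bug. I was able to reproduce it by adding a delay in a non-critical section of task_io_service::do_poll_one. Below is a snippet of the modified task_io_service::do_poll_one() from boost/asio/detail/impl/task_io_service.ipp. The only added line is the sleep.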

std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,
    task_io_service::thread_info& this_thread,
    const boost::system::error_code& ec)
{
  if (stopped_)
    return 0;

  operation* o = op_queue_.front();
  if (o == &task_operation_)
  {
    op_queue_.pop();
    lock.unlock();

    {
      task_cleanup c = { this, &lock, &this_thread };
      (void)c;

      // Run the task. May throw an exception. Only block if the operation
      // queue is empty and we're not polling, otherwise we want to return
      // as soon as possible.
      task_->run(false, this_thread.private_op_queue);
      // Added for the repro: keeps task_operation_ off op_queue_ (and the
      // lock released) for 3 seconds before task_cleanup requeues it.
      boost::this_thread::sleep_for(boost::chrono::seconds(3));
    }

    o = op_queue_.front();
    if (o == &task_operation_)
      return 0;
  }

  ...

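My test driver is fairly basic:

  • Set up an asynchronous work loop, driven by a timer, that prints "." every 3 seconds.
  • Spawn a thread that polls the io_service.
  • Delay long enough for the new thread to poll the io_service, then have main call io_service::run() while the poll thread is sleeping in task_io_service::do_poll_one().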

Test code:

#include <iostream>

#include <boost/asio/io_service.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/bind.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;
boost::asio::steady_timer timer(io_service);

// Print a dot and re-arm the timer, so there is always pending async work.
void arm_timer()
{
  std::cout << ".";
  std::cout.flush();
  timer.expires_from_now(boost::chrono::seconds(3));
  timer.async_wait(boost::bind(&arm_timer));
}

int main()
{
  // Add asynchronous work loop.
  arm_timer();

  // Spawn poll thread.
  boost::thread poll_thread(
    boost::bind(&boost::asio::io_service::poll, boost::ref(io_service)));

  // Give the poll thread time to start servicing the reactor.
  boost::this_thread::sleep_for(boost::chrono::seconds(1));

  io_service.run();
}

Debugging:

[twsansbury@localhost bug]$ gdb a.out
...
(gdb) r
Starting program: /home/twsansbury/dev/bug/a.out
[Thread debugging using libthread_db enabled]
.[New Thread 0xb7feeb90 (LWP 31892)]
[Thread 0xb7feeb90 (LWP 31892) exited]

At this point, arm_timer() has printed "." once (when it was initially armed). The poll thread serviced the reactor in a non-blocking manner, then slept for 3 seconds while op_queue_ was empty (task_operation_ is added back to op_queue_ when task_cleanup c goes out of scope). While op_queue_ was empty, the main thread called io_service::run(), saw that op_queue_ was empty, and made itself the first_idle_thread_, where it waits on its wakeup_event. The poll thread then finished sleeping and returned 0, leaving the main thread waiting on wakeup_event.

After waiting about 10 seconds, which is plenty of time for arm_timer() to be ready, I interrupt the debugger:

Program received signal SIGINT, Interrupt.
0x00919402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00919402 in __kernel_vsyscall ()
#1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x08059dc2 in void boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> >(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&) ()
#4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) ()
#5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#6  0x0805a1e2 in boost::asio::io_service::run() ()
#7  0x0804db78 in main ()

The side-by-side timeline is as follows:

          poll thread                  |          main thread
---------------------------------------+---------------------------------------
  lock()                               |
  do_poll_one()                        |
  |-- pop task_operation_ from         |
  |   op_queue_                        |
  |-- unlock()                         |  lock()
  |-- create task_cleanup              |  do_run_one()
  |-- service reactor (non-block)      |  `-- op_queue_ is empty
  |-- ~task_cleanup()                  |      |-- set thread as idle
  |   |-- lock()                       |      `-- unlock()
  |   `-- op_queue_.push(              |
  |       task_operation_)             |
  `-- task_operation_ is               |
      op_queue_.front()                |
      `-- return 0                     |  // still waiting on wakeup_event
  unlock()                             |

As far as I can tell, there are no side effects from patching:

if (o == &task_operation_)
  return 0;

to:

if (o == &task_operation_)
{
  if (!one_thread_)
    wake_one_thread_and_unlock(lock);
  return 0;
}

Either way, I have submitted a bug and fix. Consider keeping an eye on the ticket for an official response.

For more on "linux - The only thread servicing an io_service keeps waiting even though asynchronous I/O operations are pending", see this similar question on Stack Overflow: https://stackoverflow.com/questions/15713832/
