gpt4 book ai didi

ruby-on-rails - ruby 2.1.2 超时仍然不是线程安全的吗?

转载 作者:数据小太阳 更新时间:2023-10-29 07:32:01 25 4
gpt4 key购买 nike

我有 50 个 sidekiq 线程在网络上爬行,几周前线程在运行大约 20 分钟后开始挂起。当我进行回溯转储时,大多数线程都停留在 net/http initialize 上:

/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `open'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:878:in `connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:858:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:974:in `response_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:298:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
/app/app/workers/crawl_page.rb:24:in `block in perform'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `block in catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:106:in `timeout'

我不认为 sidekiq 会卡在 net/http 上,因为我已经将整个调用包装在超时中:Timeout::timeout(APP_CONFIG['crawl_page_timeout']) { @page = agent.get (网址)

...但后来我开始阅读一些关于 ruby​​ 的超时如何不是线程安全的旧帖子:http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

ruby的Timeout还不是线程安全的吗?

我知道很多人用 Ruby 编写爬虫。如果 Timeout 不是线程安全的,人们如何编写爬虫来处理 net/http 卡住的问题?

更新:

我已经切换到 HTTPClient(特别说明它的线程安全)来替换 mechanize。我们似乎仍然卡在初始化线程上。同样,这可能是由于 ruby​​ 的 Timeout 无法正常工作,或者可能是 sidekiq 问题。这是最近挂起的 sidekiq 线程的堆栈跟踪:

/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `initialize'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `new'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `create_socket'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:752:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `call'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:127:in `timeout'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:751:in `connect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:609:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:164:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:1087:in `do_get_block'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:34:in `block in do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/cross_app_tracing.rb:43:in `tl_trace_http_request'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:33:in `do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:891:in `block in do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:985:in `protect_keep_alive_disconnected'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:890:in `do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:963:in `follow_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:776:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:677:in `get'
/app/app/ohm_models/queued_page.rb:20:in `run_crawl'

最佳答案

正确,在 Ruby 代码中使用 Timeout 仍然不安全,除非您确切地知道该 block 中发生了什么(包括任何 C 代码可能是什么)正在做)。因此,我亲眼目睹了连接池中发生的灾难性事件。

可能能够避免错误并重试,但如果您不走运,您的过程可能会卡住并需要重新启动。

如果您创建新进程,如果它们运行时间长,您可以安全地杀死它们(或使用 timeout(1) 因为它们没有任何方法可以破坏您的父进程。

I know a lot of people write crawlers in Ruby. If Timeout isn't thread-safe, how are people writing crawlers handling the issue of net/http getting stuck?

您有具体的行之有效的示例吗?

关于ruby-on-rails - ruby 2.1.2 超时仍然不是线程安全的吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25803089/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com