
ruby-on-rails - Is this Ruby code using threads, thread pools, and concurrency correctly?

Reposted. Author: 行者123. Updated: 2023-12-03 12:45:48

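I'm on what I consider part 3 of a task: pinging a very large list of URLs (numbering in the thousands) and retrieving the x509 certificate associated with each URL. Part 1 is here (How do I properly use threads to ping a URL) and Part 2 is here (Why won't my connection pool implement my thread code).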

Since I asked those two questions, I have ended up with the following code:

###### This is the code that pings a url and grabs its x509 cert #####

require 'socket'
require 'openssl'

class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443')
    @url = url
    @port = port
  end

  # Opens a TLS connection and returns the peer certificate and verify result.
  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    tcp_client = TCPSocket.new(url, port)
    ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
    ssl_client.hostname = url
    ssl_client.sync_close = true
    ssl_client.connect
    certificate = ssl_client.peer_cert
    verify_result = ssl_client.verify_result
    tcp_client.close
    { certificate: certificate, verify_result: verify_result }
  rescue => error
    # Any failure (DNS lookup, connect, handshake) yields empty results.
    { certificate: nil, verify_result: nil }
  end
end

The code above is essential for retrieving ssl_client.peer_cert. Below is the snippet that makes multiple HTTP pings to the URLs for their certificates:
pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end

pool.shutdown
pool.wait_for_termination

#Do some rails code with the database depending on the results.

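So far, when I run this code it is unbelievably slow. I thought that by creating a thread pool the code would run much faster. That does not seem to be the case, and I'm not sure why. A lot of it is because I don't know the nuances of threads, pools, starvation, locks, and so on. After implementing the code above I did more reading to try to speed it up, and once again I'm confused and could use some clarification on how to make the code faster.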

For starters, this excellent article here (ruby-concurrency-parallelism) gives the following definitions and concepts:

Concurrency vs. Parallelism These terms are used loosely, but they do have distinct meanings.

Concurrency: The art of doing many tasks, one at a time. By switching between them quickly, it may appear to the user as though they happen simultaneously.

Parallelism: Doing many tasks at literally the same time. Instead of appearing simultaneous, they are simultaneous.

Concurrency is most often used for applications that are IO heavy. For example, a web app may regularly interact with a database or make lots of network requests. By using concurrency, we can keep our application responsive, even while we wait for the database to respond to our query.

This is possible because the Ruby VM allows other threads to run while one is waiting during IO. Even if a program has to make dozens of requests, if we use concurrency, the requests will be made at virtually the same time.

Parallelism, on the other hand, is not currently supported by Ruby.



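So from this article I gather that what I want to do needs to be done concurrently, because I am pinging URLs over the network and Ruby does not currently support parallelism.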

Next is where things get confusing for me. From my Part 1 question on Stack Overflow, a comment told me that I should do the following:

Use a thread pool; don't just create a thousand concurrent threads. For something like connecting to a URL where there will be a lot of waiting you can oversubscribe the number of threads per CPU core, but not by a huge amount. You'll have to experiment.



Another user said this:

You'd not spawn thousands of threads, use a connection pool (e.g https://github.com/mperham/connection_pool) so you have maximum 20-30 concurrent requests going (this maximum number should be determined by testing at which point network performance drops and you get these timeouts)



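So for this part I turned to concurrent-ruby and implemented both a CachedThreadPool and a FixedThreadPool with 10 threads. I chose the CachedThreadPool because it seemed to me that the pool would take care of the number of threads needed. Now, in concurrent-ruby's documentation for pools, I see this: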
pool = Concurrent::CachedThreadPool.new
pool.post do
  # some parallel work
end

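I thought we had just established in the first article that Ruby does not support parallelism, so what is the thread pool doing? Is it working concurrently or in parallel? What exactly is going on? Do I even need a thread pool? Also, up to this point I thought connection pools and thread pools were the same thing and the terms were interchangeable. What is the difference between the two pools, and which one do I need?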

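Another excellent article, How to Perform Concurrent HTTP Requests in Ruby and Rails, introduces the Concurrent::Promises class from concurrent-ruby as a way to avoid locks and get thread safety with two API calls. Here is a code snippet from it, along with its description: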
def get_all_conversations
  groups_thread = Thread.new do
    get_groups_list
  end

  channels_thread = Thread.new do
    get_channels_list
  end

  [groups_thread, channels_thread].map(&:value).flatten
end

Every request is executed in its own thread, which can run in parallel because it is a blocking I/O. But can you see a catch here?



The description above mentions parallelism, which we just said does not exist in Ruby. Below is the approach with Concurrent::Promise:
def get_all_conversations
  groups_promise = Concurrent::Promise.execute do
    get_groups_list
  end

  channels_promise = Concurrent::Promise.execute do
    get_channels_list
  end

  [groups_promise, channels_promise].map(&:value!).flatten
end

So according to this article, these requests are made "in parallel". Are we still talking about concurrency at this point?

Finally, these two articles talk about using Futures for concurrent HTTP requests. I won't go into the details, but I'll paste the links here.

1. Using Concurrent Ruby in a Ruby on Rails Application
2. Learn Concurrency by Implementing Futures in Ruby

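Again, what those articles discuss looks to me like the Concurrent::Promise functionality. I just want to note that the examples show how to use these concepts for two different API calls whose results need to be combined. That is not what I need. I just need to make thousands of API calls quickly and log the results.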

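In conclusion, I just want to know what I need to do to make my code faster and thread safe so that it runs concurrently. What exactly am I missing? Right now it runs so slowly that I might as well not have used threads in the first place.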

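Summary

I have to ping thousands of URLs using threads to speed up the process. The code is slow, and I'm confused about whether I am using threads, thread pools, and concurrency correctly.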

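Best answer

Let us look at the problems you have described and try to solve them one at a time:

You have two pieces of code: SslClient and the script that uses this SSL client. From my understanding of thread pools, the way you use the pool needs to change a little.

From: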

pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end

pool.shutdown
pool.wait_for_termination

To:

pool = Concurrent::FixedThreadPool.new(10)

[LARGE LIST OF URLS TO PING].each do |struct|
  # One unit of work per URL, so up to 10 handshakes are in flight at once.
  pool.post do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end

pool.shutdown
pool.wait_for_termination

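In the initial version, only one unit of work is posted to the pool, so a single thread walks the whole list while the rest of the pool sits idle. In the second version, we post as many units of work to the pool as there are items in LARGE LIST OF URLS TO PING.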
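The difference between the two versions can be seen in a small timing sketch. It uses only the standard library so it runs without concurrent-ruby: `run_jobs` is a hypothetical stand-in for a fixed-size pool, the host names are placeholders, and `sleep` stands in for the network wait. Treat it as an illustration under those assumptions rather than a benchmark of the real code.

```ruby
# Hypothetical stand-in for a fixed-size thread pool, built on Thread and Queue.
# It is not part of concurrent-ruby; it only mimics "post jobs, then wait".
def run_jobs(jobs, workers:)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # pop returns nil once the closed queue is drained
  threads = Array.new(workers) do
    Thread.new do
      while (job = queue.pop)
        job.call
      end
    end
  end
  threads.each(&:join)
end

# Returns the wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

urls = Array.new(8) { |i| "host#{i}.example" } # placeholder hosts
fake_ping = ->(_url) { sleep 0.05 }            # assumption: stands in for the TLS handshake

# One job that loops over every URL: a single worker does all the waiting.
one_job_time = elapsed do
  run_jobs([-> { urls.each { |url| fake_ping.call(url) } }], workers: 4)
end

# One job per URL: four workers wait on four URLs at a time.
many_jobs_time = elapsed do
  run_jobs(urls.map { |url| -> { fake_ping.call(url) } }, workers: 4)
end

puts format('one job: %.2fs, one job per URL: %.2fs', one_job_time, many_jobs_time)
```

With a single job the pool size is irrelevant, because only one worker ever has anything to do.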

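To add a bit more on concurrency versus parallelism in Ruby: because of the GIL (Global Interpreter Lock), Ruby indeed does not support true parallelism, but that only applies while we are actually doing work on the CPU. For a network request, the CPU-bound work takes negligible time compared with the IO-bound work, which means your use case is a very good fit for threads.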
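A rough way to see this on CRuby is to time the same number of tasks run sequentially and in threads, once for a waiting task and once for a computing task. This is a sketch under assumptions: `sleep` stands in for waiting on a socket, the helper names are made up for the example, and the CPU numbers will vary by machine and Ruby implementation.

```ruby
# Returns the wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Runs the task `count` times, one after another.
def run_sequentially(count, task)
  count.times { task.call }
end

# Runs the task `count` times, each in its own thread, and waits for all.
def run_in_threads(count, task)
  Array.new(count) { Thread.new { task.call } }.each(&:join)
end

io_task  = -> { sleep 0.1 }                          # assumption: stands in for a socket wait
cpu_task = -> { 200_000.times { |i| Math.sqrt(i) } } # pure computation, no waiting

io_sequential  = elapsed { run_sequentially(5, io_task) }
io_threaded    = elapsed { run_in_threads(5, io_task) }
cpu_sequential = elapsed { run_sequentially(5, cpu_task) }
cpu_threaded   = elapsed { run_in_threads(5, cpu_task) }

puts format('IO-bound:  sequential %.2fs, threaded %.2fs', io_sequential, io_threaded)
puts format('CPU-bound: sequential %.2fs, threaded %.2fs', cpu_sequential, cpu_threaded)
```

The waiting tasks overlap when threaded, while on CRuby the computing tasks are not expected to speed up in the same way.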

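Also, by using a thread pool we minimize the overhead of creating threads. With a pool such as Concurrent::FixedThreadPool.new(10), we are limiting the number of threads available in the pool. With an unbounded thread pool, a new thread is created every time a unit of work arrives while the rest of the threads in the pool are busy.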
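To make the thread-count point concrete, the sketch below counts how many distinct threads end up doing the work in two setups: one thread per job, and a fixed set of workers pulling jobs from a queue. It is a standard-library approximation, not concurrent-ruby itself; `JOBS`, `POOL_SIZE`, and the `sleep` workload are assumptions chosen for the example.

```ruby
JOBS = 50
POOL_SIZE = 5

# Thread per job: every unit of work pays for a brand-new thread.
per_job_threads = Array.new(JOBS) do
  Thread.new do
    sleep 0.01 # assumption: stands in for a short network wait
    Thread.current
  end
end.map(&:value).uniq

# Fixed set of workers: the same few threads are reused for every job.
queue = Queue.new
JOBS.times { queue << -> { sleep 0.01 } }
queue.close # pop returns nil once the closed queue is drained

seen = Queue.new # thread-safe record of which thread ran each job
workers = Array.new(POOL_SIZE) do
  Thread.new do
    while (job = queue.pop)
      job.call
      seen << Thread.current
    end
  end
end
workers.each(&:join)
pooled_threads = Array.new(seen.size) { seen.pop }.uniq

puts "thread per job: #{per_job_threads.size} threads for #{JOBS} jobs"
puts "fixed workers:  #{pooled_threads.size} threads for #{JOBS} jobs"
```

The fixed version never uses more than `POOL_SIZE` threads no matter how long the job list grows, which is the property the fixed pool gives you for thousands of URLs.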

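As the first article points out, you need to collect the result returned by each worker and act meaningfully when an exception occurs (I am the author). You should be able to use the class given in that blog post without any changes.

Let us try rewriting your code with Concurrent::Future, since in your case we need the results as well.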

thread_pool = Concurrent::FixedThreadPool.new(20)

executors = [LARGE LIST OF URLS TO PING].map do |struct|
  Concurrent::Future.execute({ executor: thread_pool }) do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
    struct
  end
end

executors.map(&:value)
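One thing to keep in mind with the Future version: in concurrent-ruby, `value` returns nil for a future whose block raised, and the exception is available through `reason`, so failures are easy to miss unless you check for them. The sketch below shows the same collect-results-and-errors idea using only the standard library, so it runs without the gem. `Result`, `run_collecting`, and the host names are hypothetical names introduced for illustration, and the block body is a placeholder for the real certificate lookup.

```ruby
# Hypothetical result record: exactly one of value or error is set.
Result = Struct.new(:input, :value, :error)

# Runs the block once per input on a fixed number of worker threads and
# returns one Result per input, in input order.
def run_collecting(inputs, workers:)
  queue = Queue.new
  inputs.each_with_index { |input, index| queue << [index, input] }
  queue.close # pop returns nil once the closed queue is drained

  results = Array.new(inputs.size) # each job writes only its own slot
  threads = Array.new(workers) do
    Thread.new do
      while (pair = queue.pop)
        index, input = pair
        begin
          results[index] = Result.new(input, yield(input), nil)
        rescue StandardError => e
          # Keep the failure next to its input instead of losing it.
          results[index] = Result.new(input, nil, e)
        end
      end
    end
  end
  threads.each(&:join)
  results
end

hosts = %w[ok-1.example bad.example ok-2.example] # placeholder hosts

results = run_collecting(hosts, workers: 2) do |host|
  # Placeholder for the real lookup; one host is made to fail on purpose.
  raise ArgumentError, "cannot reach #{host}" if host.start_with?('bad')

  "cert for #{host}"
end

failed, succeeded = results.partition(&:error)
failed.each { |result| warn "#{result.input}: #{result.error.message}" }
puts "#{succeeded.size} succeeded, #{failed.size} failed"
```

Keeping the error beside the input lets the later database code decide per URL whether to retry, skip, or record the failure.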

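I hope this helps. If you have questions, please ask in the comments and I will revise this post to answer them.

Regarding ruby-on-rails - Is this Ruby code using threads, thread pools, and concurrency correctly, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60217604/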
