clojure - The right way to ensure a clj-http connection manager is shut down after all requests have completed

Reposted. Author: 行者123. Updated: 2023-12-01 03:24:27

I have some code that combines clj-http, core.async facilities and an atom. It spawns a few threads to fetch and parse a bunch of pages:

(defn fetch-page
  ([url] (fetch-page url nil))
  ([url conn-manager]
   (-> (http.client/get url {:connection-manager conn-manager})
       :body hickory/parse hickory/as-hickory)))

(defn- create-worker
  [url-chan result conn-manager]
  (async/thread
    (loop [url (async/<!! url-chan)]
      (when url
        (swap! result assoc url (fetch-page url conn-manager))
        (recur (async/<!! url-chan))))))

(defn fetch-pages
  [urls]
  (let [url-chan (async/to-chan urls)
        pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
        conn-manager (http.conn-mgr/make-reusable-conn-manager {})
        workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
                      (range n-cpus))]
    ; wait for workers to finish and shut conn-manager down
    (dotimes [_ n-cpus] (async/alts!! workers))
    (http.conn-mgr/shutdown-manager conn-manager)

    (mapv #(get @pages %) urls)))

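The idea is to use multiple threads to cut the time spent fetching and parsing pages, but I do not want to overload the server by sending a large number of requests at once, which is why I use a connection manager. I don't know whether my approach is correct; suggestions are welcome. The current problem is that the last requests fail because the connection manager is shut down before they finish: Exception in thread "async-thread-macro-15" java.lang.IllegalStateException: Connection pool shut down.

Main question: how do I shut down the connection manager at the right moment (and why does my current code fail to do so)? Side quest: is my approach right? If not, what can I do to fetch and parse several pages at a time without overloading the server?

Thanks!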

Best answer

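The problem is that async/alts!! returns the first result available, and it will keep doing so because workers never changes: once the first worker's channel has closed, every later alts!! call returns immediately from that same closed channel instead of waiting for the remaining workers. I think using async/merge to build a single channel and then reading from it repeatedly should work.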

(defn fetch-pages
  [urls]
  (let [url-chan (async/to-chan urls)
        pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
        conn-manager (http.conn-mgr/make-reusable-conn-manager {})
        workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
                      (range n-cpus))
        all-workers (async/merge workers)]
    ; wait for workers to finish and shut conn-manager down
    (dotimes [_ n-cpus] (async/<!! all-workers))
    (http.conn-mgr/shutdown-manager conn-manager)

    (mapv #(get @pages %) urls)))
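
To see the difference between the two waiting strategies without touching the network, here is a minimal sketch that only needs core.async. The names fake-worker, wait-with-alts, wait-with-merge and elapsed-ms are hypothetical and not part of the original code; fake-worker stands in for create-worker and, like it, returns nil so that its channel closes without delivering a value.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-in for create-worker: sleeps for ms milliseconds and
;; returns nil, so the channel returned by async/thread simply closes.
(defn fake-worker [ms]
  (async/thread
    (Thread/sleep (long ms))
    nil))

;; The strategy from the question: one alts!! call per worker. After the
;; first channel closes, later calls return from it immediately.
(defn wait-with-alts [workers]
  (dotimes [_ (count workers)]
    (async/alts!! workers)))

;; The merge strategy: the merged channel closes only after every source
;; channel has closed, so draining it until nil waits for all workers.
(defn wait-with-merge [workers]
  (let [merged (async/merge workers)]
    (loop []
      (when (some? (async/<!! merged))
        (recur)))))

;; Starts one fast and one slow fake worker and measures how long the
;; given waiting strategy blocks, in milliseconds.
(defn elapsed-ms [wait-fn]
  (let [start (System/currentTimeMillis)]
    (wait-fn [(fake-worker 50) (fake-worker 500)])
    (- (System/currentTimeMillis) start)))

(println "alts!! waited" (elapsed-ms wait-with-alts) "ms")
(println "merge waited" (elapsed-ms wait-with-merge) "ms")
```

Under these assumptions the alts!! version should return shortly after the fast worker finishes, while the merge version should block until the slow one is done, which is the point at which shutting down the connection manager becomes safe.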

Alternatively, you can loop and keep shrinking workers, so that each iteration only waits on the workers that have not yet finished.

(defn fetch-pages
  [urls]
  (let [url-chan (async/to-chan urls)
        pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
        conn-manager (http.conn-mgr/make-reusable-conn-manager {})
        workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
                      (range n-cpus))]
    ; wait for workers to finish and shut conn-manager down
    (loop [workers workers]
      (when (seq workers)
        (let [[_ finished-worker] (async/alts!! workers)]
          (recur (filterv #(not= finished-worker %) workers)))))

    (http.conn-mgr/shutdown-manager conn-manager)
    (mapv #(get @pages %) urls)))
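
A further variation, not taken from the answer, is to take from each worker channel in turn and put the shutdown call in a finally clause so that it runs even if waiting throws. The sketch below is only an illustration under stated assumptions: run-workers-then-cleanup is a hypothetical helper, f stands in for fetch-page, and cleanup stands in for the call to http.conn-mgr/shutdown-manager. It uses only core.async so it can be tried without clj-http.

```clojure
(require '[clojure.core.async :as async])

(defn run-workers-then-cleanup
  "Hypothetical helper: starts n workers that apply f to every item taken
  from a shared channel, blocks until all workers have finished, then
  calls cleanup once. Returns the results in the order of items."
  [n items f cleanup]
  (let [in      (async/to-chan items)
        results (atom {})
        workers (mapv (fn [_]
                        (async/thread
                          (loop []
                            ;; a nil take means the input channel is drained
                            (when-some [item (async/<!! in)]
                              (swap! results assoc item (f item))
                              (recur)))))
                      (range n))]
    (try
      ;; Each take blocks until that worker's channel closes, so after
      ;; the doseq every worker is done.
      (doseq [w workers]
        (async/<!! w))
      (mapv #(get @results %) items)
      (finally
        ;; stand-in for shutting down the connection manager
        (cleanup)))))

;; Usage with stand-ins: inc plays the role of fetch-page.
(println (run-workers-then-cleanup 2 [1 2 3] inc #(println "cleanup ran")))
```

Waiting on the channels one by one is simpler than alts!! here because the order in which workers finish does not matter; all that matters is that none is still running when cleanup is called.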

Regarding "clojure - The right way to ensure a clj-http connection manager is shut down after all requests have completed", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42677343/
