
python - Python multithreading with BeautifulSoup

Reposted · Author: 行者123 · Updated: 2023-12-03 13:08:36

This is the function that fetches a URL and parses it with BeautifulSoup:

import requests
from bs4 import BeautifulSoup
from threading import Thread

multithreadding = []

def scraper_worker(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("div", {"class": "main-container"})
    multithreadding.append(data)

threadding = []
for u in split_link:
    t = Thread(target=scraper_worker, args=(u,))
    t.start()
    threadding.append(t)

split_link is a list holding 50-odd links. I'm having trouble running the multithreaded part.
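One likely cause of trouble in the snippet above is that the threads are never joined, so the main thread can reach the end of the script and read multithreadding before the workers have finished. A minimal sketch of that fix, using a hypothetical stand-in worker instead of the requests/BeautifulSoup calls so it runs without network access (the lock is an extra precaution; in CPython, list.append is itself thread-safe):

```python
from threading import Thread, Lock

results = []
results_lock = Lock()

def worker(url):
    # stand-in for requests.get + BeautifulSoup parsing
    data = f"parsed:{url}"
    with results_lock:  # guard the shared list
        results.append(data)

# stand-in for split_link
split_link = [f"http://example.com/page/{i}" for i in range(5)]

threads = []
for u in split_link:
    t = Thread(target=worker, args=(u,))
    t.start()
    threads.append(t)

# wait for every worker before using the results
for t in threads:
    t.join()

print(len(results))  # 5
```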

Best Answer

Here is an example of how to use a queue to send results from the threads back to the main thread.

import requests
from bs4 import BeautifulSoup
from threading import Thread
import queue

# --- functions ---

def worker(url, queue):  # get queue as argument
    r = requests.get(url)

    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("span", {"class": "text"}).get_text()

    # send result to main thread using queue
    queue.put(data)

# --- main ---

all_links = [
    'http://quotes.toscrape.com/page/' + str(i) for i in range(1, 11)
]

all_threads = []
all_results = []
my_queue = queue.Queue()

# run threads
for url in all_links:
    t = Thread(target=worker, args=(url, my_queue))
    t.start()
    all_threads.append(t)

# get results from queue
while len(all_results) < len(all_links):
    # get result from queue
    data = my_queue.get()
    all_results.append(data)

    # or with queue.empty() if the loop has to do something more,
    # because queue.get() waits for data if the queue is empty and blocks the loop

    #if not my_queue.empty():
    #    data = my_queue.get()
    #    all_results.append(data)

# display results
for item in all_results:
    print(item[:50], '...')
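The same pattern can also be written with concurrent.futures.ThreadPoolExecutor, which manages the thread pool and collects return values for you. A hedged sketch with a hypothetical stand-in worker; in practice its body would be the requests/BeautifulSoup calls from the answer above:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(url):
    # stand-in for requests.get + BeautifulSoup; return the scraped text
    return f"text from {url}"

all_links = [
    'http://quotes.toscrape.com/page/' + str(i) for i in range(1, 11)
]

# executor.map runs workers concurrently and yields results
# in the same order as all_links
with ThreadPoolExecutor(max_workers=5) as executor:
    all_results = list(executor.map(worker, all_links))

print(len(all_results))  # 10
```

A design note: executor.map keeps results in input order, whereas the queue-based version appends results in completion order.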

Regarding python - Python multithreading with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48005396/
