
python - Can't figure out how to write results back to the same worksheet when I use concurrent.futures

Reposted. Author: 行者123. Updated: 2023-12-04 03:31:38

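I use the openpyxl library to read different ticker names from a worksheet, then use those tickers on a website to produce results, and finally write the results back to the same worksheet, in the cells right next to the corresponding tickers.

When I run the script without any multiprocessing, it works flawlessly.

However, when I use the concurrent.futures library, I can't figure out how to write the results back to the corresponding cells in the worksheet.

My current attempt: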

import requests
from openpyxl import load_workbook
import concurrent.futures as futures

wb = load_workbook('Screener.xlsx')
ws = wb['Screener-1']

link = 'https://backend.otcmarkets.com/otcapi/company/profile/full/{}?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
}
params = {
    'symbol': ''
}

def get_info(ticker):
    target_link = link.format(ticker)
    params['symbol'] = ticker
    r = requests.get(target_link,params,headers=headers)
    try:
        address = r.json()['address']
    except (AttributeError,KeyError,IndexError):
        address = ""
    try:
        website = r.json()['website']
    except (AttributeError,KeyError,IndexError):
        website = ""
    return address,website

if __name__ == '__main__':
    ticker_list = []
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row,column=1).value==None:break
        ticker = ws.cell(row=row,column=1).value
        ticker_list.append(ticker)

    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {executor.submit(get_info, ticker): ticker for ticker in ticker_list}
        for future in futures.as_completed(future_to_url):
            address,website = future.result()[0],future.result()[1]
            print(address,website)

            # ws.cell(row=row, column=2).value = '{}'.format(address)
            # ws.cell(row=row, column=3).value = '{}'.format(website)
            # wb.save('Screener.xlsx')

A few tickers for testing:

tickers = ['URBT','TPRP','CRBO','PVSP','TSPG','VMHG','MRTI','VTMC','TORM','SORT']

How can I write the results back to the same worksheet while doing reverse search using concurrent.futures?

In case you want to know exactly where I'm trying to write the data, this is what the worksheet looks like.

Best answer

Since you're already using openpyxl, I'd suggest pandas, as you might find it a bit easier to work with workbooks. pandas' read_excel uses openpyxl under the hood for .xlsx files.

Assuming you have a file Screener.xlsx with a Symbol column, you can scrape the missing data and update the workbook.

Here's how:

import concurrent.futures as futures

import pandas as pd
import requests

link = 'https://backend.otcmarkets.com/otcapi/company/profile/full/{}'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
}


def get_info(ticker) -> dict:
    r = requests.get(link.format(ticker), headers=headers)
    print(f"Fetching data for {ticker}...")
    # Fall back to the string "None" when a field is missing from the response.
    try:
        address = r.json()["address1"]
    except (AttributeError, KeyError, IndexError):
        address = "None"
    try:
        website = r.json()["website"]
    except (AttributeError, KeyError, IndexError):
        website = "None"
    # Return the ticker with the data so each result can be matched to its row.
    return {"ticker": ticker, "address": address, "website": website}


if __name__ == "__main__":
    df = pd.read_excel("Screener.xlsx")
    tickers = df["Symbol"].to_list()
    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {
            executor.submit(get_info, ticker): ticker for ticker in tickers
        }
        # Results are collected in completion order, not worksheet order.
        tickers_scraped = [
            future.result() for future in futures.as_completed(future_to_url)
        ]
    # Sort the results back into the order of the Symbol column.
    sorted_tickers = sorted(
        tickers_scraped, key=lambda i: tickers.index(i["ticker"])
    )
    df.loc[:, ["Address", "Website"]] = [
        [i["address"], i["website"]] for i in sorted_tickers
    ]
    df.to_excel("Screener.xlsx", index=False)

Running this fills in the Address and Website columns for each symbol in the workbook.
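The sorting step in the code above is needed because futures.as_completed yields futures as they finish, which is generally not the order in which they were submitted. Here is a minimal sketch of that behaviour, using only the standard library. fake_fetch, fetch_all and the DELAYS table are hypothetical stand-ins for the real HTTP request, and the example.com URLs are made up.

```python
import concurrent.futures as futures
import time

# Hypothetical delays standing in for network latency per ticker.
DELAYS = {"URBT": 0.3, "TPRP": 0.2, "CRBO": 0.1}


def fake_fetch(ticker):
    # Stand-in for get_info(): sleeps instead of calling the website.
    time.sleep(DELAYS[ticker])
    return {"ticker": ticker, "website": f"https://example.com/{ticker}"}


def fetch_all(tickers):
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        pending = [executor.submit(fake_fetch, t) for t in tickers]
        # as_completed yields each future as soon as it has finished.
        return [f.result() for f in futures.as_completed(pending)]


tickers = list(DELAYS)
scraped = fetch_all(tickers)
completion_order = [item["ticker"] for item in scraped]

# Restore the original order, as the sorted() call in the answer does.
position = {ticker: i for i, ticker in enumerate(tickers)}
restored = sorted(scraped, key=lambda item: position[item["ticker"]])
restored_order = [item["ticker"] for item in restored]

print("completed:", completion_order)
print("restored: ", restored_order)
```

The position dictionary plays the same role as tickers.index in the answer, but looks each ticker up directly instead of scanning the list on every comparison.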

Edit:

Here's a pandas approach that doesn't sort the scraped data first.

# Reuses the imports and get_info() from the code above.
if __name__ == "__main__":
    df = pd.read_excel("Screener.xlsx")
    tickers = df["Symbol"].to_list()
    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {
            executor.submit(get_info, ticker): ticker for ticker in tickers
        }
        tickers_scraped = [
            future.result() for future in futures.as_completed(future_to_url)
        ]
    df_scraped = pd.DataFrame(tickers_scraped).set_index("ticker")
    df = df.set_index("Symbol")
    # The assignment aligns rows on the index (the ticker symbol),
    # so the scraped data does not need to be sorted.
    df[["Address", "Website"]] = df_scraped[["address", "website"]]
    df = df.reset_index()
    df.to_excel("Screener.xlsx", index=False)
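The same idea, matching each result to its source by key rather than by position, also works without pandas. The sketch below is one possible way to apply it to the question's openpyxl setup: each future is mapped to the worksheet row its ticker came from, so the row is known when the result arrives. fake_fetch and scrape_into_rows are hypothetical names, fake_fetch replaces the real request, and a plain dict stands in for the worksheet so the example runs without a workbook.

```python
import concurrent.futures as futures


def fake_fetch(ticker):
    # Hypothetical stand-in for get_info(); returns (address, website).
    return f"{ticker} Street 1", f"https://example.com/{ticker.lower()}"


def scrape_into_rows(tickers, first_row=2):
    """Return {row: (address, website)} for tickers listed from first_row down."""
    results = {}
    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        # Remember which worksheet row each future belongs to.
        future_to_row = {
            executor.submit(fake_fetch, ticker): row
            for row, ticker in enumerate(tickers, start=first_row)
        }
        for future in futures.as_completed(future_to_row):
            row = future_to_row[future]
            results[row] = future.result()
    return results


rows = scrape_into_rows(["URBT", "TPRP", "CRBO"])
for row in sorted(rows):
    address, website = rows[row]
    # With openpyxl, this is where the cells would be written, e.g.
    #   ws.cell(row=row, column=2).value = address
    #   ws.cell(row=row, column=3).value = website
    print(row, address, website)
```

In this sketch all writes happen in the main thread after the results come back, so the workbook would only need to be saved once, after the loop.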

Regarding "python - Can't figure out how to write results back to the same worksheet when I use concurrent.futures", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66706665/
