
python - Google Trends crawler: CSV writing issues


The code below is a Google Trends crawler that uses the unofficial API from https://github.com/GeneralMills/pytrends. It works fine, but there is one problem: nobody knows the exact request limit of Google Trends. If I run the crawler with a list of 2,000 or more keywords ("DNA"), I get an error saying I have exceeded the request limit, and all the data crawled before hitting the limit is lost, because I only write the CSV at the very end of the code. Is there a way to write the data to the CSV on every loop iteration, so that even if I exceed the limit, I at least keep the data collected up to that point? Thanks.

from pytrends.request import TrendReq
from datetime import datetime
import pandas as pd
import time

pytrends = TrendReq(hl='en-US', tz=360)
Data = pd.DataFrame()

# log file used to track loop progress
path = "C:/Users/aijhshin/Workk/GoogleTrendCounter.txt"

# set the date index using the 'apple' keyword
kw_list = ['apple']
pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')
Googledate = pytrends.interest_over_time()
Data['Date'] = Googledate.index

# Google Trends crawler limit: roughly 1600 requests per day
# DNA is the list of keywords to crawl, defined elsewhere
for i in range(len(DNA)):
    kw_list = [DNA[i]]
    pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')

    # results
    df = pytrends.interest_over_time()
    if df.empty:
        Data[DNA[i]] = ""
    else:
        df.index.name = 'Date'
        df.reset_index(inplace=True)
        Data[DNA[i]] = df.loc[:, DNA[i]]

    # log the loop progress
    file = open(path, "a")
    file.write(str(i) + " " + str(datetime.now()) + " ")
    file.write(DNA[i] + '\n')
    file.close()

    # run one request per nine seconds (optional)
    #time.sleep(9)

# write the csv file (overwritten on each run)
Data.to_csv('Google Trend.csv')

print("Crawling Done")

Best Answer

Move Data.to_csv('Google Trend.csv') to after time.sleep(9) (i.e. inside the loop), and change its mode to a:

    time.sleep(9)
    Data.to_csv('Google Trend.csv', mode='a')

Mode a appends to the end of the CSV file instead of overwriting it.
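One caveat: appending the full accumulated Data frame on every pass would repeat the rows already written, so in practice you would append only the values fetched in the current iteration. A minimal sketch of that per-loop append (one row per keyword, dates as columns; DNA is again a stand-in for the real list):

import os
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
DNA = ['apple', 'banana']  # stand-in; replace with the real keyword list
out_path = 'Google Trend.csv'

for kw in DNA:
    pytrends.build_payload([kw], cat=0, timeframe='today 5-y', geo='', gprop='')
    df = pytrends.interest_over_time()
    if df.empty:
        continue
    # Transpose so the keyword becomes the row label and the dates the
    # columns, then append the row as soon as it is fetched.
    row = df[[kw]].T
    # Write the date header only once, when the file does not exist yet.
    row.to_csv(out_path, mode='a', header=not os.path.exists(out_path))

Because every keyword lands in the file immediately, a run that hits the limit can be resumed later by skipping the keywords already present in the CSV.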

On the topic of python - Google Trends crawler: CSV writing issues, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47625436/
