
python - Website downloaded twice after each failure

I have some Python code that scrapes data from the UNESCO website. It works fine, but if an error occurs while fetching any page, the function that fetches the data is called again and the page is retrieved. Unfortunately, the page then ends up being scraped twice, and I can't work out why.

The full code is available here, but the function causing the problem is the following:

country_code_list = [["AFG"],["ALA"],["DZA"],["ALB"]]
countries = {"AFG":"Afghanistan","ALA":"Aland Islands","ALB":"Albania","DZA":"Algeria"}
base_url = "http://www.unesco.org/xtrans/bsresult.aspx?lg=0&c="

def get_page(self, url, country, all_books, thread_no, sleep_time=0):
time.sleep(sleep_time)

try:
target_page = urllib2.urlopen(url)
if sleep_time != 0:
print("Thread {0} successfully fetched {1}"\
.format(self.thread_no, url))
except Exception, error:
print("Thread {0} Error getting {1} while processing {2}: ".format\
(thread_no, url, country), error)
self.get_page(url, country, all_books, thread_no, (sleep_time + 1))

page = BeautifulSoup(target_page, parse_only=only_restable)
books = page.find_all('td',class_="res2")
for book in books:
all_books.append(Book (book,country))
page.decompose()

for title in all_books:
title.export(country)

The only other code that interacts with this function is the code that walks through the result pages, shown here, but I don't think that is where the problem lies:

    def build_list(self, code_list, countries, thread):
        ''' Build the list of all the books, and return a list of Book objects
        in case you want to do something with them in something else, ever.'''
        for country in code_list:

            print('Thread {0} now processing {1} \n'.format(self.thread_no, \
                  countries[country]))
            results_total = self.get_total_results(country, base_url)

            with open(count_file, "a") as count_table:
                print(country + ": " + str(results_total), file=count_table)

            for page_num in range(0, results_total, 10):
                all_books = []
                url = base_url + country + "&fr=" + str(page_num)
                try:
                    self.get_page(url, country, all_books, self.thread_no)
                except Exception, error:
                    print("Thread {0} Error getting {1} while processing {2}: "\
                          .format(self.thread_no, url, country), error)
                    self.get_page(url, country, all_books, self.thread_no, 1)
        print("Thread {0} completed.".format(self.thread_no))

Best Answer

After your exception-handling code, add a return statement:

        except Exception, error:
            print("Thread {0} Error getting {1} while processing {2}: ".format\
                  (thread_no, url, country), error)
            self.get_page(url, country, all_books, thread_no, (sleep_time + 1))
            return

Otherwise, it will continue processing the failed page: the recursive retry already fetches and parses it, but without the return the original call keeps running past the except block and the same page ends up being handled a second time.
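
For reference, a minimal sketch of what the corrected get_page could look like with the return in place. It keeps the question's Python 2 / urllib2 style and assumes the surrounding class, imports, and helpers (Book, only_restable, time, BeautifulSoup) from the asker's full code:

    def get_page(self, url, country, all_books, thread_no, sleep_time=0):
        # Back off a little longer on each retry.
        time.sleep(sleep_time)

        try:
            target_page = urllib2.urlopen(url)
            if sleep_time != 0:
                print("Thread {0} successfully fetched {1}"\
                      .format(self.thread_no, url))
        except Exception, error:
            print("Thread {0} Error getting {1} while processing {2}: ".format\
                  (thread_no, url, country), error)
            # The recursive call retries the fetch AND does the parsing,
            # so this frame must stop here instead of falling through.
            self.get_page(url, country, all_books, thread_no, (sleep_time + 1))
            return

        page = BeautifulSoup(target_page, parse_only=only_restable)
        books = page.find_all('td', class_="res2")
        for book in books:
            all_books.append(Book(book, country))
        page.decompose()

        for title in all_books:
            title.export(country)

The return only matters in the error path: after a failed urlopen, target_page is not even defined in that frame, so continuing past the except block would raise again and trigger build_list's own retry, fetching the page yet another time.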

Regarding python - Website downloaded twice after each failure, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18391337/
