
python - Reload webpage on timeout


Hi everyone. My code checks the links I feed it, looking for certain tags in each webpage; once a tag is found, it gives the link back to me. However, mechanize sometimes gets stuck forever trying to open/read a page unless I set a timeout. Is there any way to reload/retry a webpage when it times out?

import mechanize
from mechanize import Browser
from bs4 import BeautifulSoup
import urllib2
import time
import os
from tqdm import tqdm
import socket


br = Browser()

with open("url.txt", 'r+') as f:
    lines = f.read().splitlines()

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

no_stock = []
for i in tqdm(lines):
    r = br.open(i, timeout=200)
    r = r.read()
    done = False
    tries = 3
    while tries and not done:
        try:
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except:
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue  # skip this and continue with the next
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(i)

Update: the error:

  File "/usr/local/lib/python2.7/dist-packages/mechanize/_response.py", line 190, in read
self.__cache.write(self.wrapped.read())
File "/usr/lib/python2.7/socket.py", line 355, in read
data = self._sock.recv(rbufsize)
File "/usr/lib/python2.7/httplib.py", line 587, in read
return self._read_chunked(amt)
File "/usr/lib/python2.7/httplib.py", line 656, in _read_chunked
value.append(self._safe_read(chunk_left))
File "/usr/lib/python2.7/httplib.py", line 702, in _safe_read
chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/usr/lib/python2.7/socket.py", line 384, in read
data = self._sock.recv(left)
socket.timeout: timed out

Thanks for any help!

Best Answer

Catch the socket.timeout exception and retry there. Note that your traceback shows the timeout being raised while the response body is read (r.read()), not during parsing, so the open/read calls are what need to sit inside the try:

try:
    # first try: the timeout is raised while opening/reading the page
    r = br.open(i, timeout=200).read()
except socket.timeout:
    # try a second time
    r = br.open(i, timeout=200).read()
soup = BeautifulSoup(r, 'html.parser')

You can even retry several times, and if a line keeps failing, continue with the next one:

for i in tqdm(lines):
    done = False
    tries = 3
    while tries and not done:
        try:
            r = br.open(i, timeout=200)  # the timeout can fire here...
            r = r.read()                 # ...or while reading the body
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except:  # just catch any error
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue  # skip this and continue with the next
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(i)
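
If you want the retry logic in one reusable place, a minimal sketch of a helper is shown below; the name open_with_retries and the backoff pause are illustrative additions, not part of the original answer:

import socket
import time

def open_with_retries(browser, url, tries=3, timeout=200, backoff=5):
    # Open a URL and read the body, retrying on socket.timeout.
    # Returns the page HTML, or None if every attempt times out.
    for attempt in range(tries):
        try:
            response = browser.open(url, timeout=timeout)
            return response.read()
        except socket.timeout:
            if attempt < tries - 1:
                time.sleep(backoff)  # brief pause before the next attempt
    return None

The main loop then reduces to:

for i in tqdm(lines):
    html = open_with_retries(br, i)
    if html is None:
        print('Failed for {}'.format(i))
        continue
    soup = BeautifulSoup(html, 'html.parser')
    # ... tag checks as before ...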

On python - reloading a webpage on timeout, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31977675/
