python - How to handle BeautifulSoup failing to load a web page

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:38:03

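Currently, if an error occurs while retrieving a web page, soup is not populated with the page and instead holds the default return value from BeautifulSoup.

I am looking for a way to check for this, so that if fetching the page fails I can skip a chunk of code, something like

if soup:
    do stuff

but I do not want the program to terminate altogether. Apologies for the newbie question.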

def getwebpage(address):
    try:
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        req = urllib2.Request(address, None, headers)
        web_handle = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        error_desc = BaseHTTPServer.BaseHTTPRequestHandler.responses[e.code][0]
        appendlog('HTTP Error: ' + str(e.code) + ': ' + address)
        return
    except urllib2.URLError, e:
        appendlog('URL Error: ' + e.reason[1] + ': ' + address)
        return
    except:
        appendlog('Unknown Error: ' + address)
        return
    return web_handle


def test():
    soup = BeautifulSoup(getwebpage('http://doesnotexistblah.com/'))
    print soup

    if soup:
        do stuff

test()

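Best answer

Structure the code so that one function encapsulates the whole process of retrieving data from the URL, and a second function encapsulates the processing of that data. The retrieval function returns None on failure, so the processing function can test for that before parsing: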

import urllib2, httplib
from BeautifulSoup import BeautifulSoup

def append_log(message):
    print message

def get_web_page(address):
    # Return the page body as a string, or None if retrieval fails.
    try:
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        request = urllib2.Request(address, None, headers)
        response = urllib2.urlopen(request, timeout=20)
        try:
            return response.read()
        finally:
            response.close()
    except urllib2.HTTPError as e:
        error_desc = httplib.responses.get(e.code, '')
        append_log('HTTP Error: ' + str(e.code) + ': ' +
                   error_desc + ': ' + address)
    except urllib2.URLError as e:
        # e.reason may be a string or an exception object, so do not index it
        append_log('URL Error: ' + str(e.reason) + ': ' + address)
    except Exception as e:
        append_log('Unknown Error: ' + str(e) + ': ' + address)
    return None  # reached only after a failure has been logged

def process_web_page(data):
    # Parse only when retrieval succeeded.
    if data is not None:
        print BeautifulSoup(data)
    else:
        pass # do something else

data = get_web_page('http://doesnotexistblah.com/')
process_web_page(data)

data = get_web_page('http://docs.python.org/copyright.html')
process_web_page(data)
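
The answer above targets Python 2 (urllib2, print statements). For readers on Python 3, the same split between retrieval and processing might look roughly like the sketch below. It is not part of the original answer: it assumes urllib.request and urllib.error as the replacements for urllib2, and the names fetch_page, process_pages, the handle callback, the log parameter, and the User-Agent string are all hypothetical, chosen only for illustration.

```python
import urllib.error
import urllib.request


def fetch_page(address, log=print, timeout=20):
    """Return the response body as bytes, or None if retrieval fails."""
    try:
        # Illustrative User-Agent value (an assumption, not from the answer).
        request = urllib.request.Request(
            address, headers={'User-Agent': 'Mozilla/5.0 (compatible; example)'})
        # The context manager closes the response even if read() raises.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it has to be caught first.
        log('HTTP Error: %s: %s: %s' % (e.code, e.reason, address))
    except urllib.error.URLError as e:
        log('URL Error: %s: %s' % (e.reason, address))
    except Exception as e:
        log('Unknown Error: %s: %s' % (e, address))
    return None


def process_pages(addresses, handle, log=print):
    """Call handle(address, data) for each page that loads; skip failures."""
    processed = []
    for address in addresses:
        data = fetch_page(address, log=log)
        if data is None:
            continue  # failure already logged; carry on with the next page
        handle(address, data)
        processed.append(address)
    return processed
```

Because a failed fetch yields None rather than an exception, the loop skips that page and continues with the rest, which is the "skip a chunk of code without terminating" behaviour the question asks for. Assuming the bs4 package is installed, the handle callback could be something like `lambda address, data: print(BeautifulSoup(data, 'html.parser'))`.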

For more on python - How to handle BeautifulSoup failing to load a web page, see the similar question on Stack Overflow: https://stackoverflow.com/questions/7922362/
