
python-3.x - urlopen error [Errno 11001] getaddrinfo failed?

Repost. Author: 行者123. Updated: 2023-12-02 01:32:28

Hello everyone, I am a beginner Python programmer and I need help.

Here is my Python code. It raises the following error; please help me fix it:

urllib.error.URLError: urlopen error [Errno 11001] getaddrinfo failed

Python:

# -*- coding: utf-8 -*-

import urllib.request
from lxml.html import parse

WEBSITE = 'http://allrecipes.com'

URL_PAGE = 'http://allrecipes.com/recipes/110/appetizers-and-snacks/deviled-eggs/?page='

START_PAGE = 1
END_PAGE = 5

def correct_str(s):
    return s.encode('utf-8').decode('ascii', 'ignore').strip()

for i in range(START_PAGE, END_PAGE+1):
    URL = URL_PAGE + str(i)
    HTML = urllib.request.urlopen(URL)

    page = parse(HTML).getroot()

    for elem in page.xpath('//*[@id="grid"]/article[not(contains(@class, "video-card"))]/a[1]'):
        href = WEBSITE + elem.get('href')
        title = correct_str(elem.find('h3').text)

        recipe_page = parse(urllib.request.urlopen(href)).getroot()
        print(correct_str(href))
        photo_url = recipe_page.xpath('//img[@class="rec-photo"]')[0].get('src')

        print('\nName: |', title)
        print('Photo: |', photo_url)

When I run it with python from the command prompt, I get this error:

http://allrecipes.com/recipe/236225/crab-stuffed-deviled-eggs/

Name: | Crab-Stuffed Deviled Eggs
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
Traceback (most recent call last):
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 911, in _send_output
    self.send(msg)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 854, in send
    self.connect()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 826, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Ivan/Dropbox/parser/test.py", line 27, in <module>
    recipe_page = parse(urllib.request.urlopen(href)).getroot()
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1242, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1

Best answer

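I will try to explain three main ways to dig into a programming problem:

(1) Use a debugger. You can step through the code and inspect variables before they are used and before the exception is raised. Python ships with pdb. For this problem, you would step through the code and print href just before the call to urlopen().

(2) Assertions. Use Python's assert to state your assumptions in the code. For example, you could assert not href.startswith('http')

(3) Logging. Log the relevant variables before they are used. This is the approach I used: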

I added the following to your code...

href = WEBSITE + elem.get('href')
print(href)

and got...

Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
http://allrecipes.comhttp://dish.allrecipes.com/how-to-boil-an-egg/

From this you can see the cause of the getaddrinfo error: your system is trying to open a URL on a host named allrecipes.comhttp, which does not exist.
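To check how such a string is interpreted, one option is to parse it with the standard library's urllib.parse.urlparse. The following is a small sketch and not part of the original answer; the variable names bad_url and parts are illustrative only.

```python
from urllib.parse import urlparse

# The concatenated URL printed by the logging step above.
bad_url = 'http://allrecipes.comhttp://dish.allrecipes.com/how-to-boil-an-egg/'

parts = urlparse(bad_url)

# Everything between the first '//' and the next '/' is taken as the
# network location, so the second 'http:' ends up glued to the host.
print(parts.netloc)

# This is the host name that name resolution (getaddrinfo) is asked to look up.
print(parts.hostname)
```

The printed host name should be allrecipes.comhttp, which matches the host in the diagnosis and explains why name resolution fails.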

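The problem appears to be your assumption that WEBSITE must be prepended to every href you extract from the HTML. Some of the hrefs on the page are already absolute URLs.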
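Approaches (2) and (3) from the list above can be combined in a small helper. This is only a sketch under assumptions: the helper name build_recipe_url is hypothetical, it uses the standard logging module rather than print, and it encodes the original code's assumption that every href is site-relative.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

WEBSITE = 'http://allrecipes.com'

def build_recipe_url(href):
    """Hypothetical helper: prepend WEBSITE to a site-relative href."""
    # (3) Logging: record the raw value before it is used.
    log.debug('raw href: %r', href)
    # (2) Assertion: state the assumption that href is not already absolute.
    assert not href.startswith('http'), 'href is already absolute: %r' % href
    return WEBSITE + href

# A relative href passes the assertion and gets the site prefix.
print(build_recipe_url('/recipe/236225/crab-stuffed-deviled-eggs/'))

# An absolute href violates the assumption and is reported immediately,
# instead of failing later inside urlopen().
try:
    build_recipe_url('http://dish.allrecipes.com/how-to-boil-an-egg/')
except AssertionError as exc:
    print('assumption violated:', exc)
```

Keep in mind that assert statements are skipped when Python runs with the -O option, so this is a debugging aid and not a substitute for real input handling.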

You can handle absolute versus relative hrefs with something like the following, which uses a function to determine whether the URL is absolute:

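# Python 3: urlparse lives in urllib.parse (the Python 2 module was named urlparse)
from urllib.parse import urlparse

def is_absolute(url):
    # See https://stackoverflow.com/questions/8357098/how-can-i-check-if-a-url-is-absolute-using-python
    return bool(urlparse(url).netloc)

# Inside the loop over elem, prepend WEBSITE only to relative hrefs
href = elem.get('href')
if not is_absolute(href):
    href = WEBSITE + href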
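As an alternative sketch (not part of the original answer), the standard library's urllib.parse.urljoin resolves an href against a base URL and returns absolute URLs unchanged, so a separate is_absolute check is not strictly needed. The helper name resolve_href is hypothetical.

```python
from urllib.parse import urljoin

WEBSITE = 'http://allrecipes.com'

def resolve_href(href, base=WEBSITE):
    """Hypothetical helper: return an absolute URL for a relative or absolute href."""
    # urljoin keeps href as-is when it already has a scheme and host,
    # and otherwise resolves it relative to base.
    return urljoin(base, href)

# Relative href: the base site is prepended.
print(resolve_href('/recipe/236225/crab-stuffed-deviled-eggs/'))

# Absolute href: returned unchanged, so no 'allrecipes.comhttp' host is produced.
print(resolve_href('http://dish.allrecipes.com/how-to-boil-an-egg/'))
```

In the original loop, this would replace the line href = WEBSITE + elem.get('href') with a call such as href = resolve_href(elem.get('href')).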

Regarding python-3.x - urlopen error [Errno 11001] getaddrinfo failed?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37667206/
