python - Why does the trailing slash matter for lxml.html.parse()?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:22:17
I am using lxml to scrape HTML. This code works.

lxml.html.parse( "http://google.com/" )

This code does not.

lxml.html.parse( "http://google.com" )

Why does the slash at the end of the URL matter? Thanks.

To be clear, this is the traceback Python gives me for the second snippet.

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/davidfaux/epd-7.2-2-rh5-x86/lib/python2.7/site-packages/lxml/html/__init__.py", line 692, in parse
return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
File "lxml.etree.pyx", line 2953, in lxml.etree.parse (src/lxml/lxml.etree.c:56204)
File "parser.pxi", line 1533, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:82287)
File "parser.pxi", line 1562, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:82580)
File "parser.pxi", line 1462, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:81619)
File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:78528)
File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74472)
File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75363)
File "parser.pxi", line 588, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74665)
IOError: Error reading file 'http://google.com': failed to load HTTP resource

Accepted answer

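Because without the slash, Google does not send you a page; it sends you a redirect. In fact, it redirects you to the URL with the trailing slash! The body of the redirect response is probably empty, so lxml has nothing to parse.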
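If the redirect is indeed the cause, one possible workaround is to fetch the page with an HTTP client that follows redirects and hand the response to lxml as a file-like object. The following is a minimal sketch in Python 3, not the method from the original answer. The function name parse_following_redirects and its opener parameter are hypothetical and exist only for illustration; urllib.request.urlopen follows HTTP redirects by default.

```python
import urllib.request

import lxml.html


def parse_following_redirects(url, opener=urllib.request.urlopen):
    """Fetch `url` and parse the response body with lxml.

    `opener` is a hypothetical hook (not part of lxml) so that a different
    fetch function can be swapped in; by default urllib follows redirects.
    """
    with opener(url) as response:
        # geturl() reports the final URL after any redirects, which is the
        # right base for resolving relative links in the parsed document.
        return lxml.html.parse(response, base_url=response.geturl())
```

A call such as parse_following_redirects("http://google.com") would then parse whatever page the server finally returns, with or without the trailing slash in the original URL, assuming the network request itself succeeds.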

Source: a similar question on Stack Overflow, "python - Why does the trailing slash matter for lxml.html.parse()?": https://stackoverflow.com/questions/9303567/