Python web scraping request error (Mod_Security)

Reposted. Author: 行者123. Updated: 2023-12-04 14:32:49

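I'm a beginner, and I'm trying to scrape a web page's source code for a tutorial. I have installed beautifulsoup and requests. As a first step I just want to fetch the page source. I'm scraping "https://pythonhow.com/example.html". I'm not doing anything illegal, and I believe this site was set up for exactly this purpose. Here is my code: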

import requests
from bs4 import BeautifulSoup

r=requests.get("http://pythonhow.com/example.html")
c=r.content
c

I get a Mod_Security error:
b'<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>'

Thanks to everyone who takes the time to help. Regards.

Best answer

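You can fix this easily by sending a User-Agent header with the request. With that header set, the site treats the request as coming from someone visiting with a web browser.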

Here is the code you'll want to use:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0',
}

r = requests.get("http://pythonhow.com/example.html", headers=headers)
c = r.content

print(c)

This gives you the expected output:

b'<!DOCTYPE html>\n<html>\n<head>\n<style>\ndiv.cities {\n    background-color:black;\n    color:white;\n    margin:20px;\n    padding:20px;\n} \n</style>\n</head>\n<body>\n<h1 align="center"> Here are three big cities </h1>\n<div class="cities">\n<h2>London</h2>\n<p>London is the capital of England and it\'s been a British settlement since 2000 years ago. </p>\n</div>\n<div class="cities">\n<h2>Paris</h2>\n<p>Paris is the capital city of France. It was declared capital since 508.</p>\n</div>\n<div class="cities">\n<h2>Tokyo</h2>\n<p>Tokyo is the capital of Japan and one of the most populated cities in the world.</p>\n</div>\n</body>\n</html>'
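
If you make more than one request, it can be convenient to set the header once on a session instead of passing it on every call. The following is only a minimal sketch of that idea, not part of the original answer: the names BROWSER_UA, make_session and fetch are hypothetical helpers, and the timeout value is an arbitrary assumption. It also calls raise_for_status() so that an error response (the "Not Acceptable!" page in the question looks like an HTTP 406, although the question does not show the status code) raises an exception instead of being returned as if it were the page.

```python
import requests

# Hypothetical constant: the same browser-like User-Agent string used in the answer.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) "
    "Gecko/20100101 Firefox/55.0"
)


def make_session(user_agent=BROWSER_UA):
    """Return a requests.Session that sends `user_agent` on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(url, session=None, timeout=10):
    """Fetch `url` and return the decoded body, raising on 4xx/5xx responses."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)  # timeout is an assumed value
    response.raise_for_status()  # raises requests.HTTPError on an error status
    return response.text


# Example usage (performs a real network request, so it is left commented out):
# html = fetch("http://pythonhow.com/example.html")
# print(html)
```

For comparison, requests.utils.default_user_agent() returns the string that requests sends when no User-Agent is set, which starts with "python-requests/". Presumably that default is what the server's Mod_Security rules rejected, though the question does not confirm which rule fired.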

Regarding the Python web scraping request error (Mod_Security), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61968521/
