
python - newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url

Reposted. Author: 行者123. Updated: 2023-12-04 09:23:35

I am trying to download the text of an article that I can open without problems in a web browser (for example Safari).
The error is:

newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
Here is the code:
from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
config = Config()

config.browser_user_agent = user_agent
url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830".strip()

page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
As you can see, I tried the solution from this Stack Overflow answer (setting a browser user agent through Config), but it did not work.
Full error log:
/Users/mona/anaconda3/bin/python /Users/mona/multimodal/newspaper_pg.py
Traceback (most recent call last):
  File "/Users/mona/multimodal/newspaper_pg.py", line 18, in <module>
    page.parse()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830

Process finished with exit code 1
I got my user agent string from this site: https://developers.whatismybrowser.com/useragents/explore/operating_system_name/macos/

Best answer

The user agent that worked for me is Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0. You can find yours here: https://www.whatismybrowser.com/detect/what-is-my-user-agent

from newspaper import Article
from newspaper import Config

# Firefox user agent string that worked for me
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

# Pass the user agent to newspaper through its Config object
config = Config()
config.browser_user_agent = user_agent

url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830"

page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
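
If it is unclear which user agent string a site will accept, one way to narrow it down is to send the same request with several candidates and compare the status codes. The following is only a minimal sketch using the standard library, not part of the original answer: the helper names `build_request`, `status_for` and `probe_user_agents` are made up for illustration, and the `opener` parameter exists only so the HTTP call can be swapped out. newspaper uses its own HTTP client, and a server may look at more than the User-Agent header, so the codes seen here may not match what newspaper gets.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Return a GET request for `url` carrying the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server answers with for this user agent."""
    request = build_request(url, user_agent)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def probe_user_agents(url, user_agents, opener=urllib.request.urlopen):
    """Map each candidate user agent string to the status code it receives."""
    return {ua: status_for(url, ua, opener=opener) for ua in user_agents}
```

Calling `probe_user_agents(url, [safari_ua, firefox_ua])` with the two strings from this page would return a dict such as `{safari_ua: <status>, firefox_ua: <status>}`. A string that comes back with 200 is a reasonable candidate for `config.browser_user_agent`, though a site can change what it accepts at any time.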

Regarding python - newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63060350/
