
Web crawler for testing and learning




Hi, I wanted to try writing a crawler.


I started with a very simple script, but I already get an error message when I run it.


What is wrong with the code?



I get this error at the `source = requests.get(url)` line:


Exception has occurred: ConnectTimeout
HTTPSConnectionPool(host='www.anisearch.de', port=443): Max retries exceeded with url: /anime/2788,naruto (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002757873E090>, 'Connection to www.anisearch.de timed out. (connect timeout=None)'))
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

The above exception was the direct cause of the following exception:

urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x000002757873E090>, 'Connection to www.anisearch.de timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.anisearch.de', port=443): Max retries exceeded with url: /anime/2788,naruto (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002757873E090>, 'Connection to www.anisearch.de timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

File "C:\Users\admin\Documents\Crawler\anisearch_crawler.py", line 5, in <module>
source = requests.get(url)
^^^^^^^^^^^^^^^^^
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.anisearch.de', port=443): Max retries exceeded with url: /anime/2788,naruto (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002757873E090>, 'Connection to www.anisearch.de timed out. (connect timeout=None)'))

It is clear to me that it cannot reach the page, but I do not understand why. Here is my simple code:


```python
from bs4 import BeautifulSoup
import requests

url = "https://www.anisearch.de/anime/2788,naruto"
source = requests.get(url)
soup = BeautifulSoup(source.content, 'html.parser')

def info_anime(soup):
    # Extracting the name of the anime from the <meta> tag
    anime_name = soup.find('meta', {'name': 'title'})['content']
    print("Anime : " + anime_name)


info_anime(soup)
```

I debugged it with VS Code, but I don't really understand where the problem is. It says it cannot make the request, but why, and how do I solve this so that I can get the information back?


The site has no API, so I am trying this approach.
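
As a first debugging step, it can help to set an explicit timeout and a browser-like User-Agent header, and to catch the timeout so the script reports what actually happened. This is only a minimal diagnostic sketch (the header value and the 10-second timeout are arbitrary choices, not taken from the original script); if the block is at the network level, it will still time out:

```python
import requests

url = "https://www.anisearch.de/anime/2788,naruto"

# Browser-like User-Agent header; the exact value is an arbitrary example.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/58.0.3029.110 Safari/537.36"
}

try:
    # Explicit timeout so the request fails fast instead of hanging.
    source = requests.get(url, headers=headers, timeout=10)
    print("HTTP status:", source.status_code)
except requests.exceptions.ConnectTimeout:
    # The TCP connection never succeeded -- the problem is at the network
    # level (site down, firewall, or an IP-level block), not in the parsing code.
    print("Connection timed out before the server answered.")
```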


More replies

Either the site is/was down, or they blocked your IP address when they detected that you were trying to scrape the site.

What is the solution against that? I have only made a single request, nothing more.
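
One way to narrow down which of the two it is: check whether a plain TCP connection to the host works at all. A minimal sketch using only the standard library; if even this times out, the request never reaches the web server, so no header tweak in the scraping code will help:

```python
import socket

# Low-level reachability check for the host from the question.
try:
    with socket.create_connection(("www.anisearch.de", 443), timeout=10):
        print("TCP connection to port 443 succeeded; the host is reachable.")
except OSError as exc:
    # Site down, firewall, or an IP-level block.
    print(f"TCP connection failed: {exc}")
```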

Recommended answer

You could try using HTMLSession instead of requests; that worked for me.


```python
from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = "https://www.anisearch.de/anime/2788,naruto"
source = HTMLSession().get(url)
# source.raise_for_status()  # good habit in general

soup = BeautifulSoup(source.content, 'html.parser')
```
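
Note that `requests_html` is a separate package (`pip install requests-html`), not part of `requests` itself. As far as I can tell, `HTMLSession` is built on top of `requests.Session` and sends a browser-style User-Agent by default, which may be why it sometimes gets past simple scraper checks; a connection-level timeout like the one in the question can still happen with it.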



More replies

Thanks for the help, but it still doesn't work for me. Maybe I should try it with a User-Agent, like this? `headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}` and then `source = HTMLSession().get(url, headers=headers)`
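
Written out as a runnable sketch, that suggestion looks roughly like this (the `timeout` is an extra addition; whether the header actually helps depends on how the site filters requests):

```python
from requests_html import HTMLSession

url = "https://www.anisearch.de/anime/2788,naruto"

# User-Agent string taken from the comment above; the timeout is a
# safeguard so a blocked connection fails quickly instead of hanging.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/58.0.3029.110 Safari/537.36"
}

source = HTMLSession().get(url, headers=headers, timeout=10)
print(source.status_code)
```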

@D1skanime does it work for you then? I usually use HTMLSession because I'm not good with headers, but it doesn't work 100% of the time.

No, it doesn't work for me; that's why I asked whether it works for you with the header. Are there other options? Should I perhaps try a proxy that I start directly from Python? What other possibilities are there?
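
For completeness, routing the request through a proxy with plain `requests` looks roughly like this. The proxy address below is only a placeholder, and a proxy only helps if the problem really is an IP-level block:

```python
import requests

url = "https://www.anisearch.de/anime/2788,naruto"

# Placeholder proxy address -- replace it with a proxy you actually have
# access to; this one will not work as-is.
proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}

response = requests.get(url, proxies=proxies, timeout=10)
print(response.status_code)
```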
