
dns - Why is a DNS resolver needed in a crawler architecture?

Reposted. Author: 行者123. Updated: 2023-12-01 11:49:14

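In every paper I have read that proposes a crawler design, one of the important components is a DNS resolver.

My question is:

Why is it necessary? Can't we simply make a request directly to http://www.some-domain.com/?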

Best answer

DNS resolution is a well-known bottleneck in web crawling. Due to the distributed nature of the Domain Name Service, DNS resolution may entail multiple requests and round-trips across the internet, requiring seconds and sometimes even longer. Right away, this puts in jeopardy our goal of fetching several hundred documents a second.
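One common way to soften this cost is to remember answers so that repeated URLs on the same host do not trigger a new lookup. The sketch below is only an illustration, not something taken from the quoted text: `DnsCache`, `system_lookup`, and the fixed `ttl_seconds` value are hypothetical names and assumptions. A real resolver would honor the TTL carried in each DNS record, which `socket.getaddrinfo` does not expose.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def system_lookup(host: str) -> List[str]:
    """Blocking lookup through the OS resolver; returns unique IP strings."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class DnsCache:
    """Hypothetical helper: remembers lookups for a fixed number of seconds."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]] = system_lookup,
        ttl_seconds: float = 300.0,  # assumed value, not the record's real TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (expiry time, addresses)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, host: str) -> List[str]:
        key = host.lower()  # host names are case-insensitive
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no network round-trip
        addresses = self._lookup(key)
        self._entries[key] = (now + self._ttl, addresses)
        return addresses
```

With the defaults, `DnsCache().resolve("www.some-domain.com")` would go to the OS resolver once and serve later calls for the same host from memory until the entry expires.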

There is another important difficulty in DNS resolution; the lookup implementations in standard libraries (likely to be used by anyone developing a crawler) are generally synchronous. This means that once a request is made to the Domain Name Service, other crawler threads at that node are blocked until the first request is completed. To circumvent this, most web crawlers implement their own DNS resolver as a component of the crawler.
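To illustrate the blocking problem, here is a minimal sketch that keeps many lookups in flight at once. It does not implement a custom DNS resolver as the passage describes; it swaps in a simpler technique, pushing the blocking OS lookup onto a thread pool through `asyncio`'s `loop.getaddrinfo`. The names `resolve_all`, `threaded_lookup`, and `max_in_flight` are assumptions made for this example.

```python
import asyncio
import socket
from typing import Awaitable, Callable, Dict, Iterable, List, Optional, Tuple

AsyncLookup = Callable[[str], Awaitable[List[str]]]


async def threaded_lookup(host: str) -> List[str]:
    """Run the blocking OS lookup in the event loop's default thread pool."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


async def resolve_all(
    hosts: Iterable[str],
    lookup: AsyncLookup = threaded_lookup,
    max_in_flight: int = 50,  # assumed cap on simultaneous lookups
) -> Dict[str, Optional[List[str]]]:
    """Resolve hosts concurrently; a failed lookup maps to None."""
    limit = asyncio.Semaphore(max_in_flight)

    async def resolve_one(host: str) -> Tuple[str, Optional[List[str]]]:
        async with limit:
            try:
                return host, await lookup(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                return host, None

    # dict.fromkeys drops duplicate hosts while keeping their order
    unique_hosts = dict.fromkeys(hosts)
    pairs = await asyncio.gather(*(resolve_one(h) for h in unique_hosts))
    return dict(pairs)
```

A call such as `asyncio.run(resolve_all(["www.some-domain.com", "example.org"]))` would then wait roughly as long as the slowest single lookup rather than the sum of all of them, which is the effect the passage is after.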



http://nlp.stanford.edu/IR-book/html/htmledition/dns-resolution-1.html

Regarding "dns - Why is a DNS resolver needed in a crawler architecture?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13106550/
