
javascript - Screen scraping a web server using an IP address instead of a domain name

Reposted. Author: 行者123. Updated: 2023-11-30 12:26:16

Is this possible? It works when baseUrl = "http://mashable.com", but not when I give it an IP address.

<script src='https://raw.github.com/padolsey/jQuery-Plugins/master/cross-domain-ajax/jquery.xdomainajax.js'></script>
<script>
$(document).ready(function () {
    // Target addressed by IP instead of by domain name
    var baseUrl = "https://12.34.56.78:8000/";
    $.ajax({
        url: baseUrl,
        type: "get",
        dataType: "",
        success: function (data) {
            alert("Yeah we are om jere");
        }
    });
});
</script>

Best answer

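This will be difficult, because many websites may be hosted on the same server and therefore share the same IP address. It works with a domain name because your client sends that name to the server in the Host header of the GET request, and the server uses it to decide which site to serve.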
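To make the shared-IP point concrete, here is a minimal sketch of how a server doing name-based virtual hosting might choose a site. Everything in it is hypothetical: the address `203.0.113.10`, the `site-a.example` and `site-b.example` names, and the `handleRequest` function are illustrations, not taken from any real server.

```javascript
// Hypothetical setup: two sites share one IP address (documentation range).
const SHARED_IP = "203.0.113.10";
const virtualHosts = new Map([
  ["site-a.example", "Welcome to site A"],
  ["site-b.example", "Welcome to site B"],
]);

// Mimics a server choosing which site to serve from the Host header alone.
function handleRequest(hostHeader) {
  // Normalise case and strip an optional ":port" suffix.
  const name = hostHeader.toLowerCase().replace(/:\d+$/, "");
  const body = virtualHosts.get(name);
  return body === undefined
    ? { status: 404, body: "Not Found" }
    : { status: 200, body };
}

console.log(handleRequest("site-a.example")); // a known name selects a site
console.log(handleRequest(SHARED_IP));        // the bare IP matches no site
```

With only the IP address in the Host header, this toy server has no way to tell which of the two sites was meant, so it falls through to the 404 branch.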

Look at the curl output for Stack Overflow:

C:\Users\Yeah>curl --head -i -v stackoverflow.com/
* Hostname was NOT found in DNS cache
* Trying 198.252.206.140...
* Connected to stackoverflow.com (198.252.206.140) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: stackoverflow.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< [...]

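You can see the domain name being passed in the Host header. Conversely, if I query using the IP address found above, the Host header carries only the IP and the result is a 404 error: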

C:\Users\Yeah>curl --head -i -v 198.252.206.140/
* Hostname was NOT found in DNS cache
* Trying 198.252.206.140...
* Connected to 198.252.206.140 (198.252.206.140) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 198.252.206.140
> Accept: */*
>
< HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
< [...]
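One way around this, sketched here for Node.js rather than for in-page jQuery (browsers treat Host as a forbidden header name, so page scripts cannot set it), is to connect to the IP address but supply the domain name in the Host header yourself. The helper names `optionsForIp` and `headByIp` are made up for this illustration, and whether the server answers with 200 still depends on its configuration.

```javascript
const http = require("node:http");

// Build request options that connect to `ip` but present `hostname`
// in the Host header. Both helper names are hypothetical.
function optionsForIp(ip, hostname, path = "/") {
  return {
    host: ip,                    // where the TCP connection goes
    port: 80,
    method: "HEAD",
    path,
    headers: { Host: hostname }, // what the server uses to pick the site
  };
}

// Resolves with the HTTP status code of a HEAD request.
function headByIp(ip, hostname) {
  return new Promise((resolve, reject) => {
    const req = http.request(optionsForIp(ip, hostname), (res) => {
      res.resume(); // discard any body so the socket is released
      resolve(res.statusCode);
    });
    req.on("error", reject);
    req.end();
  });
}

// Usage (performs a real network request, so it is left commented out):
// headByIp("198.252.206.140", "stackoverflow.com").then(console.log);
```

The curl equivalent would be adding `-H "Host: stackoverflow.com"` to the IP-based command above.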

As a counterexample, though, if I try something similar with the Facebook website, I get the following:

C:\Users\Yeah>curl --head -i -v --insecure -L https://www.facebook.com/
* Hostname was NOT found in DNS cache
* Trying 31.13.93.3...
* Connected to www.facebook.com (31.13.93.3) port 443 (#0)
* [SSL stuff ...]
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: www.facebook.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< [...]

Then, if I try with the IP address found above:

C:\Users\Yeah>curl --head -i -v --insecure -L https://31.13.93.3/
* Hostname was NOT found in DNS cache
* Trying 31.13.93.3...
* Connected to 31.13.93.3 (31.13.93.3) port 443 (#0)
* [SSL stuff ...]
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 31.13.93.3
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< Location: http://www.facebook.com/
Location: http://www.facebook.com/
< [...]

<
* Connection #0 to host 31.13.93.3 left intact
* Issue another request to this URL: 'http://www.facebook.com/'
* Hostname was NOT found in DNS cache
* Trying 31.13.93.3...
* Connected to www.facebook.com (31.13.93.3) port 80 (#1)
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: www.facebook.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< [...]

<
* Connection #1 to host www.facebook.com left intact
* Issue another request to this URL: 'https://www.facebook.com/'
* Found bundle for host www.facebook.com: 0x6097814fe0
* Hostname was NOT found in DNS cache
* Trying 31.13.93.3...
* Connected to www.facebook.com (31.13.93.3) port 443 (#2)
* [SSL stuff ...]
> HEAD / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: www.facebook.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< [...]

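Here -L (follow redirects) and --insecure (accept any certificate) are needed for curl to eventually reach the Facebook website, but these are ordinary client (i.e. browser) operations.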

So it really depends on the specific website and server configuration you want to screen scrape.
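For the HTTPS case, a rough Node.js counterpart to the -L and --insecure flags is sketched below. It assumes that Node's `servername` TLS option is used both for SNI and for the certificate name check, which matches my understanding of the `tls.connect` options but is worth verifying against the Node documentation. The helper names `tlsOptionsForIp` and `nextUrl` are hypothetical.

```javascript
// Options for an HTTPS request sent to `ip` on behalf of `hostname`.
// Intended to be passed to https.request() from "node:https".
function tlsOptionsForIp(ip, hostname, path = "/") {
  return {
    host: ip,                    // connect to the raw IP address
    port: 443,
    method: "HEAD",
    path,
    servername: hostname,        // assumption: drives SNI and the cert check
    headers: { Host: hostname }, // lets the server pick the right site
  };
}

// Resolve a redirect's Location header against the URL that produced it,
// which is the step curl performs for each hop when -L is given.
function nextUrl(currentUrl, locationHeader) {
  return new URL(locationHeader, currentUrl).href;
}

console.log(nextUrl("https://31.13.93.3/", "http://www.facebook.com/"));
```

Supplying the expected host name this way is meant to avoid disabling certificate verification altogether, which is what --insecure does.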

Regarding "javascript - Screen scraping a web server using an IP address instead of a domain name", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29266787/
