
Java: 503 error when trying to read a web page with HTMLUnit

Reposted. Author: 行者123. Updated: 2023-11-28 01:09:10

I have been testing HTMLUnit to see whether I can extract values from certain websites.

I tried it on https://rsbuddy.com/exchange?id12934, but I keep getting 503 errors.

This appears to be caused by CloudFlare's IUAM ("I'm Under Attack Mode") browser check.

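I looked around and found a site where someone had the same problem. The community told the poster that HTMLUnit could solve it, and apparently it eventually did, but no solution was posted.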

My code is currently very simple:

final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("https://rsbuddy.com/exchange?id12934");
System.out.println(page.asXml());

This outputs:

INFO:
<!DOCTYPE HTML>
<html lang="en-US">

<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
<title>Just a moment...</title>
<style type="text/css">
html,
body {
width: 100%;
height: 100%;
margin: 0;
padding: 0;
}
body {
background-color: #ffffff;
font-family: Helvetica, Arial, sans-serif;
font-size: 100%;
}
h1 {
font-size: 1.5em;
color: #404040;
text-align: center;
}
p {
font-size: 1em;
color: #404040;
text-align: center;
margin: 10px 0 0 0;
}
#spinner {
margin: 0 auto 30px auto;
display: block;
}
.attribution {
margin-top: 20px;
}
@-webkit-keyframes bubbles {
33%: {
-webkit-transform: translateY(10px);
transform: translateY(10px);
}
66% {
-webkit-transform: translateY(-10px);
transform: translateY(-10px);
}
100% {
-webkit-transform: translateY(0);
transform: translateY(0);
}
}
@keyframes bubbles {
33%: {
-webkit-transform: translateY(10px);
transform: translateY(10px);
}
66% {
-webkit-transform: translateY(-10px);
transform: translateY(-10px);
}
100% {
-webkit-transform: translateY(0);
transform: translateY(0);
}
}
.bubbles {
background-color: #404040;
width: 15px;
height: 15px;
margin: 2px;
border-radius: 100%;
-webkit-animation: bubbles 0.6s 0.07s infinite ease-in-out;
animation: bubbles 0.6s 0.07s infinite ease-in-out;
-webkit-animation-fill-mode: both;
animation-fill-mode: both;
display: inline-block;
}
</style>

<script type="text/javascript">
//<![CDATA[
(function() {
var a = function() {
try {
return !!window.addEventListener
} catch (e) {
return !1
}
},
b = function(b, c) {
a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)
};
b(function() {
var a = document.getElementById('cf-content');
a.style.display = 'block';
setTimeout(function() {
var s, t, o, p, b, r, e, a, k, i, n, g, f, MASOuLk = {
"eMSgRDgS": +((!+[] + !![] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![]))
};
t = document.createElement('div');
t.innerHTML = "<a href='/'>x</a>";
t = t.firstChild.href;
r = t.match(/https?:\/\//)[0];
t = t.substr(r.length);
t = t.substr(0, t.length - 1);
a = document.getElementById('jschl-answer');
f = document.getElementById('challenge-form');;
MASOuLk.eMSgRDgS -= +((!+[] + !![] + !![] + !![] + !![] + []) + (+[]));
MASOuLk.eMSgRDgS -= +((!+[] + !![] + !![] + !![] + !![] + []) + (+[]));
MASOuLk.eMSgRDgS += +((!+[] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![]));
MASOuLk.eMSgRDgS *= +((!+[] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![]));
MASOuLk.eMSgRDgS *= +((+!![] + []) + (!+[] + !![] + !![] + !![]));
MASOuLk.eMSgRDgS *= +((!+[] + !![] + !![] + !![] + []) + (!+[] + !![] + !![] + !![]));
MASOuLk.eMSgRDgS += +((!+[] + !![] + !![] + !![] + []) + (!+[] + !![] + !![] + !![]));
a.value = parseInt(MASOuLk.eMSgRDgS, 10) + t.length;
'; 121'
f.submit();
}, 4000);
}, false);
})();
//]]>
</script>


</head>

<body>
<table width="100%" height="100%" cellpadding="20">
<tr>
<td align="center" valign="middle">
<div class="cf-browser-verification cf-im-under-attack">
<noscript>
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="cf-content" style="display:none">
<div>
<div class="bubbles"></div>
<div class="bubbles"></div>
<div class="bubbles"></div>
</div>
<h1><span data-translate="checking_browser">Checking your browser before accessing</span> rsbuddy.com.</h1>
<p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
<p data-translate="allow_5_secs">Please allow up to 5 seconds&hellip;</p>
</div>
<form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
<input type="hidden" name="jschl_vc" value="c4f4252fa3ee7b54a685f74ba192d186" />
<input type="hidden" name="pass" value="1468717381.249-GOgXzrnovV" />
<input type="hidden" id="jschl-answer" name="jschl_answer" />
</form>
</div>


<div class="attribution">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=iuam" target="_blank" style="font-size: 12px;">DDoS protection by CloudFlare</a>
<br>Ray ID: 2c39c577c5bb41cf
</div>
</td>
</tr>
</table>
</body>

</html>

Exception in thread "main" com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 503 Service Temporarily Unavailable for https://rsbuddy.com/exchange?id12934
    at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:570)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:303)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:450)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:435)
    at TestMain.main(TestMain.java:20)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Is there a way to let HTMLUnit get through to the site?

Best answer

The browser check takes a while to complete. I believe it will work if you do the following:

WebClient webClient = new WebClient(BrowserVersion.CHROME);

That sets your browser version first. Then run the line that fetches the page:

final HtmlPage page = webClient.getPage("https://rsbuddy.com/exchange?id12934");

followed by one of these options:

i. Set a wait time:

webClient.waitForBackgroundJavaScript(5000);

while (page.asText().contains("Checking your browser before accessing")) {
    webClient.waitForBackgroundJavaScript(100);
}

ii. Use Thread.sleep() instead of waiting for the JavaScript:

Thread.sleep(2000); // use this in place of the wait code in option i
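
The loop in option i has no upper bound, so it would spin forever if the challenge never clears. A minimal sketch of a bounded poll in plain Java follows. ChallengeWait and waitUntil are hypothetical names, not part of HTMLUnit, and the condition in main is only a stand-in that simulates a page becoming ready after a short delay.

```java
import java.util.function.BooleanSupplier;

final class ChallengeWait {

    /**
     * Polls the condition until it is true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier done, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!done.getAsBoolean()) {
            // Compare via subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 300 ms after start.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println("condition met: " + ok);
    }
}
```

With HTMLUnit, the condition could check whether the text of the page returned by webClient.getCurrentWindow().getEnclosedPage() still contains the "Checking your browser" message. That wiring is an assumption and is not tested here. Since the stack trace in the question shows getPage() throwing on the 503 response, calling webClient.getOptions().setThrowExceptionOnFailingStatusCode(false) beforehand is probably also needed so that the challenge page is returned instead.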

Finally, print the result:

System.out.println(page.asXml());
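
For context on why a wait is needed: the challenge script in the question's output waits about four seconds (the setTimeout(..., 4000) call), computes a number, and submits it as jschl_answer. The sketch below replays that arithmetic in plain Java for the values in that one sample. The class and method names are illustrative, and the digits were decoded by hand from the obfuscated expressions (each !+[] or !![] counts as 1, and the two parenthesised groups are concatenated as digits), so treat it as a reading of this sample rather than a general solver.

```java
final class ChallengeArithmetic {

    /**
     * Replays the arithmetic of the sample challenge script.
     * host is the bare hostname, matching what the script derives from the page URL.
     */
    static long answerFor(String host) {
        long v = 49;  // initial value: "4" + "9"
        v -= 50;      // "5" + "0"
        v -= 50;      // "5" + "0"
        v += 26;      // "2" + "6"
        v *= 37;      // "3" + "7"
        v *= 14;      // "1" + "4"
        v *= 44;      // "4" + "4"
        v += 44;      // "4" + "4"
        // The script adds the hostname length before submitting the form.
        return v + host.length();
    }

    public static void main(String[] args) {
        System.out.println(answerFor("rsbuddy.com")); // prints -569745
    }
}
```

Because the expressions change on every request, hard-coding values like this is only useful for understanding the page. Letting HTMLUnit run the script and waiting for it, as the answer suggests, avoids reimplementing the calculation.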

This question about the 503 error when reading a web page with HTMLUnit in Java is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/38417083/
