java - .getResponseCode throws IOException on a valid URL

Reposted. Author: 可可西里. Updated: 2023-11-01 17:10:05

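I am building a web crawler and have a method that checks whether a link is broken. At one point I try to get the HTTP response code to determine whether the link is valid. Even when I give it a valid URL (it opens fine in a browser), the method still reports the link as invalid. Here is the code: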

    public static boolean isBrokenLink(URL baseURL, String theHREF) {
        boolean isBroken = false;
        if (baseURL == null) {
            try {
                baseURL = new URL("HTTP", "cs.uwec.edu/~stevende/cs145testpages/", theHREF);
                System.out.println(baseURL);
            } catch (MalformedURLException e) {
                isBroken = true;
                //e.printStackTrace();
            }
        }
        try {
            URLConnection con = baseURL.openConnection();
            HttpURLConnection httpProtocol = (HttpURLConnection) con;
            System.out.println(httpProtocol.getResponseCode());
            if (httpProtocol.getResponseCode() != 200 && httpProtocol.getResponseCode() == -1) {
                isBroken = true;
            }
        } catch (IOException e) {
            isBroken = true;
            e.printStackTrace();
        }

        return isBroken;
    }
}

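Here is the URL I am passing to it. isBroken is the boolean value that gets returned. I pass baseURL as null and theHREF as a relative link (page2.htm). I print the URL right after constructing it from the strings. Thanks for your help! This is the error: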

java.net.UnknownHostException: cs.uwec.edu/~stevende/cs145testpages/
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1300)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at edu.uwec.cs.carpenne.webcrawler.Webcrawler.isBrokenLink(Webcrawler.java:106)
at edu.uwec.cs.carpenne.webcrawler.Webcrawler.main(Webcrawler.java:181)

Best answer

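The exception tells us that the host name together with the path part is being used as the (unknown) host. It looks like the URL you construct is incorrect. Perhaps you forgot the http:// prefix or used the wrong getter? You can debug this by calling baseURL.getHost(), baseURL.getPath(), and baseURL.getProtocol() and checking whether they return cs.uwec.edu, /~steve..., and http.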
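As a rough illustration of that debugging step, the sketch below prints the components of a well-formed URL so they can be compared with what the crawler produces. The class name UrlPartsDemo and the helper describe are hypothetical, and the full URL is assumed from the directory and the page2.htm link mentioned in the question.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlPartsDemo {
    // Hypothetical helper: summarizes the parts of a URL on one line.
    static String describe(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            return "protocol=" + url.getProtocol()
                    + " host=" + url.getHost()
                    + " port=" + url.getPort()   // -1 means "no explicit port"
                    + " path=" + url.getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed full URL, built from the values in the question.
        System.out.println(describe("http://cs.uwec.edu/~stevende/cs145testpages/page2.htm"));
        // prints: protocol=http host=cs.uwec.edu port=-1 path=/~stevende/cs145testpages/page2.htm
    }
}
```

If getHost() returns anything containing a slash, the host and path were mixed up when the URL was built.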

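I just noticed that you build baseURL with new URL("HTTP", "cs.uwec.edu/~stevende/cs145testpages/", theHREF). That is wrong: the second argument must be the host name only. You need to use new URL("http", "cs.uwec.edu", 80, "/~stevende/cs145testpages/#" + theHREF) instead. However, you can usually skip the anchor/ref, because it is not transmitted to the server. If theHREF is a relative path such as page2.htm rather than an anchor, append it to the path without the "#".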
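To make the difference between an anchor and a path concrete, here is a small sketch using the four-argument constructor. The class FragmentVsPathDemo and the helper build are hypothetical names. It shows that text after "#" ends up in getRef() and is not part of the path that is requested from the server.

```java
import java.net.MalformedURLException;
import java.net.URL;

class FragmentVsPathDemo {
    // Hypothetical helper: builds a URL on the host from the question.
    @SuppressWarnings("deprecation") // URL constructors are deprecated since Java 20
    static URL build(String file) {
        try {
            return new URL("http", "cs.uwec.edu", 80, file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL withFragment = build("/~stevende/cs145testpages/#page2.htm");
        URL withPath = build("/~stevende/cs145testpages/page2.htm");

        // With "#", page2.htm is only a fragment; the path is the directory.
        System.out.println(withFragment.getPath() + " ref=" + withFragment.getRef());
        // Without "#", page2.htm is part of the path and there is no fragment.
        System.out.println(withPath.getPath() + " ref=" + withPath.getRef());
    }
}
```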

You can also use the single-argument constructor: new URL("http://cs.uwec.edu/~stevende/cs145testpages/")
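Putting the pieces together, one possible shape for the link check is sketched below. It is not the asker's final code. It swaps in URI.resolve to combine the base directory with the relative link, sends a HEAD request, and treats any status outside 2xx/3xx as broken. The class LinkChecker, the helper names, the timeouts, and the status threshold are all assumptions. The original condition (code != 200 && code == -1) is only true when the code is -1, so it is replaced here by a range check.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LinkChecker {
    // Base directory taken from the question; used when no base is supplied.
    static final URI DEFAULT_BASE = URI.create("http://cs.uwec.edu/~stevende/cs145testpages/");

    // Resolves a relative href (e.g. "page2.htm") against the base directory.
    static URI resolve(URI base, String href) {
        return (base == null ? DEFAULT_BASE : base).resolve(href);
    }

    // Assumption: only 2xx and 3xx responses count as a working link.
    static boolean isBrokenStatus(int code) {
        return code < 200 || code >= 400;
    }

    static boolean isBrokenLink(URI base, String href) {
        HttpURLConnection con = null;
        try {
            con = (HttpURLConnection) resolve(base, href).toURL().openConnection();
            con.setRequestMethod("HEAD"); // assumption: the server answers HEAD requests
            con.setConnectTimeout(5000);
            con.setReadTimeout(5000);
            return isBrokenStatus(con.getResponseCode());
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Unreachable host, malformed href, or a non-HTTP scheme.
            return true;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "page2.htm"));
        System.out.println(isBrokenLink(null, "page2.htm")); // result depends on the network
    }
}
```

Keeping resolve and isBrokenStatus as separate pure methods makes the URL construction, which was the source of the UnknownHostException, checkable without a network connection.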

Regarding java - .getResponseCode throws IOException on a valid URL, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23098360/
