java - Failing to get a web page's source code

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:08:15

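I am trying to use Java to fetch the HTML page source from this site: "http://207.200.96.231:8008". However, Java's default libraries did not help me here. I also tried following this tutorial, without success. I think the problem is caused by the site's security protection. When I run the code below, I get an exception: java.io.IOException: Invalid Http response

Any ideas on how to implement this? Or is there a library that fits my needs? So far I have tried JSoup and Jericho HTML Parser, thinking they would connect to the site in a different way, but they did not work either.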

    String urlstr = "http://72.26.204.28:9484/played.html";

    try {
        URL url = new URL(urlstr);
        URLConnection urlc = url.openConnection();

        InputStream stream = urlc.getInputStream();
        BufferedInputStream buf = new BufferedInputStream(stream);

        StringBuilder sb = new StringBuilder();

        while (true) {
            int data = buf.read();
            if (data == -1)
                break;
            else
                sb.append((char) data);
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

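Edit (solved): With help from Karai17 and trashgod, I managed to solve the problem. The Shoutcast page requires a User-Agent header before it will serve its content. So all we need to do is add the following line:

    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

The latest code looks like this: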

    try {
        URL url = new URL("http://207.200.96.231:8008/7.html");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

        InputStream is = urlConnection.getInputStream();
        BufferedInputStream in = new BufferedInputStream(is);
        int c;
        while ((c = in.read()) != -1) {
            System.out.write(c);
        }
        urlConnection.disconnect();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
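
For anyone who wants the page as a String rather than echoed to standard output, here is a minimal sketch of the same User-Agent fix wrapped in a reusable method. The class name `PageFetcher`, the method names, the 5-second timeouts and the UTF-8 charset are all assumptions for illustration and are not part of the original solution. Note that the `(char) data` cast in the first snippet treats every byte as one character, which only works for single-byte Latin-1 text; decoding the collected bytes with an explicit charset avoids that.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Reads the stream to its end and decodes the bytes with the given charset. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    /** Fetches a page, sending the User-Agent header the Shoutcast page asks for. */
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(5_000); // assumed value, not from the original post
        conn.setReadTimeout(5_000);    // assumed value, not from the original post
        try (InputStream in = conn.getInputStream()) {
            // UTF-8 is an assumption; the server may use another encoding.
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Running it as `java PageFetcher http://207.200.96.231:8008/7.html` would request the same page as the code above, provided the server is still reachable.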

Best answer

This stream appears to require Winamp.

    $ curl -v http://207.200.96.231:8008
    * About to connect() to 207.200.96.231 port 8008 (#0)
    *   Trying 207.200.96.231... connected
    * Connected to 207.200.96.231 (207.200.96.231) port 8008 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/...
    > Host: 207.200.96.231:8008
    > Accept: */*
    ICY 200 OK
    icy-notice1:This stream requires Winamp
    icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.93atdn
    icy-name:Absolutely Smooth Jazz - SKY.FM - the world's smoothest jazz 24 hours a day
    icy-genre:Soft Smooth Jazz
    icy-url:http://www.sky.fm/smoothjazz/
    content-type:audio/mpeg
    icy-pub:1
    icy-br:96
    ...
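
The status line in the output above is `ICY 200 OK` rather than a standard `HTTP/1.x` status line, which is a plausible reason a strict HTTP client would reject the response. As a rough sketch of how such a response could be inspected once its raw text is in hand, the following parses the `name:value` header lines. The class name `IcyHeaderParser` and its methods are hypothetical, and the sample text is abbreviated from the curl output above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class IcyHeaderParser {

    /** True when the status line uses Shoutcast's "ICY" prefix instead of "HTTP/". */
    public static boolean isIcyStatusLine(String statusLine) {
        return statusLine.startsWith("ICY ");
    }

    /**
     * Parses "name:value" lines that follow the status line,
     * stopping at the first blank line (the end of the headers).
     */
    public static Map<String, String> parseHeaders(String response) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = response.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // index 0 is the status line
            String line = lines[i];
            if (line.isEmpty()) {
                break;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line; skip it
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Sample text abbreviated from the curl output in the answer.
        String sample = "ICY 200 OK\r\n"
                + "icy-genre:Soft Smooth Jazz\r\n"
                + "icy-url:http://www.sky.fm/smoothjazz/\r\n"
                + "content-type:audio/mpeg\r\n"
                + "icy-br:96\r\n"
                + "\r\n";
        System.out.println("ICY status line: " + isIcyStatusLine(sample.split("\r?\n")[0]));
        parseHeaders(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Only the first colon on each line is treated as the separator, so a value such as the `icy-url` address keeps its own colons intact.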

Addendum: You can read the stream like this:

    URL url = new URL("http://207.200.96.231:8008");
    URLConnection con = url.openConnection();
    InputStream is = con.getInputStream();
    BufferedInputStream in = new BufferedInputStream(is);
    int c;
    while ((c = in.read()) != -1) {
        System.out.write(c);
    }
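
Because the curl output reports `content-type:audio/mpeg`, the root URL is presumably a continuous audio stream, so a loop that runs until `read()` returns -1 may never finish. A minimal sketch of capping the read at a fixed number of bytes follows; the class name `BoundedRead`, the method `readAtMost` and the byte counts are hypothetical, and an in-memory stream stands in for the network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedRead {

    /** Reads up to maxBytes from the stream, or fewer if the stream ends first. */
    public static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n == -1) {
                break; // stream ended before the cap was reached
            }
            out.write(chunk, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a long-running stream: 10,000 bytes held in memory.
        InputStream fake = new ByteArrayInputStream(new byte[10_000]);
        byte[] head = readAtMost(fake, 1_024);
        System.out.println(head.length + " bytes read"); // prints "1024 bytes read"
    }
}
```

With a real connection, the `InputStream` from `con.getInputStream()` would be passed in place of the in-memory stream.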

Regarding "java - Failing to get a web page's source code", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11714353/
