java - Reading website's contents into string

Reposted. Author: 太空狗. Updated: 2023-10-29 22:51:36

I'm working on a class that reads the contents of a website specified by a URL. I've only just started using java.io and java.net, so I'd like feedback on my design.

Usage:

TextURL url = new TextURL(urlString);
String contents = url.read();

My code:

package pl.maciejziarko.util;

import java.io.*;
import java.net.*;

public final class TextURL
{
    private static final int BUFFER_SIZE = 1024 * 10;
    private static final int ZERO = 0;
    private final byte[] dataBuffer = new byte[BUFFER_SIZE];
    private final URL urlObject;

    public TextURL(String urlString) throws MalformedURLException
    {
        this.urlObject = new URL(urlString);
    }

    public String read()
    {
        final StringBuilder sb = new StringBuilder();

        try
        {
            final BufferedInputStream in =
                    new BufferedInputStream(urlObject.openStream());

            int bytesRead = ZERO;

            while ((bytesRead = in.read(dataBuffer, ZERO, BUFFER_SIZE)) >= ZERO)
            {
                sb.append(new String(dataBuffer, ZERO, bytesRead));
            }
        }
        catch (UnknownHostException e)
        {
            return null;
        }
        catch (IOException e)
        {
            return null;
        }

        return sb.toString();
    }

    //Usage:
    public static void main(String[] args)
    {
        try
        {
            TextURL url = new TextURL("http://www.flickr.com/explore/interesting/7days/");
            String contents = url.read();

            if (contents != null)
                System.out.println(contents);
            else
                System.out.println("ERROR!");
        }
        catch (MalformedURLException e)
        {
            System.out.println("Check you the url!");
        }
    }
}

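My question is: is this a good way to achieve what I want? Are there any better solutions?

I particularly dislike sb.append(new String(dataBuffer, ZERO, bytesRead)); but I couldn't find another way to express it. Is it good to create a new String on every iteration? I suppose not.

Are there any other weak points?

Thanks in advance!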

Best answer

Consider using URLConnection instead. You might also want to use IOUtils from Apache Commons IO to make reading the string easier. For example:

URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
// Caveat: getContentEncoding() returns the Content-Encoding header (for example "gzip"),
// not the character set. The charset should come from con.getContentType() instead,
// which returns something like "text/html; charset=UTF-8" and must be parsed
// to extract the actual encoding.
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
System.out.println(body);
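
The charset is carried as a parameter of the Content-Type header (for example "text/html; charset=UTF-8"), so it has to be parsed out of that value. The following is only a minimal sketch of such parsing, not part of the original answer: the class name ContentTypeCharset and its parse method are hypothetical, and it assumes that falling back to a caller-supplied charset is acceptable when no usable charset is declared.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original answer.
final class ContentTypeCharset {
    private ContentTypeCharset() {}

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or the fallback if none is usable.
     */
    static Charset parse(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With the variables from the snippet above, this could be called as ContentTypeCharset.parse(con.getContentType(), StandardCharsets.UTF_8), and the resulting Charset passed on to the code that decodes the stream.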

If you don't want to use IOUtils, I'd probably rewrite that line above as something like:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int len = 0;
while ((len = in.read(buf)) != -1) {
    baos.write(buf, 0, len);
}
String body = new String(baos.toByteArray(), encoding);
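
On the question's worry about creating a new String in every iteration: besides the extra allocations, decoding each byte chunk on its own can split a multi-byte character across two reads, and new String(byte[], int, int) uses the platform default charset. Letting a Reader do the decoding avoids both. The following is a minimal sketch rather than the answerer's code: the class name StreamText and the method readAll are hypothetical, and the in-memory stream in main merely stands in for a real connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original answer.
final class StreamText {
    private StreamText() {}

    /** Reads the whole stream as text in the given charset, closing it when done. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) even on failure.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                // Appends the decoded chars directly; no temporary String per chunk.
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a network stream; "\u00e9" takes two bytes in UTF-8.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

The reader keeps any incomplete byte sequence buffered until the remaining bytes arrive, so characters that straddle a read boundary are decoded intact.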

Regarding "java - Reading website's contents into string", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5867975/
