java - Symbols not copied correctly when fetching page source

Reposted. Author: 行者123. Updated: 2023-12-01 22:04:28

I am trying to fetch the source of a web page with the following code:

public static String getFile(String sUrl) throws ClientProtocolException, IOException {
    DefaultHttpClient httpclient = new DefaultHttpClient();
    StringBuilder b = new StringBuilder();

    // Prepare a request object
    HttpGet httpget = new HttpGet(sUrl);

    // Execute the request
    HttpResponse response = httpclient.execute(httpget);

    // Examine the response status
    System.out.println(response.getStatusLine());

    // Status code should be 200
    if (response.getStatusLine().getStatusCode() != 200) {
        return null;
    }

    // Get hold of the response entity
    HttpEntity entity = response.getEntity();

    // If the response does not enclose an entity, there is no need
    // to worry about connection release
    if (entity != null) {
        InputStream instream = entity.getContent();

        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(instream));
            // do something useful with the response
            String s = reader.readLine();

            while (s != null) {
                b.append(s);
                b.append("\n");
                s = reader.readLine();
            }

        } catch (IOException ex) {
            // In case of an IOException the connection will be released
            // back to the connection manager automatically
            throw ex;

        } catch (RuntimeException ex) {
            // In case of an unexpected exception you may want to abort
            // the HTTP request in order to shut down the underlying
            // connection and release it back to the connection manager.
            httpget.abort();
            throw ex;

        } finally {
            // Closing the input stream will trigger connection release
            instream.close();
        }

        // When HttpClient instance is no longer needed,
        // shut down the connection manager to ensure
        // immediate deallocation of all system resources
        httpclient.getConnectionManager().shutdown();
    }

    return b.toString();
}

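It works fine, but certain symbols (non-breaking spaces, -, single quotes, and so on) are not copied correctly. I tried saving the page source to Amazon S3 with content type text/html and displaying it by opening the page stored on the S3 server.

The symbols mentioned above show up as garbled characters. Is there any workaround?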
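
One plausible way to reproduce this kind of symptom, using only the JDK, is to decode bytes with a charset other than the one they were written in. The sketch below assumes the page was served as ISO-8859-1 and read as UTF-8. That pairing is an illustrative assumption, not something the question confirms, and the class name CharsetMismatchDemo and its decode helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Decodes raw bytes with the given charset, much as InputStreamReader would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: the server sent ISO-8859-1 bytes, where 0xA0 is a non-breaking space.
        byte[] raw = "a\u00a0b".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: a lone 0xA0 is malformed UTF-8 and becomes U+FFFD.
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // Matching charset: the non-breaking space survives.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.indexOf('\uFFFD') >= 0);
        System.out.println(right.equals("a\u00a0b"));
    }
}
```

Plain ASCII characters decode identically under both charsets, which would be consistent with only a few symbols being affected while the rest of the page looks fine.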

Best answer

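You need to make sure you read the content using the page's character encoding; otherwise your system default encoding is used (which is evidently not the right one, given what you are seeing). Note that entity.getContentEncoding() returns the Content-Encoding header (for example gzip), not the charset. The charset is declared in Content-Type, and with HttpClient 4.2 or later it can be read through ContentType:

Charset charset = ContentType.getOrDefault(entity).getCharset(); // null if none declared
BufferedReader reader = new BufferedReader(new InputStreamReader(instream,
        charset != null ? charset : StandardCharsets.UTF_8)); // UTF-8 fallback is an assumption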
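
To show the idea without any HTTP library, here is a minimal JDK-only sketch: it takes the charset from a Content-Type header value and passes it to InputStreamReader explicitly. The class CharsetAwareReader, the helpers charsetFromContentType and readAll, the sample header value, and the UTF-8 fallback are all assumptions made for illustration; they are not part of HttpClient or of the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetAwareReader {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value such as "text/html; charset=ISO-8859-1"; returns fallback if absent.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream line by line, decoding with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulated response body: "café", a non-breaking space, then "bar", in ISO-8859-1.
        byte[] body = "caf\u00e9\u00a0bar".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetFromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body), cs);
        System.out.println(text.equals("caf\u00e9\u00a0bar\n"));
    }
}
```

In the question's code the header value would come from the response rather than a literal string. When re-serving the page (for example from S3), the charset named in the stored object's Content-Type also needs to match the bytes actually written, or the browser will misdecode them in the same way.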

Regarding "java - Symbols not copied correctly when fetching page source", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33036729/
