Java: reading XML from a URL as UTF-8?

Reposted. Author: 行者123. Updated: 2023-12-02 01:41:24

I am trying to parse XML data from a URL, but I can't seem to get it parsed as UTF-8: the ¥ character comes out garbled when it is read from the response:

URL url = new URL("https://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=¥");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
final InputStream in = url.openStream();
// Decode the response bytes as UTF-8 before handing them to the parser
final InputSource source = new InputSource(new InputStreamReader(in, "UTF-8"));
source.setEncoding("UTF-8");
Document doc = db.parse(source);
doc.getDocumentElement().normalize();

NodeList nodeList = doc.getElementsByTagName("suggestion");

for (int i = 0; i < 10; i++) {
    Node node = nodeList.item(i);
    // listItems is a collection from the surrounding code (not shown)
    if (node == null || listItems.size() > 10) {
        break;
    }
    String suggestion = node.getAttributes().getNamedItem("data").getTextContent();
    // ...suggestions include � instead of ¥
}

source.setEncoding() is the accepted answer in another thread, but it doesn't seem to work for me.
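
One likely reason setEncoding() appears to do nothing here: according to the InputSource documentation, the encoding setting has no effect when the InputSource wraps a character stream (a Reader), because the Reader has already decoded the bytes. It only matters when the parser is given a byte stream. The sketch below illustrates the difference on an in-memory sample; the class name, method names, and the sample XML are made up for illustration and are not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SetEncodingSketch {
    // Parses the source and returns the "data" attribute of the root element.
    static String rootData(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getAttribute("data");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Hypothetical sample: one element whose attribute holds the yen sign,
    // encoded as ISO-8859-1 (single byte 0xA5), with no XML declaration.
    static byte[] sampleBytes() {
        return "<suggestion data=\"\u00A5\"/>".getBytes(StandardCharsets.ISO_8859_1);
    }

    // Character stream: the Reader decodes as UTF-8 first, so setEncoding is ignored.
    static String viaReader() {
        InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(sampleBytes()), StandardCharsets.UTF_8));
        source.setEncoding("ISO-8859-1");
        return rootData(source);
    }

    // Byte stream: the parser does the decoding and uses the encoding set on the source.
    static String viaByteStream() {
        InputSource source = new InputSource(new ByteArrayInputStream(sampleBytes()));
        source.setEncoding("ISO-8859-1");
        return rootData(source);
    }
}
```

In this sketch viaReader() yields the replacement character even though ISO-8859-1 was set, while viaByteStream() yields the yen sign.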

Best answer

The encoding of the input appears to be something other than UTF-8.

This worked for me:

Read the document using the ISO-8859-1 encoding:

Document doc = db.parse(new InputSource(new InputStreamReader(url.openStream(), "ISO-8859-1")));
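
To see why the wrong charset produces � rather than some other character, it may help to decode the same byte both ways. In ISO-8859-1 the yen sign is the single byte 0xA5, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder substitutes the replacement character U+FFFD. This is a minimal sketch with a made-up class name, not code from the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes raw bytes with the given charset; malformed input becomes U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The yen sign as an ISO-8859-1 server would send it: one byte, 0xA5.
        byte[] latin1Yen = "\u00A5".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: 0xA5 alone is malformed UTF-8.
        System.out.println(decode(latin1Yen, StandardCharsets.UTF_8));
        // Matching charset: the yen sign survives.
        System.out.println(decode(latin1Yen, StandardCharsets.ISO_8859_1));
    }
}
```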

The final method looks like this:

URL url = new URL("https://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=¥");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Decode the response as ISO-8859-1 instead of UTF-8
Document doc = db.parse(new InputSource(new InputStreamReader(url.openStream(), "ISO-8859-1")));
doc.getDocumentElement().normalize();

NodeList nodeList = doc.getElementsByTagName("suggestion");

// Print at most the first 10 suggestions
for (int i = 0; i < 10; i++) {
    Node node = nodeList.item(i);
    if (node == null) {
        break;
    }
    String suggestion = node.getAttributes().getNamedItem("data").getTextContent();
    System.out.println(suggestion);
}
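
Hardcoding ISO-8859-1 works as long as the server keeps answering in that encoding. A possible variation, assuming the server reports its charset in the Content-Type response header, is to read the charset from that header and fall back to a default when it is missing. The class and method names below are hypothetical, and this is a sketch rather than the answer's method.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class HeaderCharsetSketch {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/xml; charset=ISO-8859-1"; returns the fallback if absent or unknown.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String prefix = "charset=";
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, prefix, 0, prefix.length())) {
                String name = p.substring(prefix.length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Decodes the stream with the given charset and parses it into a DOM document.
    static Document parse(InputStream in, Charset charset) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new InputStreamReader(in, charset)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

With a URLConnection obtained from url.openConnection(), the header value would come from getContentType() and the stream from getInputStream().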

For "Java: reading XML from a URL as UTF-8?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54406217/
