
java - Parsing an HTML file with Jsoup and mapping it to key-value pairs in Java

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:39:18

I have parsed an HTML file with Jsoup and am trying to extract key-value pairs from it.

Here is the HTML file. The keys are in the dt elements (class "dt dlterm") and the values are in the dd elements:

<div class="section" id="GUID-1BF02E47-1ECC-4CCF-A903-2A8621DB5FBA__GUID- 20A253C1-02AD-4413-9570-C0178C01E616">
<div class="p">
<dl class="dl">
<dt class="dt dlterm">
<a name="GUID-1BF02E47-1ECC-4CCF-A903-2A8621DB5FBA__GUID-942CC4F1-90F8- 4B83-9647-A3D086063B0C"><!----></a>Incident</dt>
<dd class="dd">detials of one</dd>
<dt class="dt dlterm"><a name="GUID-1BF02E47-1ECC-4CCF-A903- 2A8621DB5FBA__GUID-0F5CFEC5-6714-4000-A733-79DDB49B4C63"><!----> </a>Risk</dt>
<dd class="dd">details of it two</dd>
<dt class="dt dlterm"><a name="GUID-1BF02E47-1ECC-4CCF-A903- 2A8621DB5FBA__GUID-C731C50A-947F-431B-9CEE-1FFD1BA40EEA"><!----> </a>Event</dt>
<dd class="dd">detials of it three.</dd>
</dl>
</div>
</div>

Here is what I tried:

static Map<Object, Object> maps;

public static Map<Object, Object> getSet(Document doc) {
    maps = new HashMap<Object, Object>();
    String key = "";
    String value = "";
    Elements elemname1 = doc.getElementsByClass("dt dlterm");
    Elements elemname2 = doc.getElementsByClass("dd");

    List<Object> keys = new ArrayList<Object>();
    List<Object> values = new ArrayList<Object>();
    for (Element i : elemname1) {
        key = i.ownText();
        keys.add(key);
    }
    for (Element j : elemname2) {
        value = j.ownText();
        values.add(value);
    }
    System.out.println(maps);
    return maps;
}

public static void main(String args[]) throws Exception {
    String filePath = "someFilePath.html";
    File input = new File(filePath);
    Document doc = Jsoup.parse(input, "UTF-8", "");
    getSet(doc);
}

The expected result is:

{ 
Event = detials of one,
Incident = detials of two,
Risk = detials of three
}

What I get instead is:

{[Incident, Risk, Event] = [detials of one,detials of two,detials of three]}

Best answer

You can use this:

Document document = Jsoup.parse(html);

Elements dts = document.getElementsByClass("dt dlterm");
Elements dds = document.getElementsByClass("dd");

if (dts.size() != dds.size()) {
    // ensure same sizes of both lists
}

HashMap<String, String> values = new HashMap<>();
for (int i = 0; i < dts.size(); i++) {
    values.put(dts.get(i).text(), dds.get(i).text());
}
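The size check above is left empty, so what happens on a mismatch is up to you. Below is a minimal sketch of one option, failing fast with an exception. It uses plain string lists standing in for the element texts so that it runs without Jsoup (with Jsoup, dts.eachText() and dds.eachText() would give such lists). The class name PairByIndex and the method pair are made up for illustration, and throwing is only one possible policy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairByIndex {
    // Hypothetical helper: pairs keys.get(i) with values.get(i).
    // Assumption: a count mismatch is treated as an error rather than ignored.
    static Map<String, String> pair(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                    "dt/dd count mismatch: " + keys.size() + " vs " + values.size());
        }
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = pair(
                Arrays.asList("Incident", "Risk", "Event"),
                Arrays.asList("detials of one", "details of it two", "detials of it three."));
        System.out.println(m.get("Risk")); // prints "details of it two"
    }
}
```

If silently dropping unmatched entries is acceptable, looping up to Math.min of the two sizes is an alternative to throwing.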

Or in a single statement using Java Streams:

Map<String, String> values = IntStream.range(0, Math.min(dts.size(), dds.size()))
        .boxed()
        .collect(Collectors.toMap(i -> dts.get(i).text(), i -> dds.get(i).text()));
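One caveat with the two-argument Collectors.toMap used above: it throws an IllegalStateException if two keys are equal, for example when two dt elements have the same text. The three-argument overload takes a merge function that decides which value wins. The sketch below again uses string lists in place of the element texts. The names StreamPairing and pair are hypothetical, and keeping the first value is an assumption, not something the original answer specifies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamPairing {
    // Hypothetical helper; keys/values stand in for the dt and dd texts.
    static Map<String, String> pair(List<String> keys, List<String> values) {
        return IntStream.range(0, Math.min(keys.size(), values.size()))
                .boxed()
                .collect(Collectors.toMap(
                        i -> keys.get(i),
                        i -> values.get(i),
                        (first, second) -> first)); // duplicate key: keep the first value
    }

    public static void main(String[] args) {
        Map<String, String> m = pair(
                Arrays.asList("Risk", "Event", "Risk"),
                Arrays.asList("a", "b", "c"));
        System.out.println(m.get("Risk")); // prints "a"
        System.out.println(m.size());      // prints 2
    }
}
```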

The result will look like this:

{Risk=details of it two, Event=detials of it three., Incident=detials of one}

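HashMap does not guarantee any iteration order. If you want the entries in the map to appear in the same order as in the HTML, use LinkedHashMap instead of HashMap.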
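As a rough illustration of the ordering difference, the sketch below fills a LinkedHashMap from two string lists that stand in for the dt and dd texts. The class name OrderedPairs and the method pairInOrder are invented for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedPairs {
    // Hypothetical helper: LinkedHashMap iterates in insertion order,
    // so the entries come back in document order.
    static Map<String, String> pairInOrder(List<String> keys, List<String> values) {
        Map<String, String> result = new LinkedHashMap<>();
        int n = Math.min(keys.size(), values.size());
        for (int i = 0; i < n; i++) {
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = pairInOrder(
                Arrays.asList("Incident", "Risk", "Event"),
                Arrays.asList("detials of one", "details of it two", "detials of it three."));
        // prints {Incident=detials of one, Risk=details of it two, Event=detials of it three.}
        System.out.println(m);
    }
}
```

For the stream variant, the four-argument Collectors.toMap overload accepts a map supplier such as LinkedHashMap::new as its last argument, after a merge function.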

Regarding "java - Parsing an HTML file with Jsoup and mapping it to key-value pairs in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56153999/
