
java - JSoup: stripping HTML tags from XML

Reposted. Author: 太空宇宙. Updated: 2023-11-04 15:09:20

I've been searching Stack Overflow, but I couldn't find anyone with this kind of problem.

I want to do something like this:

Input string:

<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs><p>
Vende-se a pe&ccedil;as ou o conjunto.</p><br>
</Obs>
</Object>
</List>

What I want is to strip the HTML tags, such as <p>, <br>, and so on, so that it ends up like this:

<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs>
Vende-se a pe&ccedil;as ou o conjunto.
</Obs>
</Object>
</List>

I've been playing around with JSoup, but I can't seem to get it to work properly.

Here is my code:

Whitelist whitelist = Whitelist.none();
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object><Section>Fruit</Section><Category>Bananas</Category><Brand>Chiquita</Brand><Obs><p>Vende-se a pe&ccedil;as ou o conjunto.</p><br></Obs></Object></List>";

whitelist.addTags(new String[]{"?xml", "List", "Object", "Section", "Category", "Brand", "Obs"});
String safe = Jsoup.clean(xml, whitelist);

This is the result I get:

FruitBananasChiquitaVende-se a pe&ccedil;as ou o conjunto.

Thanks in advance.

Best answer

JSoup's HTML parser lowercases tag names, so the whitelist entries must be lowercase as well. Use:

whitelist.addTags(new String[] { "?xml", "list", "object", "section",
"category", "brand", "obs" });

Output:

<list>
<object>
<section>
Fruit
</section>
<category>
Bananas
</category>
<brand>
Chiquita
</brand>
<obs>
Vende-se a pe&ccedil;as ou o conjunto.
</obs></object>
</list>
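
Note that the output above has lowercase tag names and no XML declaration, which differs from the result the question asked for. If the original casing and the declaration have to survive, one alternative is to skip JSoup and remove the unwanted tags with a regular expression. The following is only a minimal sketch of that swapped-in technique, not a JSoup feature: the class name `TagStripper` and the method `stripTagsExcept` are hypothetical, and it assumes simple tags with no `>` inside attribute values and no comments or CDATA sections.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSoup: a plain regex-based tag filter.
class TagStripper {
    // Matches an opening or closing tag and captures its name.
    // "<?xml ... ?>" is not matched because '?' is not a letter.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Removes every tag whose name is not in 'keep' (case-sensitive);
    // text content and entities such as &ccedil; are left untouched.
    static String stripTagsExcept(String xml, Set<String> keep) {
        Matcher m = TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = keep.contains(m.group(1)) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList(
                "List", "Object", "Section", "Category", "Brand", "Obs"));
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object>"
                + "<Section>Fruit</Section><Category>Bananas</Category>"
                + "<Brand>Chiquita</Brand><Obs><p>Vende-se a pe&ccedil;as ou o "
                + "conjunto.</p><br></Obs></Object></List>";
        System.out.println(stripTagsExcept(xml, keep));
    }
}
```

Because the allowlist lookup is case-sensitive, `List` and `list` are treated as different tags here, so the names must be written exactly as they appear in the XML. For input that is not this regular, a real parser is the safer choice.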

This Q&A on java - JSoup: stripping HTML tags from XML is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/21833738/
