java - How to remove HTML tags from a string?

Reposted. Author: 行者123. Updated: 2023-12-02 08:28:32

When I search a digital library for the keyword "data", I get paper abstracts like this one:

Many organizations often underutilize their existing <span class='snippet'>data</span> warehouses. In this paper, we suggest a way of acquiring more information from corporate <span class='snippet'>data</span> warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within <span class='snippet'>data</span>, has attracted considerable efforts from <span class='snippet'>data</span> warehousing researchers and practitioners alike. Unfortunately, most <span class='snippet'>data</span> mining tools are loosely coupled, at best, with the <span class='snippet'>data</span> warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the <span class='snippet'>data</span> warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level <span class='snippet'>data</span> often found in <span class='snippet'>data</span> warehouses

How can I remove all the <span class='snippet'>..</span> tags but still keep the keyword "data", so that the abstract becomes:

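Many organizations often underutilize their existing data warehouses. In this paper, we suggest a way of acquiring more information from corporate data warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. Unfortunately, most data mining tools are loosely coupled, at best, with the data warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the data warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level data often found in data warehouses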
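For this specific case, where the only markup is the <span class='snippet'> wrapper around the keyword, a narrow regular expression passed to the standard String.replaceAll may be enough. The following is a minimal sketch, assuming the abstracts contain only simple, well-formed <span> tags; the class name SnippetStripper and the method removeSpanTags are hypothetical. A regex is not an HTML parser, so arbitrary real-world HTML (comments, scripts, a '>' inside an attribute value) can defeat it.

```java
// Hypothetical helper: removes only <span ...> and </span> tags,
// keeping the text between them (e.g. the highlighted keyword).
class SnippetStripper {

    // (?i)      : case-insensitive, so <SPAN> is matched too
    // </?span\b : "<span" or "</span", not followed by more word characters
    //             (so a tag like <spanner> is left alone)
    // [^>]*>    : any attributes, up to the closing '>'
    // Assumes simple markup: a '>' inside an attribute value would break it.
    private static final String SPAN_TAG = "(?i)</?span\\b[^>]*>";

    static String removeSpanTags(String html) {
        return html.replaceAll(SPAN_TAG, "");
    }

    public static void main(String[] args) {
        String abstractHtml = "Many organizations often underutilize their existing "
                + "<span class='snippet'>data</span> warehouses.";
        System.out.println(removeSpanTags(abstractHtml));
        // prints: Many organizations often underutilize their existing data warehouses.
    }
}
```

Because each tag is replaced with an empty string, the words around the keyword keep exactly the spacing they had in the original text.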

Best answer

strip_tags() is your friend. Code kindly copied from here:

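import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative wrapper class so the method compiles on its own.
public class HtmlUtils {

    // Removes every HTML tag whose name is not in the comma-separated
    // allowedTags list (e.g. "b,i"); pass "" to remove all tags.
    public static String strip_tags(String text, String allowedTags) {
        String[] tag_list = allowedTags.split(",");
        for (int i = 0; i < tag_list.length; i++) {
            tag_list[i] = tag_list[i].trim().toLowerCase();
        }
        Arrays.sort(tag_list); // binarySearch below requires a sorted array

        // group(1) captures the tag name, e.g. "span" in <span class='snippet'> or </span>
        final Pattern p = Pattern.compile("<[/!]?([^\\s>]*)\\s*[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);

        StringBuilder out = new StringBuilder();
        int lastPos = 0;
        while (m.find()) {
            String tag = m.group(1).toLowerCase();
            if (Arrays.binarySearch(tag_list, tag) < 0) {
                // tag not allowed: keep the text before it, replace the tag with a space
                out.append(text, lastPos, m.start()).append(" ");
            } else {
                // tag allowed: keep the text before it and the tag itself
                out.append(text, lastPos, m.end());
            }
            lastPos = m.end();
        }
        if (lastPos > 0) {
            out.append(text.substring(lastPos));
            return out.toString().trim();
        } else {
            return text; // no tags found
        }
    }
}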
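Note that strip_tags above replaces every removed tag with a space, so on the abstract in the question "existing <span class='snippet'>data</span> warehouses" comes out as "existing  data  warehouses" with doubled spaces. Below is a minimal sketch of the same allowlist idea that replaces removed tags with nothing instead, using the standard Matcher.appendReplacement, Matcher.appendTail and Matcher.quoteReplacement. The class name AllowlistStripper and method stripTags are hypothetical. The trade-off: adjacent block elements such as <p>a</p><p>b</p> merge into "ab".

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to strip_tags: allowed tags are kept verbatim,
// all other tags are removed without inserting any extra whitespace.
class AllowlistStripper {

    // group(1) captures the tag name, e.g. "span" in <span ...> or </span>
    private static final Pattern TAG = Pattern.compile("<[/!]?([^\\s>]*)[^>]*>");

    static String stripTags(String text, String... allowedTags) {
        Set<String> allowed = new HashSet<>();
        for (String t : allowedTags) {
            allowed.add(t.trim().toLowerCase(Locale.ROOT));
        }
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Keep allowed tags (quoted so '$' or '\' in the tag stay literal); drop the rest.
            String replacement = allowed.contains(name) ? Matcher.quoteReplacement(m.group()) : "";
            m.appendReplacement(out, replacement); // copies text before the match, then the replacement
        }
        m.appendTail(out); // copies the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "existing <span class='snippet'>data</span> <b>warehouses</b>";
        System.out.println(stripTags(html));      // prints: existing data warehouses
        System.out.println(stripTags(html, "b")); // prints: existing data <b>warehouses</b>
    }
}
```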

A similar question, "java - How to remove HTML tags from a string?", can be found on Stack Overflow: https://stackoverflow.com/questions/3974600/