
screen-scraping - How do I replace words with span tags using jsoup?

Reposted. Author: 行者123. Updated: 2023-12-04 10:05:30

Suppose I have the following HTML:

<html>
<head>
</head>
<body>
<div id="wrapper" >
<div class="s2">I am going <a title="some title" href="">by flying</a>
<p>mr tt</p>
</div>
</div>
</body>
</html>

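Any word in the text nodes that is 4 or more characters long, for example the word "going", should be replaced with HTML content (not text), <span>going</span>, in the original HTML, without changing anything else.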

If I try something like element.html(replacement), the problem is that when the current element is <div class="s2">, it also wipes out <a title="some title"

Best answer

In this case, you have to traverse your document as suggested by this answer. Here is one way to do it with the Jsoup API:

  • NodeTraversor and NodeVisitor let you walk the DOM
  • Node.replaceWith(...) lets you replace a node in the DOM

  • Here is the code:
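    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.DataNode;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.select.NodeTraversor;
    import org.jsoup.select.NodeVisitor;

    public class JsoupReplacer {

        // The question asks for words of 4 or more characters.
        private static final int MIN_LENGTH = 4;

        public static void main(String[] args) {
            so6527876();
        }

        public static void so6527876() {
            String html =
                "<html>" +
                "<head>" +
                "</head>" +
                "<body>" +
                " <div id=\"wrapper\" >" +
                " <div class=\"s2\">I am going <a title=\"some title\" href=\"\">by flying</a>" +
                " <p>mr tt</p>" +
                " </div> " +
                " </div>" +
                "</body> " +
                "</html>";
            Document doc = Jsoup.parse(html);

            // Collect first and replace afterwards, so the tree is not
            // modified while it is being traversed.
            final List<TextNode> nodesToChange = new ArrayList<TextNode>();

            // Static traverse(...) is the jsoup 1.11+ form; older versions
            // use new NodeTraversor(visitor).traverse(root) instead.
            NodeTraversor.traverse(new NodeVisitor() {

                @Override
                public void head(Node node, int depth) {
                }

                @Override
                public void tail(Node node, int depth) {
                    if (node instanceof TextNode) {
                        TextNode textNode = (TextNode) node;
                        if (!findLongWords(textNode.getWholeText()).isEmpty()) {
                            nodesToChange.add(textNode);
                        }
                    }
                }
            }, doc.body());

            for (TextNode textNode : nodesToChange) {
                Node newNode = buildElementForText(textNode);
                textNode.replaceWith(newNode);
            }

            System.out.println("result: ");
            System.out.println();
            System.out.println(doc);
        }

        // Returns the distinct whitespace-separated words of at least MIN_LENGTH characters.
        private static Set<String> findLongWords(String text) {
            Set<String> longWords = new HashSet<String>();
            for (String word : text.trim().split("\\s+")) {
                if (word.length() >= MIN_LENGTH) {
                    longWords.add(word);
                }
            }
            return longWords;
        }

        private static Node buildElementForText(TextNode textNode) {
            String newText = textNode.getWholeText();
            for (String longWord : findLongWords(newText)) {
                // Quote the word so regex metacharacters are taken literally, and
                // match only whole whitespace-delimited words, not substrings.
                newText = newText.replaceAll(
                    "(?<!\\S)" + Pattern.quote(longWord) + "(?!\\S)",
                    Matcher.quoteReplacement("<span>" + longWord + "</span>"));
            }
            // A DataNode is written out unescaped, so the spans appear as markup in
            // the serialized HTML. jsoup 1.11+ constructor; older versions take
            // (newText, textNode.baseUri()).
            return new DataNode(newText);
        }

    }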
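
As a side sketch that is not part of the original answer: the string rewriting done in buildElementForText can also be written as a single pass that needs no jsoup at all, which makes it easy to test in isolation. Because a DataNode is written out unescaped, this version also escapes the text it copies. The names LongWordWrapper, wrapLongWords and escapeHtml are hypothetical, and treating every run of non-whitespace characters as a "word" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongWordWrapper {

    // Assumption: a "word" is any run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    /** Wraps each word of at least minLength characters in a span, keeping whitespace as-is. */
    public static String wrapLongWords(String text, int minLength) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the whitespace between the previous word and this one unchanged.
            out.append(text, last, m.start());
            String word = m.group();
            if (word.length() >= minLength) {
                out.append("<span>").append(escapeHtml(word)).append("</span>");
            } else {
                out.append(escapeHtml(word));
            }
            last = m.end();
        }
        // Copy any trailing whitespace.
        out.append(text, last, text.length());
        return out.toString();
    }

    // Minimal escaping so that characters in the text cannot turn into markup.
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(wrapLongWords("I am going ", 4));
        System.out.println(wrapLongWords("by flying", 4));
    }
}
```

Each word is visited exactly once, so a span that has already been inserted is never scanned again, and words containing regex metacharacters need no quoting. The returned string could then be handed to the DataNode in place of the replaceAll loop.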

    Regarding "screen-scraping - How do I replace words with span tags using jsoup?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/6527876/
