java - How to extract the text inside a div tag from HTML in Java

Reposted. Author: 行者123. Updated: 2023-12-02 07:40:37

Hi,

I want to extract the text between the div tags:

<div class="innercontenttxt"> 
<p><img border="1" align="left" height="170" width="324" vspace="3" hspace="2" src="/tmdbuserfiles/ramdev-balakrishna(1).jpg" alt="ramdev aide remanded, lakrishna acharya judicial remand, ramdev aide fake passport case, baba ramdev assistant judicial custody, balakrishna sent to judicial custody, yoga guru ramdev assistant remanded, yoga guru ramdev assistant balakrishna" />
Yoga guru Ramdev's aide Balakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court.
<br />
<br />
Balakrishna Acharya, who is basically a Nepalese citizen,
is alleged to have submitted fake documents to procure a passport.
When he failed to appear in Dehradun court in connection with the case,
</p>
</div>

The extracted result should be:

ramdev aide Balakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court. Balakrishna Acharya, who is basically a Nepalese citizen, is alleged to have submitted fake documents to procure a passport. When he failed to appear in Dehradun court in connection with the case, the court had issued a non-bailable warrant and subsequently arrested him yesterday.

Best answer

This question seems to be similar to this other question.

Assuming you have already stored the HTML source in a String variable named htmlPage:

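// Find the first "<div", then the '>' that closes its start tag
int divIndex = htmlPage.indexOf("<div");
divIndex = htmlPage.indexOf(">", divIndex);

// Find the first "</div>" after the start tag (assumes no nested divs)
int endDivIndex = htmlPage.indexOf("</div>", divIndex);
// The div's inner HTML; note it still contains the <p>, <img> and <br /> tags
String content = htmlPage.substring(divIndex + 1, endDivIndex);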
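The snippet in the answer returns the div's inner HTML with its tags, while the question asks for plain text. Below is a minimal sketch that keeps the same indexOf idea but locates the div by its class, then strips tags with a regex and collapses whitespace. It assumes the target div contains no nested <div> and that no attribute value contains a '>' character. The class name DivTextExtractor and its methods innerHtml and toPlainText are hypothetical, not part of the original answer.

```java
import java.util.regex.Pattern;

public class DivTextExtractor {

    // Matches any tag such as <p>, </p>, <br /> or <img ... />
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Matches runs of whitespace (spaces, tabs, newlines)
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /**
     * Returns the inner HTML of the first div whose start tag begins with
     * the given marker (e.g. <div class="innercontenttxt"), or null if any
     * part is missing. Assumption: the div contains no nested <div>.
     */
    static String innerHtml(String htmlPage, String marker) {
        int start = htmlPage.indexOf(marker);
        if (start < 0) return null;
        int openEnd = htmlPage.indexOf('>', start);      // end of the start tag
        if (openEnd < 0) return null;
        int close = htmlPage.indexOf("</div>", openEnd); // first closing tag after it
        if (close < 0) return null;
        return htmlPage.substring(openEnd + 1, close);
    }

    /** Replaces every tag with a space, then collapses whitespace and trims. */
    static String toPlainText(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // A shortened version of the HTML from the question
        String htmlPage = "<div class=\"innercontenttxt\">\n"
                + "<p><img src=\"/a.jpg\" alt=\"photo\" />\n"
                + "He was arrested yesterday.\n"
                + "<br />\n<br />\n"
                + "He is a Nepalese citizen.\n"
                + "</p>\n</div>";
        String inner = innerHtml(htmlPage, "<div class=\"innercontenttxt\"");
        // Prints: He was arrested yesterday. He is a Nepalese citizen.
        System.out.println(toPlainText(inner));
    }
}
```

Text inside attributes (such as the img alt text) is dropped along with the tag. Regex stripping is fragile on arbitrary real-world HTML (comments, scripts, '>' inside attribute values), so a real HTML parser such as jsoup, e.g. Jsoup.parse(htmlPage).select("div.innercontenttxt").text(), is the more robust option when adding a dependency is acceptable.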

Regarding java - How to extract the text inside a div tag from HTML in Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11662505/