
java - Get the next element with the same class name using jsoup in Android Studio

Reposted. Author: 行者123. Updated: 2023-12-02 03:56:33

I want to get the next element in the HTML that has the same class name. The markup looks like this:

HTML:

<section class="post">
<img class="pecintakomik" src="/images/top/op.jpg" alt="pecintakomik.com" />
<div class="post-cnt">
<h2>Manga bla bla</h2>
<ul>
<li><strong>Nama Alternatif:</strong> </li>
<li><strong>Tahun Rilis:</strong> 2010</li>
<li><strong>Author(s):</strong> sensei1
<li><strong>Artist(s):</strong> sense2</li>
<li><strong>Genre:</strong> Action</li>
<li><strong>Sinopsis:</strong> bla bla bla </li>
<li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>
</ul>
</div>
<div class="clear">&nbsp;</div>
</section>
<img src="http://www.pecintakomik.com/images/block.png">
<section class="post">
<div class="post-cnt">
<h2>List Chapter(s)</h2>
<ul>
<li><a href="/manga/bla_bla/816"> bla bl 816 <img src="/images/new.gif"><em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/815"> bla bla 815<em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/814"> bla bla 814<em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/813"> bla bla 813<em>Baca Online </em></a></li>
</ul>
</div>
</section>

My code is supposed to collect the href links from the manga's chapter list and store them in SQLite, but I can't get them:

Java code:

private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);

    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);

    Document parsedDocument = Jsoup.parse(trimmedHtml);

    List<Chapter> chapterList = scrapeChaptersFromParsedDocument(parsedDocument);
    chapterList = setSourceForChapterList(chapterList);
    chapterList = setParentUrlForChapterList(chapterList, request.getUrl());
    chapterList = setNumberForChapterList(chapterList);

    saveChaptersToDatabase(chapterList, request.getUrl());

    return chapterList;
}

private List<Chapter> scrapeChaptersFromParsedDocument(Document parsedDocument) {
    List<Chapter> chapterList = new ArrayList<Chapter>();

    Element chapterElementnya = parsedDocument.select("div.post-cnt").get(1);
    Elements chapterElements = chapterElementnya.getElementsByTag("li");

    for (Element chapterElement : chapterElements) {
        Chapter currentChapter = constructChapterFromHtmlBlock(chapterElement);

        chapterList.add(currentChapter);
    }

    return chapterList;
}

private Chapter constructChapterFromHtmlBlock(Element chapterElement) {
    Chapter newChapter = DefaultFactory.Chapter.constructDefault();

    Element urlElement = chapterElement.select("a").first();
    Element nameElement = chapterElement.select("a").first();

    if (urlElement != null) {
        String fieldUrl = "http://www.pecintakomik.com" + urlElement.attr("href");
        newChapter.setUrl(fieldUrl);
    }
    if (nameElement != null) {
        String fieldName = nameElement.text();
        newChapter.setName(fieldName);
    }

    boolean fieldNew = chapterElement.html().contains("<img src=\"/images/new.gif\">");
    newChapter.setNew(fieldNew);

    return newChapter;
}

Does anyone know how I can get the second list, the one inside the second div with the same class name?

Best answer

This code:

private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);

    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
    ...
}

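keeps only the first list. endIndex is the first </div> after beginIndex, which is the end of the first post-cnt div. The parsed document therefore contains only one div.post-cnt, so parsedDocument.select("div.post-cnt").get(1) throws an IndexOutOfBoundsException. trimmedHtml will contain the following: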

<div class="post-cnt">
<h2>Manga bla bla</h2>
<ul>
<li><strong>Nama Alternatif:</strong> </li>
<li><strong>Tahun Rilis:</strong> 2010</li>
<li><strong>Author(s):</strong> sensei1
<li><strong>Artist(s):</strong> sense2</li>
<li><strong>Genre:</strong> Action</li>
<li><strong>Sinopsis:</strong> bla bla bla </li>
<li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>
</ul>
</div>
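You can reproduce the effect of the question's slicing with plain Java, without jsoup, on a shortened copy of the page. This is a minimal sketch. The class name OriginalSliceDemo and the abbreviated SAMPLE markup are illustrative and do not come from the question. The sketch counts how many post-cnt divs survive the cut.

```java
// Sketch: the question's indexOf/substring slicing, run on an abbreviated copy of the page.
class OriginalSliceDemo {
    static final String OPEN = "<div class=\"post-cnt\">";

    // Hypothetical sample, shortened from the question's HTML.
    static final String SAMPLE =
        "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2>"
      + "<ul><li><strong>Genre:</strong> Action</li></ul></div></section>"
      + "<section class=\"post\"><div class=\"post-cnt\"><h2>List Chapter(s)</h2>"
      + "<ul><li><a href=\"/manga/bla_bla/816\"> bla bla 816</a></li></ul></div></section>";

    // Same index arithmetic as parseHtmlToChapters in the question.
    static String sliceLikeQuestion(String html) {
        int beginIndex = html.indexOf(OPEN);
        int endIndex = html.indexOf("</div>", beginIndex);
        return html.substring(beginIndex, endIndex);
    }

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String trimmedHtml = sliceLikeQuestion(SAMPLE);
        System.out.println("post-cnt divs kept: " + count(trimmedHtml, OPEN)); // prints "post-cnt divs kept: 1"
        System.out.println("chapter list kept: " + trimmedHtml.contains("List Chapter(s)")); // prints "chapter list kept: false"
    }
}
```

The full sample has two post-cnt divs, but only one survives the cut. That is why index 1 does not exist after parsing.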

To keep both lists, you can do this:

int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
int secondListStart = unparsedHtml.indexOf("<div class=\"post-cnt\">", beginIndex + "<div class=\"post-cnt\">".length());
int endIndex = unparsedHtml.indexOf("</div>", secondListStart) + "</div>".length();

String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
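The same index arithmetic can be wrapped in a helper that also guards against missing markup. String.indexOf returns -1 when the text is absent, and substring would then throw. This is a sketch under that assumption. TwoListSlice, sliceBothLists and the SAMPLE markup are hypothetical names chosen for illustration.

```java
// Sketch: the answer's two-list slicing, with added guards for missing markup.
class TwoListSlice {
    static final String OPEN = "<div class=\"post-cnt\">";
    static final String CLOSE = "</div>";

    // Hypothetical sample, shortened from the question's HTML.
    static final String SAMPLE =
        "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2></div></section>"
      + "<img src=\"http://www.pecintakomik.com/images/block.png\">"
      + "<section class=\"post\"><div class=\"post-cnt\"><h2>List Chapter(s)</h2>"
      + "<ul><li><a href=\"/manga/bla_bla/816\"> bla bla 816</a></li></ul></div></section>";

    /** HTML from the first post-cnt div through the end of the second one, or null if either is missing. */
    static String sliceBothLists(String html) {
        int beginIndex = html.indexOf(OPEN);
        if (beginIndex < 0) {
            return null; // no post-cnt div at all
        }
        int secondListStart = html.indexOf(OPEN, beginIndex + OPEN.length());
        if (secondListStart < 0) {
            return null; // only one post-cnt div
        }
        int closeIndex = html.indexOf(CLOSE, secondListStart);
        if (closeIndex < 0) {
            return null; // second div is never closed
        }
        // Include the closing tag itself, as the answer's "+ "</div>".length()" does.
        return html.substring(beginIndex, closeIndex + CLOSE.length());
    }

    public static void main(String[] args) {
        String trimmedHtml = sliceBothLists(SAMPLE);
        System.out.println(trimmedHtml.contains("List Chapter(s)")); // prints "true"
    }
}
```

The slice still contains the stray </section>, <img> and <section> tags that sit between the two divs. jsoup's lenient parser tolerates them, but the result depends on the exact spelling of the div tag. This is the fragility the whole-page approach avoids.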

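But it is much safer to parse the whole page. The string search depends on the exact markup: if the div gains another attribute or class, for example, indexOf returns -1 and substring throws. To do that, change: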

Document parsedDocument = Jsoup.parse(trimmedHtml);

to:

Document parsedDocument = Jsoup.parse(unparsedHtml);
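Once the whole page is parsed, select("div.post-cnt") returns both divs and get(1) picks the chapter list. The sketch below assumes the jsoup library (org.jsoup:jsoup) is on the classpath. Several parts of it are illustrative choices rather than part of the original answer:

- the class WholePageChapters and its sample markup;
- the size check before get(1);
- passing the site root from the question's code as jsoup's base URI, so that absUrl resolves the relative hrefs instead of concatenating strings.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Sketch only: requires jsoup; class, method and sample names are hypothetical.
class WholePageChapters {
    // Site root taken from the question's code; used as the base URI for absUrl().
    static final String BASE_URI = "http://www.pecintakomik.com";

    // Abbreviated copy of the page markup from the question.
    static final String SAMPLE =
        "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2>"
      + "<ul><li><strong>Genre:</strong> Action</li></ul></div></section>"
      + "<img src=\"http://www.pecintakomik.com/images/block.png\">"
      + "<section class=\"post\"><div class=\"post-cnt\"><h2>List Chapter(s)</h2><ul>"
      + "<li><a href=\"/manga/bla_bla/816\"> bla bl 816 <img src=\"/images/new.gif\"><em>Baca Online </em></a></li>"
      + "<li><a href=\"/manga/bla_bla/815\"> bla bla 815<em>Baca Online </em></a></li>"
      + "</ul></div></section>";

    /** Absolute chapter URLs from the second div.post-cnt; empty if the page has fewer than two. */
    static List<String> chapterUrls(String unparsedHtml) {
        // Parse the whole page instead of a substring of it.
        Document parsedDocument = Jsoup.parse(unparsedHtml, BASE_URI);
        Elements postContents = parsedDocument.select("div.post-cnt");

        List<String> urls = new ArrayList<>();
        // Elements is a List, so check the size before get(1) to avoid IndexOutOfBoundsException.
        if (postContents.size() < 2) {
            return urls;
        }
        Element chapterBlock = postContents.get(1);
        for (Element link : chapterBlock.select("li > a[href]")) {
            // Resolves e.g. "/manga/bla_bla/816" against BASE_URI.
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : chapterUrls(SAMPLE)) {
            System.out.println(url);
        }
    }
}
```

In the question's own code the required change is only the Jsoup.parse argument. The li loop in scrapeChaptersFromParsedDocument can stay as it is. Using index 1 still assumes the chapter list is the second post-cnt block on the page.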

Original question on Stack Overflow (java - Get the next element with the same class name using jsoup in Android Studio): https://stackoverflow.com/questions/35392540/
