I have used a few Jsoup methods to get a string that contains part of a web page's HTML:
protected String doInBackground(String... arguments) {
    // The first argument is the URL of the news page
    String newsurl = arguments[0];
    Document doc = null;
    try {
        // Download and parse the page
        doc = Jsoup.connect(newsurl).get();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (NullPointerException e) {
        e.printStackTrace();
    }
    if (doc != null) {
        // Keep only the elements with class "news_list" and dump them as HTML
        Elements myElements = doc.getElementsByClass("news_list");
        string1 = myElements.toString();
        Log.i("ELEMENTS HTML", string1);
    } else {
        string1 = "FAILED";
    }
    return string1;
}
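However, I can't find a method in the Elements class that would let me split this HTML further into smaller parts that I can turn into strings. I have a feeling my approach is wrong.
The part of the HTML I want to use looks like this: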
<table class="news_list" cellspacing="0" cellpadding="0" border="0" id="ctl00_cphInnerPage_cntrlNewsList_gvNews" style="border-width:0px;width:100%;border-collapse:collapse;">
<tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl02_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=462"><img src="/mc_newsdata/photos/635254712252165967_thumb.jpg" style="border-width:0px;" /></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl02_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=462">1/16/2014</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl02_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=462">Science Fair</a>
</div>
<div class="summary">
The annual Science Fair of the American College of Sofia took place on Wednesday, January 15. You could see photos of some of the incredible projects and experiments in our photo gallery.
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl02_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=462">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl02_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl03_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=461"></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl03_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=461">1/10/2014</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl03_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=461">ACS Students’ Results from PISA 2012</a>
</div>
<div class="summary">
ACS recently received the official results of our students’ performance at the Programme for International Student Assessment (PISA) 2012. PISA is a triennial international survey developed by the Organisation for Economic Co-operation and Development (OECD) that takes place since 2000. It evaluates education systems worldwide by testing the skills and knowledge of 15-16-year-old students in the key subjects: reading, mathematics and science, with a focus on one subject in each year of assessment. In 2012, the assessment focused on students’ knowledge in mathematics.
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl03_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=461">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl03_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl04_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=458"></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl04_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=458">12/20/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl04_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=458">PHOTOS FROM THE CHRISTMAS CONCERT AND THE ALUMNI RECEPTION</a>
</div>
<div class="summary">
You can see some great photos from the amazing Annual Christmas Concert taken by Konstantin Karchev from 11 Grade, as well as some photos from the Alumni Reception by visiting the photogallery of the website.
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl04_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=458">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl04_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl05_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=457"></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl05_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=457">12/19/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl05_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=457">THREE ACS MEDAL-WINNERS MEET PRESIDENT PLEVNELIEV</a>
</div>
<div class="summary">
On December 16, the Third Olympic Meeting of the members of the national student science teams with Bulgarian President Rosen Plevneliev took place. Three ACS students, well-known in the ACS community for their successes in science, were among the invited: Viktor Kouzmanov 12/4, Konstantin Karchev 11/4, and Mihaela Zaharieva from the Class of 2013.
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl05_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=457">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl05_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl06_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=456"><img src="/mc_newsdata/photos/635228847467352694_thumb.jpg" style="border-width:0px;" /></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl06_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=456">12/17/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl06_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=456">ACS Debaters with an Award from a National Debate Tournament</a>
</div>
<div class="summary">
This past weekend ACSers from the Debate Club (with faculty advisors Adam Saligman, Milka Getsovska, and Michael Deegan) took part along with students from 14 other schools from all over the country in the first national Bulgarian Forensic League (“BFL”) tournament of the year. An ACS team consisting of students Adelina Ivanova (11/7), Veselin Nanov (10/2), and Mihail Georgiev (10/7) won the first prize in the "Karl Popper Debate" varsity category, a specific format involving a team of three debating another team of three, all in the age group of Grades 10 to 12. Congratulations to Adelina, Veselin, Mihail, and their faculty advisors for their great achievement!
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl06_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=456">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl06_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl07_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=455"></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl07_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=455">12/13/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl07_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=455">ACS Senior Victorious at an International Physics Olympiad</a>
</div>
<div class="summary">
Last Saturday, the Bulgarian Physics Team featuring ACS senior Victor Kouzmanov returned with the special Grand Prix team prize, one silver, and two bronze medals from the International Experimental Physics Olympiad held in Moscow November 27 through December 6. Congratulations and lots of success for the future to Victor and his teammates!
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl07_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=455">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl07_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl08_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=453"></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl08_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=453">12/4/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl08_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=453">ACS Alumnus Won a Prestigious Trading Competition in the US</a>
</div>
<div class="summary">
Congratulations to Kubrat Danailov of the ACS Class of 2011 on winning the prestigious Intercollegiate Trading Competition held in Boston, USA last month after competing with 100 other students from some of the best universities in the USA - MIT, Harvard, UPenn, Princeton, Yale, Columbia, Cornell, UChicago, Wellesley, Baruch, NYU, and Boston University.
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl08_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=453">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl08_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl09_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=452"><img src="/mc_newsdata/photos/635210613621441367_thumb.jpg" style="border-width:0px;" /></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl09_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=452">11/26/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl09_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=452">ACS OPEN VOLLEYBALL TOURNAMENT RESULTS</a>
</div>
<div class="summary">
The ACS OPEN Volleyball Tournament 2013 took place between Nov 18 through 24. <br/><br/>Below you can see the final standings for boys and girls, as well as the MVP awards winners:
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl09_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=452">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl09_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl10_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=451"><img src="/mc_newsdata/photos/635209646534734186_thumb.jpg" style="border-width:0px;" /></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl10_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=451">11/22/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl10_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=451">ART EXHIBITION</a>
</div>
<div class="summary">
The latest Art Exhibition is posted in the Art Gallery in Sanders Hall. It shows works of ACS students drawn in the elective Art classes. <br/>
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl10_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=451">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl10_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellpadding="0" cellspacing="0" width="100%" border="0">
<tr>
<td>
<div class="news_list_image" style="float:left; " >
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl11_lnkNewsImg" class="wborder" href="/NewsDetails.aspx?cat_id=1&news_id=449"><img src="/mc_newsdata/photos/635204602267379593_thumb.jpg" style="border-width:0px;" /></a>
</div>
<div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl11_lnkDatePosted" class="home_date sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=449">11/19/2013</a>
</div>
<div>
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl11_lnkTitle" class="home_title sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=449">Day of Tolerance at ACS</a>
</div>
<div class="summary">
Today, November 19, ACS Club Embrace is organizing a series of events to mark the International Day of Tolerance celebrated on November 16 since 1995. After the discussion held during advisory periods and the lunch happening at Ostrander Foyer (see photo) the event will be marked by a screening at 3:30 PM of short movies dedicated to the subject of tolerance. All members of the ACS community are welcome to see the thought-provoking short movies!
</div>
<div style="text-align:right;">
<a id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl11_hplReadMore" class="more sitelink" href="/NewsDetails.aspx?cat_id=1&news_id=449">more<img id="ctl00_cphInnerPage_cntrlNewsList_gvNews_ctl11_imgMore" src="/App_Themes/Default/images/more_new.gif" style="height:8px;width:8px;border-width:0px;" /></a>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
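I want to extract the title, date, link, and content of each news item into arrays/strings, and also get the link to each image.
Thanks in advance for your help!
EDIT: It just occurred to me that each of these pieces of information has its own distinct class name, so in theory I could search by it. But the Elements class has no method like getElementsByClass.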
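Best answer
You can use getElementsByTag, since you know what the child elements are. In this case you need to process all the child tables, which hold the values you want.
So change your Elements to: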
Elements myElements = doc.getElementsByClass("news_list").first().getElementsByTag("table");
Now iterate over the tables to get the individual elements of each news item:
for (Element el : myElements) {
    Element title = el.getElementsByClass("home_title").first();
    Element date = el.getElementsByClass("home_date").first();
    Element link = el.getElementsByClass("news_list_image").first();
    System.out.println(title.text());
    System.out.println(date.text());
    System.out.println(link.child(0).attr("href"));
    System.out.println();
}
Output:
Science Fair
1/16/2014
/NewsDetails.aspx?cat_id=1&news_id=462
ACS Students’ Results from PISA 2012
1/10/2014
/NewsDetails.aspx?cat_id=1&news_id=461
PHOTOS FROM THE CHRISTMAS CONCERT AND THE ALUMNI RECEPTION
12/20/2013
/NewsDetails.aspx?cat_id=1&news_id=458
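The question also asks for the summary text and the image link, and wants the values collected rather than printed. The following is a minimal sketch of one way to do that, not the answerer's code. It assumes a reasonably recent jsoup release (for selectFirst), and the names NewsParser, NewsItem, parse and baseUri are hypothetical, introduced only for illustration. It uses the CSS selector "table.news_list table" instead of getElementsByTag, because getElementsByTag is documented to include the element it is called on, so the outer news_list table could otherwise show up in the loop as well.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NewsParser {

    /** Hypothetical value holder for one news entry (not part of the original code). */
    public static final class NewsItem {
        public final String title;
        public final String date;
        public final String link;
        public final String summary;
        public final String imageUrl; // null when the entry has no thumbnail

        NewsItem(String title, String date, String link, String summary, String imageUrl) {
            this.title = title;
            this.date = date;
            this.link = link;
            this.summary = summary;
            this.imageUrl = imageUrl;
        }
    }

    /**
     * Parses the news list out of the given HTML.
     * baseUri is an assumption: it is the site root used to resolve relative links.
     */
    public static List<NewsItem> parse(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<NewsItem> items = new ArrayList<>();

        // Descendant selector: only the per-item tables nested inside table.news_list.
        for (Element row : doc.select("table.news_list table")) {
            Element title = row.selectFirst("a.home_title");
            Element date = row.selectFirst("a.home_date");
            Element summary = row.selectFirst("div.summary");
            Element image = row.selectFirst("div.news_list_image img");

            // Skip tables that do not look like a news entry.
            if (title == null || date == null) {
                continue;
            }

            items.add(new NewsItem(
                    title.text(),
                    date.text(),
                    title.absUrl("href"),                    // resolved against baseUri
                    summary == null ? "" : summary.text(),
                    image == null ? null : image.absUrl("src")));
        }
        return items;
    }
}
```

In the AsyncTask from the question, the Document returned by Jsoup.connect(newsurl).get() already carries the page URL as its base URI, so the same selectors can be applied to it directly without calling Jsoup.parse. Several entries in the sample HTML have an empty image link, which is why the image lookup is null-checked.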
Source: "java - How to extract separate parts of HTML using Jsoup?" on Stack Overflow: https://stackoverflow.com/questions/21511004/