
Java Android XPath HTML parsing

Reposted. Author: 行者123. Updated: 2023-12-02 00:35:12

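I have an application that needs to fetch an HTML page and extract some tags from it.

I need to get every tr and every td, along with their inner text.

Could you give me some code for this?

I have been working on this for hours...

The page content is: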

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">    
<!-- Updated: 03/11/2011 15:17:29-->
<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
Untitled Page
</title><meta http-equiv="Page-Exit" content="progid:DXImageTransform.Microsoft.GradientWipe(duration=1)" /><meta HTTP-EQUIV="CACHE-CONTROL" content="NO-CACHE" /><meta HTTP-EQUIV="PRAGMA" content="NO-CACHE" /><meta http-equiv="refresh" content="60" />
<style type="text/css">
.DisplayTable { width: 97%; }
.DisplayHeader { font-family: Arial; font-weight: bold; font-size: 25px; color: Black; text-align: center; }
.DisplayCell { font-family: Arial; font-weight: bold; font-size: 16px; color: Black; }
.MessageTable { width: 97%; }
.MessageHeader { font-family: Arial; font-size: 20px; color: SteelBlue; border-bottom: solid 3px SteelBlue; }
.MessageText { font-family: Arial; font-size: 20px; color: SteelBlue; text-align: right; }
.DisplayFillChange { font-family: Arial; font-weight: bold; font-size: 16px; color: MediumBlue; background-color: LightCyan; border-bottom: solid 1px LightCyan; }
.DisplayFreeChange { font-family: Arial; font-weight: bold; font-size: 16px; color: OrangeRed; background-color: LightCyan; border-bottom: solid 1px LightCyan; }
.DisplayEventChange { font-family: Arial; font-weight: bold; font-size: 16px; color: DarkGreen; background-color: LightCyan; border-bottom: solid 1px LightCyan; }
.DisplayExamChange { font-family: Arial; font-weight: bold; font-size: 16px; color: IndianRed; background-color: LightCyan; border-bottom: solid 1px LightCyan; }
</style>
</head>
<body dir="rtl" style="margin: 0px; background-color: LightCyan; overflow: hidden;" scroll="no" onload="resize()">
<form name="form1" method="post" action="MainScreen.aspx?pid=17&amp;mid=6264&amp;page=5&amp;msgof=0&amp;static=1" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJLTQwMjA0MzQzZGSqqj0xDnBRKxIgowwhNZzzyzQHVg==" />
</div>
<table width="100%" cellspacing="0" cellpadding="0" border="0" style="background-image: url(fill.gif);">
<tr height="59" style="font-family: Arial; font-size: 34px; color: Yellow; vertical-align: middle;">
<td width="15">&nbsp;</td>
<td width="45%" align="right" id="clock">00:00</td>
<td align="center" nowrap><b>שינוי מערכת שעות לתאריך </b></td>
<td width="45%" align="left">04.11.2011</td>
<td width="15">&nbsp;</td>
</tr>
</table>
<br />
<div id="header" align="center"><table width='100%' class='DisplayTable' cellspacing='0' border='1'><tr class='DisplayHeader'><td width='1%' style='color: LightCyan;'>0</td><td width='14%'>יא - 1</td><td width='14%'>יא - 2</td><td width='14%'>יא - 3</td><td width='14%'>יא - 4</td><td width='14%'>יא - 5</td><td width='14%'>יא - 6</td><td width='14%'>יא - 7</td><td width='1%' style='color: LightCyan;'>0</td></tr></table></div>
<div id="scrollPanel" align="center" style="overflow: hidden;">
<div id="panel" align="center" style=""><table width='100%' class='DisplayTable' cellspacing='0' border='1'><tr><td width='1%' class='DisplayCell'>0</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>0</td></tr><tr><td width='1%' class='DisplayCell'>1</td><td width='14%' class='DisplayCell'><table width='100%'></table></td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>1</td></tr><tr><td width='1%' class='DisplayCell'>2</td><td width='14%' class='DisplayCell'><table width='100%'></table></td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>2</td></tr><tr><td width='1%' class='DisplayCell'>3</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>3</td></tr><tr><td width='1%' class='DisplayCell'>4</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' 
class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>4</td></tr><tr><td width='1%' class='DisplayCell'>5</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>5</td></tr><tr><td width='1%' class='DisplayCell'>6</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>6</td></tr><tr><td width='1%' class='DisplayCell'>7</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>7</td></tr><tr><td width='1%' class='DisplayCell'>8</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>8</td></tr><tr><td width='1%' class='DisplayCell'>9</td><td 
width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='14%' class='DisplayCell'>&nbsp;</td><td width='1%' class='DisplayCell'>9</td></tr></table></div>
<div id="messages" align="center"><table width='100%' class='MessageTable' cellspacing='0' cellpadding='7' border='0'><tr><td class='MessageHeader'>הודעות</td></tr></tr></table></div>
</div>
</form>
<script>
var sp;
var delay = 0;
function resize(){
sp = document.getElementById('scrollPanel');
sp.style.height = document.documentElement.clientHeight - sp.offsetTop;
delay = document.getElementById('panel').clientHeight - document.getElementById('scrollPanel').clientHeight;
if (delay > 0)
delay = delay / 5 * 120;
else
delay = 0;
setTimeout("doScroll()", 3000);
setTimeout("doNextPage()", 500);
}
function doScroll()
{
sp.scrollTop += 5;
setTimeout("doScroll()", 100);
}
updateClock();
function nextUrl()
{
return 'MainScreen.aspx?pid=17&mid=6264&page=6&msgof=0&nd=0';
}
function doNextPage()
{
}
function updateClock()
{
document.getElementById('clock').innerHTML = getClock();
setTimeout("updateClock()", 55000)
}
function getClock()
{
var date = new Date();
var hours = date.getHours();
var minutes = date.getMinutes();
if (hours < 10)
hours = '0' + hours;
if (minutes < 10)
minutes = '0' + minutes;
return hours + ':' + minutes;
}
</script>
</body>
</html>

Best answer

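The easiest approach is to use an HTML parsing library such as HtmlCleaner, TagSoup, or HTML Parser. With one of these you can fetch the elements you need from the document directly, or iterate over it manually with a "node visitor" (or whatever the library you pick calls it).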

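A quick look at the documentation of a library picked at random from the list above suggests that something like the following should work with HtmlCleaner: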

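import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

HtmlCleaner cleaner = new HtmlCleaner();
TagNode root = cleaner.clean(...);
// Second argument true: search the whole subtree, not only direct children
TagNode[] trNodes = root.getElementsByName("tr", true);
for (TagNode trNode : trNodes) {
    System.out.println("All text inside this <tr> tag (including children): " + trNode.getText());
}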
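
Since the question title mentions XPath, the same row-by-row extraction can also be sketched with the `javax.xml` classes that ship with Java and Android, without any third-party library. This is only a minimal sketch under one strong assumption: the input is already well-formed XML. The page in the question is not (it contains `&nbsp;` entities and a valueless `nowrap` attribute), so it would first have to be cleaned by a library such as HtmlCleaner. The class name `XPathTableReader` and the method `readRows` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: assumes the markup is already well-formed XML.
class XPathTableReader {

    /** Returns one list per <tr>, holding the trimmed text of each of its direct <td> children. */
    static List<List<String>> readRows(String wellFormedXhtml) {
        try {
            // The default factory is not namespace-aware, so "//tr" matches by plain element name.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wellFormedXhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            List<List<String>> rows = new ArrayList<>();
            // Every <tr> anywhere in the document, including rows of nested tables.
            NodeList trNodes = (NodeList) xpath.evaluate("//tr", doc, XPathConstants.NODESET);
            for (int i = 0; i < trNodes.getLength(); i++) {
                // Relative expression: only the <td> children of the current row.
                NodeList tdNodes = (NodeList) xpath.evaluate("td", trNodes.item(i), XPathConstants.NODESET);
                List<String> cells = new ArrayList<>();
                for (int j = 0; j < tdNodes.getLength(); j++) {
                    cells.add(tdNodes.item(j).getTextContent().trim());
                }
                rows.add(cells);
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Input could not be parsed as well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<table><tr><td>1</td><td>A</td></tr><tr><td>2</td><td>B</td></tr></table>";
        System.out.println(readRows(sample)); // prints [[1, A], [2, B]]
    }
}
```

The relative expression `td` is evaluated against each row node, which keeps the cells grouped by row instead of returning one flat list of every `td` in the document.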

Here is an example using HtmlCleaner again, this time with a TagNodeVisitor that filters on <td>:

// root is the TagNode returned by cleaner.clean(...)
root.traverse(new TagNodeVisitor() {
    @Override
    public boolean visit(TagNode parentNode, HtmlNode htmlNode) {
        // The visitor is also called for comments and text content, so check the node type first
        if (htmlNode instanceof TagNode) {
            TagNode tag = (TagNode) htmlNode;
            String tagName = tag.getName();
            if ("td".equals(tagName)) {
                System.out.println("All text inside this <td> tag (including children): " + tag.getText());
            }
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});
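
Most cells in the page from the question contain only `&nbsp;`, so the extracted inner text will often look empty without being an empty string. Depending on how the parser handles entities, such a cell may come back as the literal text `&nbsp;` or as the NO-BREAK SPACE character (U+00A0), which `String.trim()` does not remove. The following is a minimal sketch of a cleanup helper, not part of any library; `CellText`, `normalize`, and `isBlank` are hypothetical names.

```java
// Hypothetical helper for cleaning up text extracted from table cells.
class CellText {

    /** Replaces non-breaking spaces with ordinary spaces, then trims the result. */
    static String normalize(CharSequence raw) {
        if (raw == null) {
            return "";
        }
        String text = raw.toString()
                .replace("&nbsp;", " ")    // entity left untranslated by the parser
                .replace('\u00A0', ' ');   // entity already decoded to NO-BREAK SPACE
        return text.trim();
    }

    /** True when the cell holds nothing but whitespace or non-breaking spaces. */
    static boolean isBlank(CharSequence raw) {
        return normalize(raw).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBlank("&nbsp;"));
        System.out.println(normalize(" 04.11.2011\u00A0"));
    }
}
```

Inside the loop or visitor above, a call such as `CellText.normalize(tag.getText())` could be used to skip filler cells before printing or storing the text.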

Regarding Java Android XPath HTML parsing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8007333/
