
php - How to extract data from an HTML table using PHP

Reposted. Author: 可可西里. Updated: 2023-11-01 00:57:30

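I have been trying different ways to extract data from an HTML table, for example with XPath. The table has no classes, so I am not sure how to use XPath without a class or ID. The data is retrieved from an RSS XML file, and I am currently using DOM. After extracting the data, I want to sort the table by job.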

Here is my PHP code:

$html = '';
$xml = simplexml_load_file($url) or die("ERROR: Cannot connect to url\n check whether the report still exists in the Gradleaders system");

/* In this loop we retrieve everything inside each item's encoded content,
 * including the CDATA section. This is where the HTML and styling live.
 */

foreach ($xml->channel->item as $cont) {
    // Append with .= so each item's HTML is kept instead of overwritten
    $html .= $cont->children('content', true)->encoded . '<br>'; // actual tag name is "encoded"
}

$htmlParser = new DOMDocument();          // parse the HTML with DOMDocument
libxml_use_internal_errors(true);         // the HTML triggers parser warnings; keep them internal
$htmlParser->preserveWhiteSpace = false;  // only takes effect if set before loading
$htmlParser->loadHTML($html);             // load the HTML string taken from SimpleXML

$tables = $htmlParser->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');

foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    // $cols is a DOMNodeList and cannot be echoed directly; print each cell's text instead
    foreach ($cols as $col) {
        echo trim($col->textContent), ' ';
    }
    echo "\n";
}
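Since the end goal is to sort jobs, it may help to turn each item's table into its own record rather than concatenating all the HTML into one string. The sketch below assumes every RSS item carries exactly one two-column table in content:encoded; the sample feed, the job titles and IDs, and the tableToRecord() helper are all hypothetical. It uses SimpleXML for the feed, DOMDocument for each table, and usort() to order the records by "Job Title" (arrow functions need PHP 7.4+).

```php
<?php
// Hypothetical sample feed: two <item>s, each with a job table in <content:encoded>.
$rss = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item><content:encoded><![CDATA[<table><tr><td><b>Job Title:</b></td><td>Web Assistant </td></tr><tr><td><b>Job ID:</b></td><td>11111</td></tr></table>]]></content:encoded></item>
    <item><content:encoded><![CDATA[<table><tr><td><b>Job Title:</b></td><td>Library Aide</td></tr><tr><td><b>Job ID:</b></td><td>22222</td></tr></table>]]></content:encoded></item>
  </channel>
</rss>
XML;

/**
 * Hypothetical helper: turn one two-column table into
 * ['Job Title' => '...', 'Job ID' => '...'].
 * Rows without exactly two cells (e.g. a "Click to View More" row) are skipped.
 */
function tableToRecord(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy feed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $record = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length !== 2) {
            continue;
        }
        // "Job Title:" becomes the key "Job Title"
        $label = rtrim(trim($cells->item(0)->textContent), ':');
        $record[$label] = trim($cells->item(1)->textContent);
    }
    return $record;
}

$xml = simplexml_load_string($rss);
$jobs = [];
foreach ($xml->channel->item as $item) {
    // Parse each item's HTML separately so every job becomes its own record
    $jobs[] = tableToRecord((string) $item->children('content', true)->encoded);
}

// Sort the records alphabetically by job title
usort($jobs, fn(array $a, array $b) => strcmp($a['Job Title'] ?? '', $b['Job Title'] ?? ''));

foreach ($jobs as $job) {
    echo $job['Job Title'], ' (', $job['Job ID'], ")\n";
}
```

Keeping one record per item also makes it easy to sort by any other label, such as "Position Type", by changing the key used in the comparison.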

Here is the HTML I am extracting the information from:

<table cellpadding='1' cellspacing='2'>
<tr>
<td><b>Job Title:</b></td>
<td>Job Example </td>
</tr>
<tr>
<td><b>Job ID:</b></td>
<td>23992</td>
</tr>
<tr>
<td><b>Job Description:</b></td>
<td>Just a job example </td>
</tr>
<tr>
<td><b>Job Category:</b></td>
<td>Work-study Position</td>
</tr>
<tr>
<td><b>Position Type:</b></td>
<td>Work-study</td>
</tr>
<tr>
<td><b>Applicant Type:</b></td>
<td>Work-study</td>
</tr>
<tr>
<td><b>Status:</b></td>
<td>Active</td>
</tr>
<tr>
<td colspan='2'><b><a href='https://www.myjobs.com/tuemp/job_view.aspx?token=I1iBwstbTs2pau+SjrYfWA%3d%3d'>Click to View More</a></b></td>
</tr>
</table>

Best answer

You can use the XPath query('//td') and retrieve each td's HTML with C14N(), something like:

$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach ($x->query('//td') as $td) {
    echo $td->C14N();
    // if you only need the text, use:
    // echo $td->textContent;
}
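The //td query returns the cells as one flat list, so label and value cells are no longer paired. As a hedged alternative, you can query rows instead and read td[1] and td[2] relative to each row, which yields a label => value map. The sample HTML below is a shortened copy of the question's table; the $fields and $link names are illustrative only.

```php
<?php
// Shortened copy of the question's table, for illustration only.
$html = <<<'HTML'
<table cellpadding='1' cellspacing='2'>
<tr><td><b>Job Title:</b></td><td>Job Example </td></tr>
<tr><td><b>Job ID:</b></td><td>23992</td></tr>
<tr><td><b>Status:</b></td><td>Active</td></tr>
<tr><td colspan='2'><b><a href='https://www.myjobs.com/tuemp/job_view.aspx?token=I1iBwstbTs2pau+SjrYfWA%3d%3d'>Click to View More</a></b></td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$x = new DOMXPath($dom);

$fields = [];
// Select only rows that have exactly two cells (label + value)
foreach ($x->query('//tr[count(td) = 2]') as $tr) {
    // Both expressions are evaluated relative to the current <tr>
    $label = trim($x->evaluate('string(td[1])', $tr));
    $value = trim($x->evaluate('string(td[2])', $tr));
    $fields[rtrim($label, ':')] = $value;
}

// The link sits in the single-cell row; read its href attribute
$link = $x->evaluate('string(//td[@colspan="2"]//a/@href)');

print_r($fields);
echo $link, "\n";
```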

Output:

<td><b>Job Title:</b></td>
<td>Job Example </td>
<td><b>Job ID:</b></td>
...

DOMNode::C14N():

Returns canonicalized nodes as a string or FALSE on failure.
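C14N() produces canonical XML. If you would rather have the cell serialized as HTML, DOMDocument::saveHTML() also accepts a node argument (PHP 5.3.6+). The short comparison below uses a one-row sample table; whether the two strings differ depends on the markup, so the sketch only prints both.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td><b>Job ID:</b></td><td>23992</td></tr></table>');
$x = new DOMXPath($dom);

$canonical = [];
$asHtml = [];
foreach ($x->query('//td') as $td) {
    $canonical[] = $td->C14N();        // canonical XML string for this node, or false on failure
    $asHtml[]    = $dom->saveHTML($td); // HTML serialization of just this node
}

print_r($canonical);
print_r($asHtml);
```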


Update:

Another question: how can I grab individual table data? For example, just the Job ID.

Use XPath contains(), i.e.:

foreach($x->query('//td[contains(., "Job ID:")]') as $td){
echo $td->textContent;
}
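Note that contains() is a substring test, so a short label can match more cells than intended. A hedged alternative is to compare normalize-space(.) with the full label, which trims and collapses whitespace before an exact comparison. The three-row sample table below is illustrative.

```php
<?php
$html = '<table>'
    . '<tr><td><b>Job Title:</b></td><td>Job Example </td></tr>'
    . '<tr><td><b>Job ID:</b></td><td>23992</td></tr>'
    . '<tr><td><b>Job Category:</b></td><td>Work-study Position</td></tr>'
    . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);

// Substring match: "Job" hits all three label cells and the "Job Example" value cell
$loose = $x->query('//td[contains(., "Job")]');

// Exact match on the whitespace-normalized text of the cell
$exact = $x->query('//td[normalize-space(.) = "Job ID:"]');

echo $loose->length, "\n";
echo $exact->item(0)->textContent, "\n";
```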

Update V2:

How can I get the next table data after that (to actually get the Job ID)?

Use following-sibling::*[1], i.e.:

echo $x->query('//td[contains(*, "Job ID:")]/following-sibling::*[1]')->item(0)->textContent;
//23992
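If the label is missing from a table, query(...)->item(0) returns null and reading ->textContent on it fails. The sketch below wraps the following-sibling approach in a hypothetical jobField() helper that returns null instead; it swaps contains(*, ...) for an exact normalize-space() match and assumes the label contains no double quotes, since it is pasted into the XPath string.

```php
<?php
/**
 * Hypothetical helper: return the text of the cell that follows a label cell,
 * or null when the label is not present in the table.
 * Assumes $label contains no double quotes (it is inserted into the XPath).
 */
function jobField(DOMXPath $x, string $label): ?string
{
    // Exact label match, then step to the next sibling <td>
    $query = sprintf('//td[normalize-space(.) = "%s"]/following-sibling::td[1]', $label);
    $node = $x->query($query)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td><b>Job ID:</b></td><td>23992</td></tr><tr><td><b>Status:</b></td><td>Active</td></tr></table>');
$x = new DOMXPath($dom);

echo jobField($x, 'Job ID:'), "\n";   // 23992
var_dump(jobField($x, 'Salary:'));    // NULL
```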

For this question (php - How to extract data from an HTML table using PHP), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37216042/
