
php - How to get the next node after the parent of the current node via the Symfony crawler?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 16:01:41

Example HTML5 to parse:

<div id="orderDetails">
<div> ... any number of blocks with unnecessary stuff ... </div>
<div>Label for important info</div>
<table> ... some other block type ... </table>
<div>Some very important info here</div>
<div> ... any number of blocks with unnecessary stuff ... </div>
</div>

My PHP code looks like this:

$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = $label->parent()->next('div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');

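Unfortunately, the crawler has no parent or next methods. It does have parents, but that gives me all the parent nodes, that is, all the divs, which I cannot tell apart.

So in this case I have two questions:

  1. How to get the parent of the current node? Not all the nodes but the "actual" one!
  2. How to traverse the DOM horizontally with some analogue of next/prev?

Thanks.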

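Best answer

Story

After digging into the source code, I found that nextAll() does not start from "all" of the current nodes but from "one" ($node = $this->getNode(0);). It returns the siblings that follow that first node, and methods such as text() read only the first node of the result, so in practice the result acts as "the next node".

This means that if I need the node "two after the current one", I have to write $node->nextAll()->nextAll()->nextAll().

WTF? The naming is super confusing at first (0_0).

Answer

  1. How to get parent of current node? Not all nodes but "actual" one!
// parents() lists the ancestors nearest first, so the first node is the direct parent
$parent = $node->parents()->first();
  2. How to traverse dom horizontally with some analogue of next/prev?
// The first node is the sibling right after the current one
$next = $node->nextAll();
// The first node is the sibling right before the current one
$prev = $node->previousAll();
// The first node is the third sibling after the current one (two are skipped)
$nextAfterTwo = $node->nextAll()->nextAll()->nextAll();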

Concrete code solution

So the required behaviour does exist, and a working solution to the question looks like this:

use Symfony\Component\DomCrawler\Crawler;

/**
 * Returns the nodes after the current one that match the selector
 *
 * @param Crawler $start    Node from which to start traversing
 * @param string  $selector CSS selector, as in `Crawler::filter($selector)`
 *
 * @return Crawler Matching nodes wrapped in a Crawler
 *
 * @throws \InvalidArgumentException When no node is found
 */
function getNextFiltered(Crawler $start, string $selector): Crawler
{
    // Loop bound: the nesting depth of the start node.
    $count = $start->parents()->count();
    $next = $start->nextAll();
    while ($count-- > 0) {
        // filter() matches the remaining siblings and their descendants.
        $filtered = $next->filter($selector);
        if ($filtered->count()) {
            return $filtered;
        }
        // nextAll() throws \InvalidArgumentException once $next is empty.
        $next = $next->nextAll();
    }

    throw new \InvalidArgumentException('No node found');
}

In my case:

$crawler = new Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = getNextFiltered($label, 'div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
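
For comparison, the same lookup can be sketched with PHP's built-in DOM extension and a single XPath expression, without walking the siblings by hand. This is a minimal sketch, not the answer author's method: findNextSiblingDiv() is a hypothetical helper name, the sample HTML replaces the "..." placeholders with made-up content, and the label text is assumed to contain no double quotes. The same expression should also be usable with Crawler::filterXPath(), though that is not tested here.

```php
<?php
// Hypothetical helper: returns the text of the first <div> sibling that
// follows the label <div> inside #orderDetails, or null when none exists.
function findNextSiblingDiv(string $html, string $labelText): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // following-sibling::div[1] picks the nearest following <div> sibling,
    // skipping siblings of other types such as the <table>.
    // Assumption: $labelText contains no double quotes.
    $query = sprintf(
        '//*[@id="orderDetails"]/div[normalize-space(.) = "%s"]/following-sibling::div[1]',
        $labelText
    );

    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Made-up sample data shaped like the HTML in the question.
$html = <<<'HTML'
<div id="orderDetails">
  <div>Unrelated block</div>
  <div>Label for important info</div>
  <table><tr><td>Some other block type</td></tr></table>
  <div>Some very important info here</div>
  <div>Another unrelated block</div>
</div>
HTML;

$info = findNextSiblingDiv($html, 'Label for important info');
echo $info, PHP_EOL; // prints "Some very important info here"
```

Unlike getNextFiltered(), this matches only sibling divs, so a div nested inside the table would not be returned by mistake.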

Regarding "php - How to get the next node after the parent of the current node via the Symfony crawler?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41337779/
