
php - How do I split a string by tag in PHP when parsing HTML into a DOM tree?

Reposted. Author: 行者123. Updated: 2023-12-04 06:18:58

Here is the string:

<div>This is a test.</div>
<div>This <b>another</b> a test.</div>
<div/>
<div>This is last a test.</div>

I want to split this string into an array like this:
{"This is a test.", "This <b>another</b> a test.", "", "This is last a test."}

Any ideas on how to do this in PHP? Thank you.

Best answer

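I'm assuming your HTML is malformed.

There are lots of options, including XPath and numerous libraries. Regex is not a good idea. I find DOMDocument fast and relatively simple.

Call getElementsByTagName, then iterate over the returned elements and collect the innerHTML of each one.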

Example:

<?php
// Build the "innerHTML" of $node by serializing each of its child nodes.
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHTML;
}

$str = <<<'EOD'
<div>This is a test.</div>
<div>This <b>another</b> a test.</div>
<div/>
<div>This is last a test.</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);

$array = []; // initialize so print_r() never sees an undefined variable
$ellies = $doc->getElementsByTagName('div');
foreach ($ellies as $one_el) {
    // An empty string is falsy, so divs with no content are skipped.
    if ($ih = get_inner_html($one_el)) {
        $array[] = $ih;
    }
}
?>
<pre>
<?php print_r($array); ?>
</pre>

// Output
// Note that there would be a 4th array element
// without the `if ($ih = get_inner_html($one_el))` check:
Array
(
    [0] => This is a test.
    [1] => This <b>another</b> a test.
    [2] => This is last a test.
)
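
The array asked for in the question keeps an empty string for the `<div/>` entry, while the example above filters it out. As a minimal sketch of the unfiltered variant, the code below collects the inner HTML of every div without the emptiness check. The helper name `split_divs_keep_empty` is hypothetical and not part of the original answer, and the sketch assumes the parser treats `<div/>` as an empty element, as the note about a 4th array element implies.

```php
<?php
// Hypothetical helper (illustrative name): returns the inner HTML of every
// <div>, including empty ones, in document order.
function split_divs_keep_empty(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $parts = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $inner = '';
        foreach ($div->childNodes as $child) {
            // Serialize each child node (text or element) back to markup.
            $inner .= $doc->saveXML($child);
        }
        // No emptiness check here, so a div without children contributes ''.
        $parts[] = $inner;
    }

    return $parts;
}

$str = <<<'EOD'
<div>This is a test.</div>
<div>This <b>another</b> a test.</div>
<div/>
<div>This is last a test.</div>
EOD;

print_r(split_divs_keep_empty($str));
```

Like the example above, this relies on there being no nested divs, because `getElementsByTagName` returns descendants at every depth.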


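Note:

The above works fine as long as you don't have nested DIVs. If you do have nesting, you have to exclude the nested children when looping through the innerHTML.

For example, say you have the following HTML: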
<div>One
<div>Two</div>
<div>Three</div>
<div/>
<div>Four
<div>Five</div>
</div>

Here is how to handle the above and get an array in document order:

Handling nesting
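<?php
// Like get_inner_html(), but skips direct children whose tag name is $exclude,
// so nested divs are not repeated inside their parent's entry.
function get_inner_html_unnested( $node, $exclude ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        // Text nodes have no tagName; only element nodes can be excluded.
        if (!($child instanceof DOMElement) || ($child->tagName != $exclude)) {
            $innerHTML .= trim($child->ownerDocument->saveXML( $child ));
        }
    }

    return $innerHTML;
}

$str = <<<'EOD'
<div>One
<div>Two</div>
<div>Three</div>
<div/>
<div>Four
<div>Five</div>
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);

$array = []; // initialize so print_r() never sees an undefined variable
$ellies = $doc->getElementsByTagName('div');
foreach ($ellies as $one_el) {
    // Divs that are empty once nested divs are excluded are skipped.
    if ($ih = get_inner_html_unnested($one_el, 'div')) {
        $array[] = $ih;
    }
}
?>
<pre>
<?php print_r($array); ?>
</pre>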
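
The answer mentions XPath as another option. As a rough sketch of how the same exclusion of nested elements might look with `DOMXPath`, the code below selects every element with the given tag and then, for each one, only the child nodes that are not elements of that same tag. The function name `split_by_tag_xpath` and its parameters are hypothetical, and the sample input is a well-formed variant chosen for illustration rather than the exact HTML above.

```php
<?php
// Hypothetical helper (illustrative name and parameters): returns the trimmed
// inner HTML of each <$tag> element, leaving out nested <$tag> children.
function split_by_tag_xpath(string $html, string $tag = 'div'): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $parts = [];
    // '//div' matches div elements at any depth, in document order.
    foreach ($xpath->query('//' . $tag) as $el) {
        $inner = '';
        // Relative query: child nodes of $el that are not <$tag> elements.
        foreach ($xpath->query('node()[not(self::' . $tag . ')]', $el) as $child) {
            $inner .= trim($doc->saveXML($child));
        }
        // Strict comparison, so a piece such as "0" is not dropped.
        if ($inner !== '') {
            $parts[] = $inner;
        }
    }

    return $parts;
}

$html = "<div>One\n<div>Two</div>\n<div>Three</div>\n</div>\n"
      . "<div>Four\n<div>Five</div>\n</div>";

print_r(split_by_tag_xpath($html));
```

The `$tag` value is concatenated straight into the XPath expression, so this sketch assumes it is a trusted, plain tag name.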


Regarding "php - How do I split a string by tag in PHP when parsing HTML into a DOM tree?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6854669/
