php - Prevent DOMDocument::loadHTML() from converting entities

Reposted. Author: 可可西里. Updated: 2023-10-31 22:15:03

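I have a string value from which I am trying to extract the list items. I want to extract the text and any child nodes, but DOMDocument converts the entities to characters instead of leaving them in their original state.

I have already tried setting DOMDocument::resolveExternals and DOMDocument::substituteEntities to false, but this had no effect. Note that I am running PHP 5.2.17 on Win7.

Example code: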

$example = '<ul><li>text</li>'.
'<li>&frac12; of this is <strong>strong</strong></li></ul>';

echo 'To be converted:'.PHP_EOL.$example.PHP_EOL;

$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;

$doc->loadHTML($example);

$domNodeList = $doc->getElementsByTagName('li');
$count = $domNodeList->length;

for ($idx = 0; $idx < $count; $idx++) {
    $value = trim(_get_inner_html($domNodeList->item($idx)));
    /* remainder of processing and storing in database */
    echo 'Saved '.$value.PHP_EOL;
}

function _get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHTML;
}

&frac12; ends up being converted to ½ (the single-character UTF-8 version rather than the entity), which is not the desired format.
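
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not a definitive diagnosis: it assumes that loadHTML() decodes named HTML entities while it builds the tree, which would explain why substituteEntities makes no visible difference here. The variable names ($text, $firstBytes, $hasEntityText) are made up for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->substituteEntities = false; // no visible effect here, as the question reports
$doc->loadHTML('<ul><li>&frac12; of this</li></ul>');

// The only child of the <li> is a text node.
$text = $doc->getElementsByTagName('li')->item(0)->firstChild;

// PHP's DOM extension exposes strings as UTF-8; U+00BD (½) is the bytes C2 BD.
$firstBytes = bin2hex(substr($text->nodeValue, 0, 2));
echo $firstBytes, PHP_EOL; // c2bd

// The literal text "&frac12;" is no longer present in the node.
$hasEntityText = strpos($text->nodeValue, '&frac12;') !== false;
var_dump($hasEntityText); // bool(false)
```

If this is what happens, the entity is already gone by the time the tree is serialized, so it has to be re-encoded on output rather than preserved on input.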

Best answer

A workaround for PHP versions earlier than 5.3.6:

$html =<<<HTML
<ul><li>text</li>
<li>&frac12; of this is <strong>strong</strong></li></ul>
HTML;

$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($html);
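foreach ($doc->getElementsByTagName('li') as $node)
{
    // nodeValue is UTF-8 text only, so child tags such as <strong> are dropped.
    // Convert to ISO-8859-1 and name that charset explicitly, because the
    // default charset of htmlentities() changed to UTF-8 in PHP 5.4.
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $node->nodeValue);
    echo htmlentities($latin1, ENT_COMPAT, 'ISO-8859-1'), "\n";
}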
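
The answer reads nodeValue, which returns text only, while the question also asks to keep child nodes. The sketch below is one possible way to combine the two and is not part of the original answer: it serializes the children with saveXML() as the question does, then re-encodes only the non-ASCII characters. It assumes PHP 5.4 or later (closures and the UTF-8 entity table of htmlentities()) and assumes saveXML() emits non-ASCII characters as raw UTF-8, as the question observed. encode_non_ascii() and inner_markup() are hypothetical helper names.

```php
<?php
// Hypothetical helper: re-encode each non-ASCII UTF-8 character as an HTML
// entity while leaving ASCII markup such as <strong> untouched.
function encode_non_ascii($utf8)
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function ($match) {
            // htmlentities() returns the character unchanged when it has
            // no named entity for it.
            return htmlentities($match[0], ENT_QUOTES, 'UTF-8');
        },
        $utf8
    );
}

// Hypothetical helper: inner markup of a node, serialized child by child
// in the same way as _get_inner_html() in the question.
function inner_markup(DOMNode $node)
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        $markup .= $node->ownerDocument->saveXML($child);
    }
    return $markup;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>text</li><li>&frac12; of this is <strong>strong</strong></li></ul>');

$items = array();
foreach ($doc->getElementsByTagName('li') as $li) {
    $items[] = encode_non_ascii(trim(inner_markup($li)));
}
echo implode(PHP_EOL, $items), PHP_EOL;
```

Matching only non-ASCII characters keeps the angle brackets and ampersands that saveXML() has already escaped from being encoded a second time.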

Regarding php - Prevent DOMDocument::loadHTML() from converting entities, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7343284/
