
php - Word wrap / cut text in an HTML string

Reposted. Author: 搜寻专家. Updated: 2023-10-31 20:50:14

What I want to do: I have a string that contains HTML tags, and I want to cut it with the wordwrap function without counting the HTML tags.

I'm stuck:

public function textWrap($string, $width)
{
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    foreach ($dom->getElementsByTagName('*') as $elem)
    {
        foreach ($elem->childNodes as $node)
        {
            if ($node->nodeType === XML_TEXT_NODE)
            {
                $text = trim($node->nodeValue);
                $length = mb_strlen($text);
                $width -= $length;
                if ($width <= 0)
                {
                    // Here, I would like to delete all next nodes
                    // and cut the current nodeValue and finally return the string
                }
            }
        }
    }
}

I'm not sure I'm going about this the right way. I hope this is clear...

Edit:

Here is an example. I have this text:

    <p>
<span class="Underline"><span class="Bold">Test to be cut</span></span>
</p><p>Some text</p>

Say I want to cut it at the 6th character; I would like to get this back:

<p>
<span class="Underline"><span class="Bold">Test to</span></span>
</p>

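Best answer

As I wrote in a comment, you first need to find the text offset at which to cut.

First, I set up a DOMDocument containing the HTML fragment, then select the body element, which represents the fragment in the DOM: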

$htmlFragment = <<<HTML
<p>
<span class="Underline"><span class="Bold">Test to be cut</span></span>
</p><p>Some text </p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($htmlFragment);
$parent = $dom->getElementsByTagName('body')->item(0);
if (!$parent)
{
    throw new Exception('Parent element not found.');
}

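Then I use my TextRange class to find the position where the cut has to be made. I also use the TextRange to actually make the cut and to find the DOMNode that should become the last node of the fragment: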

$range = new TextRange($parent);

// find the position where to cut the HTML's textual representation
// by looking for the end of a word, or at least for matching
// whitespace, with a regular expression.
$width = 17;
$pattern = sprintf('~^.{0,%d}(?<=\S)(?=\s)|^.{0,%1$d}(?=\s)~su', $width);
$r = preg_match($pattern, $range, $matches);
if (FALSE === $r)
{
    throw new Exception('Wordcut regex failed.');
}
if (!$r)
{
    throw new Exception(sprintf('Text "%s" is not cut-able (should not happen).', $range));
}

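This regular expression finds, in the textual representation provided by $range, the offset at which to cut. The pattern is inspired by another answer, which discusses it in more detail; it has been slightly modified to fit the needs of this answer.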
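To see what the pattern does in isolation, here is a minimal sketch that applies it to a plain string instead of a TextRange. The function name wordCutPrefix is hypothetical and not part of the answer's code; only the pattern is taken from above.

```php
<?php
// Hypothetical helper (not from the answer): returns the longest prefix of
// $text that is at most $width characters long and ends right before
// whitespace, preferring a prefix that ends in a non-whitespace character.
// Returns null when no such prefix exists.
function wordCutPrefix($text, $width)
{
    $pattern = sprintf('~^.{0,%d}(?<=\S)(?=\s)|^.{0,%1$d}(?=\s)~su', $width);
    $r = preg_match($pattern, $text, $matches);
    if (false === $r) {
        throw new RuntimeException('Wordcut regex failed.');
    }
    return $r ? $matches[0] : null;
}

echo wordCutPrefix('Test to be cut and more', 8), "\n"; // Test to
echo wordCutPrefix('Test to be cut and more', 6), "\n"; // Test
```

Because `.{0,N}` is greedy, the engine first tries the longest prefix and backtracks until the lookbehind `(?<=\S)` and the lookahead `(?=\s)` both hold, which is what makes the cut land at the end of a word.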

// chop-off the textnodes to make a cut in DOM possible
$range->split($matches[0]);
$nodes = $range->getNodes();
$cutPosition = end($nodes);
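The TextRange class is not shown here, so how its split() method works internally cannot be told from this text. One way to chop a text node at a character offset with PHP's DOM extension is DOMText::splitText(). The sketch below assumes that approach; the helper name cutTextNodeAt is made up for illustration.

```php
<?php
// Hypothetical helper: keep the first $offset characters of a text node
// and remove the remainder of that node from the document.
function cutTextNodeAt(DOMText $node, $offset)
{
    // splitText() leaves the first part in $node and returns the rest
    // as a new text node inserted right after it.
    $tail = $node->splitText($offset);
    $tail->parentNode->removeChild($tail);
}

$dom = new DOMDocument();
$dom->loadHTML('<p><b>Test to be cut</b></p>');
$textNode = $dom->getElementsByTagName('b')->item(0)->firstChild;

cutTextNodeAt($textNode, 7);

$result = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
echo $result; // <p><b>Test to</b></p>
```

Splitting first and removing afterwards keeps the element structure around the text untouched, which is why the cut can then be finished purely with node removal.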

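Because it is possible that there is nothing to cut (for example, the body would become empty), I need to handle that special case. Otherwise, as mentioned in the comments, all following nodes need to be removed: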

// obtain list of elements to remove with xpath
if (FALSE === $cutPosition)
{
    // if there is no node, delete all parent children
    $cutPosition = $parent;
    $xpath = 'child::node()';
}
else
{
    $xpath = 'following::node()';
}

The rest is straightforward: run the XPath query, remove the nodes, and output the result:

// execute xpath
$xp = new DOMXPath($dom);
$remove = $xp->query($xpath, $cutPosition);
if (!$remove)
{
    throw new Exception('XPath query failed to obtain elements to remove');
}

// remove nodes
foreach($remove as $node)
{
    $node->parentNode->removeChild($node);
}

// inner HTML (PHP >= 5.3.6)
foreach($parent->childNodes as $node)
{
    echo $dom->saveHTML($node);
}
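The effect of the following::node() axis can be tried without the TextRange class. The sketch below is a self-contained illustration with made-up markup and a hypothetical helper name (removeFollowingNodes); it picks a text node as the cut position and removes everything that comes after it in document order, while the ancestors of the cut position stay in place.

```php
<?php
// Hypothetical helper: remove every node that follows $cutPosition in
// document order. Ancestors are not on the "following" axis, so the
// enclosing elements of the cut position are kept.
function removeFollowingNodes(DOMDocument $dom, DOMNode $cutPosition)
{
    $xp = new DOMXPath($dom);
    $remove = $xp->query('following::node()', $cutPosition);
    foreach ($remove as $node) {
        if ($node->parentNode) {
            $node->parentNode->removeChild($node);
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div><p><b>keep</b><i>drop</i></p><p>drop too</p></div>');
$cutPosition = $dom->getElementsByTagName('b')->item(0)->firstChild;

removeFollowingNodes($dom, $cutPosition);

// Collect the inner HTML of the body element.
$body = $dom->getElementsByTagName('body')->item(0);
$html = '';
foreach ($body->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}
echo $html; // <div><p><b>keep</b></p></div>
```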

The full code example, including the TextRange class, is available on viper codepad. The codepad has a bug, so the result shown there is not correct (related: XPath query result order). The actual output is:

<p>
<span class="Underline"><span class="Bold">Test to</span></span></p>

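So make sure you have a current libxml version (which is usually the case). Also note that the final output foreach uses a parameter of the PHP function saveHTML that is only available since PHP 5.3.6. If you do not have that PHP version, fall back to an alternative such as the ones outlined in How to get the xml content of a node as a string? or in similar questions.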
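As a rough illustration of such a fallback, the sketch below serializes each child with DOMDocument::saveXML(), which accepts a node argument. This is an assumption about which alternative fits, not necessarily what the linked question recommends, and the function name innerXml is hypothetical. Note that the output is XML-style, so empty elements such as br are written as self-closing tags.

```php
<?php
// Hypothetical fallback for PHP versions where saveHTML() takes no node
// argument: serialize every child of $parent with saveXML() instead.
function innerXml(DOMNode $parent)
{
    $doc = $parent->ownerDocument;
    $out = '';
    foreach ($parent->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>a<br>b</p>');
$body = $dom->getElementsByTagName('body')->item(0);

$inner = innerXml($body);
echo $inner; // <p>a<br/>b</p>
```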

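If you look closely at my example code, you may notice that the cut length is quite large ($width = 17;). That is because there are many whitespace characters in front of the text. This can be adjusted by making the regular expression drop any amount of leading whitespace, and/or by trimming the TextRange first. The second option needs a bit more functionality, so I quickly wrote something that can be used after creating the initial range: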

...
$range = new TextRange($parent);
$trimmer = new TextRangeTrimmer($range);
$trimmer->trim();
...

This removes unnecessary whitespace on the left and right inside the HTML fragment. The TextRangeTrimmer code follows:

class TextRangeTrimmer
{
    /**
     * @var TextRange
     */
    private $range;

    /**
     * @var array characters to trim, stored as array keys for fast lookup
     */
    private $charlist;

    public function __construct(TextRange $range, Array $charlist = NULL)
    {
        $this->range = $range;
        $this->setCharlist($charlist);
    }

    /**
     * @param array $charlist list of UTF-8 encoded characters
     * @throws InvalidArgumentException
     */
    public function setCharlist(Array $charlist = NULL)
    {
        // default to the same characters that PHP's trim() strips
        if (NULL === $charlist)
        {
            $charlist = str_split(" \t\n\r\0\x0B");
        }

        $list = array();

        foreach($charlist as $char)
        {
            if (!is_string($char))
            {
                throw new InvalidArgumentException('Not an Array of strings.');
            }
            if (strlen($char))
            {
                $list[] = $char;
            }
        }

        $this->charlist = array_flip($list);
    }

    /**
     * @return array characters
     */
    public function getCharlist()
    {
        return array_keys($this->charlist);
    }

    public function trim()
    {
        if (!$this->charlist) return;
        $this->ltrim();
        $this->rtrim();
    }

    /**
     * number of consecutive characters of $charlist from $start in $direction
     *
     * @param array $charlist characters as array keys
     * @param int $start offset
     * @param int $direction 1: forward, -1: backward
     * @return int
     * @throws InvalidArgumentException
     */
    private function lengthOfCharacterSequence(Array $charlist, $start, $direction = 1)
    {
        $start = (int) $start;
        $direction = max(-1, min(1, $direction));
        if (!$direction) throw new InvalidArgumentException('Direction must be 1 or -1.');

        $count = 0;
        for(;$char = $this->range->getCharacter($start), $char !== ''; $start += $direction, $count++)
        {
            if (!isset($charlist[$char])) break;
        }

        return $count;
    }

    public function ltrim()
    {
        $count = $this->lengthOfCharacterSequence($this->charlist, 0);

        if ($count)
        {
            // split off the leading run, remove its text nodes,
            // and continue with the remainder
            $remainder = $this->range->split($count);
            foreach($this->range->getNodes() as $textNode)
            {
                $textNode->parentNode->removeChild($textNode);
            }
            $this->range->setNodes($remainder->getNodes());
        }
    }

    public function rtrim()
    {
        $count = $this->lengthOfCharacterSequence($this->charlist, -1, -1);

        if ($count)
        {
            // split off the trailing run and remove its text nodes
            $chop = $this->range->split(-$count);
            foreach($chop->getNodes() as $textNode)
            {
                $textNode->parentNode->removeChild($textNode);
            }
        }
    }
}
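The other option mentioned earlier, letting the regular expression skip leading whitespace instead of trimming the range, could look roughly like the sketch below. The function name and the modified pattern are assumptions for illustration, not the answer's code, and the example works on a plain string rather than a TextRange.

```php
<?php
// Hypothetical variant of the word-cut pattern: leading whitespace is
// consumed first and does not count towards $width. The returned match
// still includes that leading whitespace, so its length remains usable
// as an offset into the original text. Returns null if nothing matches.
function wordCutSkippingLeadingSpace($text, $width)
{
    $pattern = sprintf(
        '~^\s*+(?:.{0,%1$d}(?<=\S)(?=\s)|.{0,%1$d}(?=\s))~su',
        $width
    );
    $r = preg_match($pattern, $text, $matches);
    if (false === $r) {
        throw new RuntimeException('Wordcut regex failed.');
    }
    return $r ? $matches[0] : null;
}

$text = "\n    Test to be cut\n";
$cut  = wordCutSkippingLeadingSpace($text, 7);
echo ltrim($cut); // Test to
```

With this variant the width can be given as the number of visible characters (7 here) rather than being padded to account for the whitespace in front of the text.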

Hope this helps.

Regarding php - Word wrap / cut text in an HTML string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8482339/
