
php - Merging two regular expressions to truncate a string at whole words

Reposted. Author: 可可西里. Last updated: 2023-11-01 12:58:15

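I am trying to come up with the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):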

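function Text_Truncate($string, $limit, $more = '...')
{
    // Work on decoded text so that each entity counts as one character.
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    // strlen(utf8_decode()) counts characters rather than bytes.
    if (strlen(utf8_decode($string)) > $limit)
    {
        // First pass: cut at a word boundary, keeping at most $limit characters.
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        // Second pass: no usable whitespace was found, so cut at exactly $limit characters.
        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}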

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_... (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

Both work as they are, but if I remove the second preg_replace() I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....

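I can't use substr() because it works only at the byte level, and I don't have access to mb_substr() at the moment. I have made several attempts to merge the second regex into the first, without success.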
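To illustrate the byte-level problem mentioned above, here is a minimal sketch. The sample word and variable names are made up for the illustration and are not from the original post. It compares substr() with a /u regex on a multibyte string:

```php
<?php
// Illustrative sample word (an assumption, not from the post): 'ñandú',
// written with escapes so the bytes do not depend on the file encoding.
$word = "\u{00F1}and\u{00FA}";

// strlen() counts bytes; a /u regex with "." counts UTF-8 characters.
$byteLength = strlen($word);
$charLength = (int) preg_match_all('~.~su', $word);

// substr() cuts after one byte, which splits the two-byte "ñ" in half.
$firstByte = substr($word, 0, 1);

// The regex takes one whole character instead.
preg_match('~^.{1}~su', $word, $m);
$firstChar = $m[0];

echo $byteLength, ' bytes, ', $charLength, " characters\n";
echo bin2hex($firstByte), ' vs ', bin2hex($firstChar), "\n";
```

This is why the function counts and cuts with /u patterns rather than byte offsets.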

Please help S.M.S., I have been struggling with this for almost an hour.


Edit: Sorry, I have been awake for 40 hours and I shamelessly missed this:

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if anyone has a more optimized regex (or one that ignores trailing whitespace), please share:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

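Edit 2: I still can't get rid of the trailing whitespace. Can someone help me?

Edit 3: OK, none of my edits actually worked; I was fooled by RegexBuddy. I should probably leave this for another day and get some sleep now. Off for today.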

Best answer

Perhaps I can give you a happy morning after a long night of RegExp nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Broken down (x stands for the limit):

^           # start of string
(           # begin capture group 1
  .{1,x}    # match 1 to x characters
  (?<=\S)   # lookbehind: the match must end on a non-whitespace character
  (?=\s)    # lookahead: the next character must be whitespace
|           # otherwise, try this:
  .{x}      # take exactly x characters anyway
)           # end capture group 1
.*          # match the rest of the string (since you are using replace)
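As a quick check of the breakdown, the sketch below applies the merged pattern to shortened versions of the two test strings from the question, with the limit fixed at 50. The variable names are only illustrative:

```php
<?php
$limit = 50;

// The merged pattern from the answer, with the limit interpolated.
$pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';

// Whitespace within reach: the first alternative cuts after "fox".
$spaced = 'Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog';

// No whitespace in the first 50 characters: the second alternative cuts at 50.
$joined = 'Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day';

$spacedResult = preg_replace($pattern, '$1', $spaced);
$joinedResult = preg_replace($pattern, '$1', $joined);

echo $spacedResult, "\n";
echo $joinedResult, "\n";
```

In the first case the lookbehind rejects the 50-character candidate because it ends in a space, so the engine backtracks to 49 characters and no trailing whitespace is kept.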

You can always add |$ to the end of (?=\s), but since your code already checks whether the string is longer than $limit, I don't think that case is necessary.
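For comparison, here is a rough sketch of what the |$ variant could look like when the length check is dropped. The function name truncate_at_word is hypothetical, and the entity decoding and encoding steps from the question are left out to keep the example short:

```php
<?php
// Hypothetical helper (not from the original post): relies on (?=\s|$)
// instead of a separate length check.
function truncate_at_word(string $string, int $limit, string $more = '...'): string
{
    $string = trim($string);
    $n = max(1, $limit); // keep the quantifier {1,n} valid

    // (?=\s|$) also accepts end of string, so short input matches whole.
    $pattern = '~^(.{1,' . $n . '}(?<=\S)(?=\s|$)|.{' . $n . '}).*~su';

    // preg_replace() returns null on failure (for example invalid UTF-8);
    // fall back to the untouched string in that case.
    $cut = preg_replace($pattern, '$1', $string) ?? $string;

    // Append the marker only when something was actually removed.
    return $cut === $string ? $string : $cut . $more;
}

echo truncate_at_word('hello world', 50), "\n";
echo truncate_at_word('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped', 50), "\n";
```

Without |$, the same pattern would reduce 'hello world' to 'hello', because the lookahead only succeeds in front of the space. That is the case the explicit length check in the question guards against.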

Regarding "php - Merging two regular expressions to truncate a string at whole words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2682861/
