php - How to split a multibyte string into words in PHP?

Reposted. Author: 可可西里. Updated: 2023-11-01 00:35:43
How can I split a multibyte string into words in PHP? This is what I have so far, but I would like to improve the code...

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
// The hyphen goes last in the class so it is a literal "-" and not the range ":" to "_".
$arr = mb_split('[\s\[\]().,;:_-]', $str);

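Is there a way to say that a word is a sequence of "alpha" characters (without spelling out a-z, because I want to include non-Latin characters)?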

Best answer

Try this one:

preg_match_all('/[\p{L}\p{M}]+/u', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $word) {
    // $word holds one matched word
}
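
The same call can be wrapped in a small reusable function. This is only a sketch: the name `split_words` and the sample strings are made up for illustration, and only `preg_match_all` and the pattern come from the answer. It assumes the input is UTF-8 and the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every run of
// Unicode letters and combining marks found in a UTF-8 string.
function split_words(string $text): array
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $count = preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches);
    if ($count === false) {
        // preg_match_all() returns false on failure, for example on invalid UTF-8.
        throw new InvalidArgumentException(
            'preg_match_all failed with error code ' . preg_last_error()
        );
    }
    return $matches[0];
}

// Punctuation, digits, underscores and whitespace all act as separators.
print_r(split_words('Привет, мир! foo_bar-baz 123'));
```

Note that a run of letters with no separator counts as one word, so text in scripts that do not put spaces between words (for example 你好世界) comes back as a single element.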

It matches every possible letter, together with its accents, as part of a word:

"
[\p{L}\p{M}]   # Match a single character present in the list below:
               #   a character with the Unicode property "letter" (any kind of letter from any language)
               #   a character with the Unicode property "mark" (a character intended to be combined with another character, e.g. accents, umlauts, enclosing boxes)
+              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
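
To see why `\p{M}` is in the class, here is a small sketch comparing the pattern with and without it. The sample string is an assumption chosen for illustration: it spells "résumé" in decomposed form, where each "é" is a plain "e" followed by U+0301 (combining acute accent). It relies on the `\u{...}` escape available in double-quoted strings since PHP 7.

```php
<?php
// "résumé" in decomposed form: each accent is a separate combining character.
$decomposed = "re\u{0301}sume\u{0301}";

// Letters only: a combining mark is not a letter, so it ends the current match.
preg_match_all('/\p{L}+/u', $decomposed, $lettersOnly);

// Letters and marks: each accent stays attached to its base letter.
preg_match_all('/[\p{L}\p{M}]+/u', $decomposed, $lettersAndMarks);

var_dump($lettersOnly[0]);      // two fragments: "re" and "sume"
var_dump($lettersAndMarks[0]);  // one element: the whole word, accents included
```

With precomposed input (a single code point per accented letter) both patterns give the same result, so the extra `\p{M}` only matters for decomposed text.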

About "php - How to split a multibyte string into words in PHP?": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8422189/
