
regex - Regular expression to extract surnames with prefixes

Reposted. Author: 行者123. Updated: 2023-12-04 15:30:26

Is there a way, using a regular expression or some other logic, to extract the parts of a name from a string?

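I want to split the name on spaces, but if a part of the name has a prefix, the prefix should stay attached to the word that follows it, for example: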

Osama bin Laden bin Mohammed => Osama, bin Laden, bin Mohammed
Jorge do Pinto da Silva => Jorge, do Pinto, da Silva
John Andrew Smith => John, Andrew, Smith
José Mário dos Santos Mourinho Félix => José, Mário, dos Santos, Mourinho, Félix

Working code based on Tim's suggestion:

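// Alternatives are tried left to right, so multi-word prefixes are listed before their shorter forms.
$str = 'Manuel D\'Souza do Pinto bin Laden Al-saud el Mecca de la Vere Na Sokakah van Der Reidejin del Monte du Pont ter Johannes';
preg_match_all( '~\b(von der|van de|van den|del la|de la|van der|vande|vanden|vander|st|der|des|dela|della|bin|dos|ur|ibn|bint|da|do|le|la|del|du|de|di|el|al|van|von|ter|na|san|los)\s+[^\s]+\b|\b[^\s]+~i', $str, $mat );
// $mat[0] holds the full name parts; $mat[1] holds the matched prefix (empty when there is none).
print_r( $mat );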

Result:

Array(
[0] => Array
(
[0] => Manuel
[1] => D'Souza
[2] => do Pinto
[3] => bin Laden
[4] => Al-saud
[5] => el Mecca
[6] => de la Vere
[7] => Na Sokakah
[8] => van Der Reidejin
[9] => del Monte
[10] => du Pont
[11] => ter Johannes
)

[1] => Array
(
[0] =>
[1] =>
[2] => do
[3] => bin
[4] =>
[5] => el
[6] => de la
[7] => Na
[8] => van Der
[9] => del
[10] => du
[11] => ter
)

)
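
The same prefix-list approach can be sketched with the alternation built from an array instead of being typed by hand. This is only an illustration: the helper name `splitByPrefixes` and the shortened prefix list are assumptions, not part of the original code. Sorting the prefixes longest first keeps "van der" ahead of "van" without relying on manual ordering.

```php
<?php
// Hypothetical helper: builds the alternation from a prefix list.
function splitByPrefixes(string $name, array $prefixes): array
{
    // Longest prefixes first, so "van der" is tried before "van".
    usort($prefixes, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape each prefix for use inside a pattern delimited by "~".
    $alternation = implode('|', array_map(fn($p) => preg_quote($p, '~'), $prefixes));

    // Either a prefix followed by the next word, or a single word on its own.
    // "i" ignores case; "u" treats pattern and subject as UTF-8.
    $pattern = '~\b(?:' . $alternation . ')\s+\S+|\S+~iu';

    preg_match_all($pattern, $name, $matches);
    return $matches[0];
}

// Assumed subset of the prefix list, for illustration only.
$prefixes = ['van', 'van der', 'de', 'de la', 'do', 'da', 'bin'];
print_r(splitByPrefixes('Jorge do Pinto da Silva', $prefixes));
```

Keeping the prefixes in an array makes it easier to extend the list or load it from configuration, at the cost of rebuilding the pattern on each call.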

Best answer

Keeping in mind all those falsehoods programmers believe about names, you could still try

\b\p{Lu}\p{Ll}*|\b\p{Ll}+\s+\p{Lu}\p{Ll}*

This matches either a capitalized word (a name) or a lowercase prefix followed by a capitalized word.

See it live on regex101.com.

Explanation:

\b      # Start of word
\p{Lu}  # One uppercase letter
\p{Ll}* # Any number of lowercase letters
|       # or
\b      # Start of word
\p{Ll}+ # One or more lowercase letters
\s+     # Whitespace
\p{Lu}  # One uppercase letter
\p{Ll}* # Any number of lowercase letters
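
As a rough sketch of how this pattern might be used from PHP, the snippet below wraps it in a small function. The function name `splitNameParts` is made up for this example; the pattern itself is the one explained above, with the `u` modifier added so that `\p{Lu}` and `\p{Ll}` work on UTF-8 input such as "José".

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function splitNameParts(string $name): array
{
    // "u" makes PCRE treat the pattern and the subject as UTF-8,
    // which the Unicode property classes need for letters like "é".
    $pattern = '/\b\p{Lu}\p{Ll}*|\b\p{Ll}+\s+\p{Lu}\p{Ll}*/u';

    preg_match_all($pattern, $name, $matches);
    return $matches[0];
}

print_r(splitNameParts('José Mário dos Santos Mourinho Félix'));
```

Unlike the prefix-list version, this relies purely on capitalization, so it needs no list of known prefixes. The trade-off is that only one lowercase word is kept before each capitalized word, so a two-word prefix such as "de la" would lose its first word.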

This question, "regex - Regular expression to extract surnames with prefixes", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/23610277/
