
regex - Is a case-insensitive regular expression slower?

Reposted. Author: 行者123. Updated: 2023-12-03 20:28:24

Source

RegexOptions.IgnoreCase is more expensive than I would have thought (eg, should be barely measurable)



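Assuming this applies to PHP, Python, Perl, Ruby, etc. as well as C# (which is what I assume Jeff was using), how much of a slowdown is it, and will I incur a similar penalty with /[a-zA-Z]/ as I will with /[a-z]/i ?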

Best answer

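Yes, [A-Za-z] will be much faster than setting RegexOptions.IgnoreCase, largely because of Unicode strings. But it is also much more limiting: [A-Za-z] does not match accented international characters. It is literally the A-Za-z ASCII set and nothing more.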
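As a rough illustration in Python (one of the languages the question names), the sketch below compares an explicit ASCII class with a lowercase class plus the ignore-case flag. It says nothing about .NET's RegexOptions.IgnoreCase itself, and the timings depend on the engine, the version, and the input, so treat any numbers it prints as a local measurement only. The names `matches_whole` and `time_patterns` are hypothetical helpers, not part of any library.

```python
import re
import timeit

# Explicit ASCII class versus a lowercase class with the ignore-case flag.
EXPLICIT = re.compile(r"[A-Za-z]+")
IGNORECASE = re.compile(r"[a-z]+", re.IGNORECASE)


def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


def time_patterns(text, number=1000):
    """Hypothetical helper: time findall() for both patterns, in seconds."""
    return {
        "explicit": timeit.timeit(lambda: EXPLICIT.findall(text), number=number),
        "ignorecase": timeit.timeit(lambda: IGNORECASE.findall(text), number=number),
    }


if __name__ == "__main__":
    sample = "The Quick Brown Fox Jumps Over The Lazy Dog " * 200
    print(time_patterns(sample))

    # Neither class covers accented letters such as "é".
    print(matches_whole(EXPLICIT, "café"), matches_whole(IGNORECASE, "café"))

    # With a Unicode (str) pattern, the flag also admits a few non-ASCII
    # letters, for example the Kelvin sign U+212A, which lower-cases to "k".
    print(matches_whole(EXPLICIT, "\u212a"), matches_whole(IGNORECASE, "\u212a"))
```

The last line shows that the two spellings are not strictly equivalent in Python: the ignore-case flag brings in Unicode case rules even when the class itself only lists ASCII letters.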
I don't know whether you saw Tim Bray's answer to my message, but it's a good one:

One of the trickiest issues in internationalized search is upper and lower case. This notion of case is limited to languages written in the Latin, Greek, and Cyrillic character sets. English-speakers naturally expect search to be case-insensitive if only because they’re lazy: if Nadia Jones wants to look herself up on Google she’ll probably just type in nadia jones and expect the system to take care of it.

So it’s fairly common for search systems to “normalize” words by converting them all to lower- or upper-case, both for indexing and queries.

The trouble is that the mapping between cases is not always as straightforward as it is in English. For example, the German lower-case character “ß” becomes “SS” when upper-cased, and good old capital “I” when down-cased in Turkish becomes the dotless “ı” (yes, they have “i”, its upper-case version is “İ”). I have read (but not verified first-hand) that the rules for upcasing accented characters such “é” are different in France and Québec. One of the results of all this is that software such as java.String.toLowerCase() tends to run astonishingly slow as it tries to work around all these corner-cases.


http://www.tbray.org/ongoing/When/200x/2003/10/11/SearchI18n
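The corner cases in the quote can be observed from Python's built-in string methods. This is a minimal sketch: `caseless_equal` and `turkish_lower` are hypothetical helpers written for illustration, and `turkish_lower` covers only the dotted/dotless I pair described above, not full locale-aware casing.

```python
def caseless_equal(left, right):
    """Compare two strings using Unicode full case folding."""
    return left.casefold() == right.casefold()


def turkish_lower(text):
    """Hypothetical helper: lower-case using the Turkish dotted/dotless I rules."""
    # Map the two Turkish-specific capitals first, then defer to str.lower().
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


if __name__ == "__main__":
    # One character becomes two when upper-cased.
    print("ß".upper())                           # SS

    # Case folding lets the two spellings compare equal.
    print(caseless_equal("STRASSE", "straße"))   # True

    # str.lower() is locale-independent, so "I" always becomes "i" ...
    print("I".lower())                           # i

    # ... and the Turkish mapping has to be applied explicitly.
    print(turkish_lower("ISTANBUL"))             # ıstanbul
```

Because a mapping can change the length of a string or depend on the language, a case-insensitive comparison cannot always be done one character at a time, which is part of why it costs more than an ASCII range check.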

Regarding "regex - Is a case-insensitive regular expression slower?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32010/
