regex - Find the word preceding a set of symbols

Reposted. Author: 行者123. Updated: 2023-11-28 13:00:55

How can I find the words that precede any of [¹²³⁴⁵⁶⁷⁸⁹⁰]? For example:

let myString = "Regular expressions¹ consist of constants, ² and operator symbols...³"

Please provide a pattern that selects the characters from the start of the target word through the superscript:

"expressions¹", "constants, ²", "symbols...³"

and a pattern that selects only the target word:

"expressions", "constants", "symbols"

Best answer

This will match your examples.

Using code points:

\b\w+\W*[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+

From Wikipedia:

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F.
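As a quick check of the pattern above, here is a minimal Python sketch. Python is an assumption; the question does not name a language. Python's `re` module does not accept `\x{B9}`-style escapes, so the code points are written as `\u00B9` and so on. The `re.ASCII` flag is also an assumption of this sketch: without it, Python's `\w` also matches the superscript digits, which changes how the classes interact. The capturing group around `\w+` is an addition that returns the target word alone, which is the second result the question asks for.

```python
import re

# Superscripts 1-3 sit in the Latin-1 range; 4-9 and 0 sit in the U+2070 block.
SUPERSCRIPTS = "\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u2070"

# re.ASCII restricts \w, \W and \b to ASCII, so superscripts count as non-word.
# Group 1 (added for illustration) captures only the word before the superscript.
PATTERN = re.compile(rf"\b(\w+)\W*[{SUPERSCRIPTS}]+", re.ASCII)

my_string = (
    "Regular expressions\u00B9 consist of constants, \u00B2 "
    "and operator symbols...\u00B3"
)

# Whole match: from the start of the word through the superscript.
spans = [m.group(0) for m in PATTERN.finditer(my_string)]
# Group 1: the target word only.
words = [m.group(1) for m in PATTERN.finditer(my_string)]

print(spans)
print(words)
```

The same capturing-group idea should carry over to other regex engines, though escape syntax and the Unicode behavior of `\w` differ between them.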

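Update:

To get separate blocks that start with either a word or a non-word, you can
exclude the superscript range from the non-word class.
The regex is longer and more redundant, but it works.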

(?:\b\w+[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*|[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+)[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+

Formatted:

 (?:
      \b
      # Required - Words
      \w+
      # Optional - Not words, nor superscript
      [^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*

   |  # or,

      # Required - Not words, nor superscript
      [^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
 )
 # Required - Superscript
 [\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
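To illustrate what "separate blocks" means, the sketch below runs the updated pattern next to the simpler one on a made-up sample string. The sample and the Python translation are assumptions, not part of the original answer: `\u` escapes replace `\x{...}`, `re.VERBOSE` allows the commented layout, and `re.ASCII` keeps `\w` from matching superscript digits.

```python
import re

SUP = "\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u2070"

# Simple pattern: \W* may itself consume superscripts, so runs get merged.
SIMPLE = re.compile(rf"\b\w+\W*[{SUP}]+", re.ASCII)

# Updated pattern: the non-word class excludes superscripts, so each
# superscript run closes its own block.
BLOCKS = re.compile(
    rf"""
    (?:
        \b \w+          # required: a word
        [^\w{SUP}]*     # optional: neither word characters nor superscripts
      |
        [^\w{SUP}]+     # required: neither word characters nor superscripts
    )
    [{SUP}]+            # required: one or more superscripts
    """,
    re.VERBOSE | re.ASCII,
)

# Hypothetical sample: a word with a superscript, then a detached superscript run.
sample = "symbols...\u00B3 \u2074\u2075"

single = SIMPLE.findall(sample)   # one merged match
blocks = BLOCKS.findall(sample)   # two separate blocks

print(single)
print(blocks)
```

On this sample the simple pattern returns one merged match, while the updated pattern returns a word-led block followed by a block that starts with a non-word character.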

Regarding "regex - Find the word preceding a set of symbols", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33875363/
