
python - Regex fails to exclude matches that contain a newline

Reposted. Author: 行者123. Updated: 2023-12-05 07:00:10

I am running the following regular expression. To make it clearer, I have broken it up into variables:

all_no_numb_newline = r'(?:[^\n\d]*\n)' ## I include an extra line just to get more context ##
all_no_numb = r'(?:[^\n\d]*)' ## I do not want there to be any numbers on the same line except the ID ##
x1 = r'(?!(1-888-555|\(888\)))' ## I am excluding a specific common phone number ##
x2 = r'(?![\n\/])\W{0,2}' ## I am excluding line breaks and date formats ##
id_re = rf'({x1}\d(?:{x2}\d){{16}}\d)' ## This is an ID number 18 digits long with some symbols in between ##

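Basically, I am trying to identify an 18-digit ID. I do not want to match 18 digits that have any letters, newlines, or forward slashes among them. It is fine if the 18-digit ID is matched together with other random symbols. I also do not want to match an ID that is preceded by any digits. I would also like to match an extra line before the main group to get more context for my match, but what I am really after is the id_re match (which is why I put a question mark "?" after all_no_numb_newline).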

I then run it with the following code:

re.findall(
"("+
all_no_numb_newline+"?"+
all_no_numb+
id_re+")"
, text)[0]

However, this still returns the following match:

('L1 (061510)\n1009671-1000', '1 (061510)\n1009671-1000', '')

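I expected the match to contain no newline, and I expected two groups (my overall match and my ID group). Why are there 3 groups instead of 2? And why does "\n", a newline, appear in the match?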

Edit: examples that should match

'Mortgage\nID 756953480812037780'
')\n*DT756953480812037780'
'\nq75695348081 0233 240'
')\n*DT756953480812037780'
'\nq03313375233 0233 329'
'ID 676170114397739293'
'ID NUMBER 676170114397739293'
'ID\n676170114397739293'
'ID676170114397739293'

OUTPUT:

'756953480812037780'
'756953480812037780'
'75695348081 0233 240'
'756953480812037780'
'03313375233 0233 329'
'676170114397739293'
'676170114397739293'
'676170114397739293'
'676170114397739293'

Edit: examples that should not match

'L1 (061510)\n1009671-1000'
'L1 081510)\n1009671-1000'
'L1 (061510)\n1009671-1000'

Best answer

Use

(?<!\d)\d(?:\s*\d){16}\d(?!\d)

See proof.
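As a quick check, the pattern above can be run against the examples from the question. This is only a minimal sketch: the names ID_RE, find_id, should_match, and should_not_match are made up for illustration and do not appear in the original answer.

```python
import re

# Pattern from the answer: 18 digits, with optional whitespace allowed
# before digits 2 through 17, and no digit directly before or after.
ID_RE = re.compile(r'(?<!\d)\d(?:\s*\d){16}\d(?!\d)')

# Sample inputs copied from the question.
should_match = [
    'Mortgage\nID 756953480812037780',
    ')\n*DT756953480812037780',
    '\nq75695348081 0233 240',
    '\nq03313375233 0233 329',
    'ID 676170114397739293',
    'ID NUMBER 676170114397739293',
    'ID\n676170114397739293',
    'ID676170114397739293',
]
should_not_match = [
    'L1 (061510)\n1009671-1000',
    'L1 081510)\n1009671-1000',
]


def find_id(text):
    """Return the first ID-like match in text, or None if there is none."""
    m = ID_RE.search(text)
    return m.group(0) if m else None


for s in should_match:
    print(repr(s), '->', repr(find_id(s)))
for s in should_not_match:
    print(repr(s), '->', find_id(s))
```

Because the pattern has no capturing groups, re.findall with it would return plain strings rather than tuples.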

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (16 times):
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  ){16}                    end of grouping
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
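
The answer does not say why the original pattern produced three groups and a newline, so the sketch below explores both points. It rests on two observations about Python's re module: the parentheses inside the (?!...) lookahead in x1 form a capturing group of their own, and the lookahead (?![\n\/]) in x2 inspects only the next character, so \W{0,2} can still consume ")" followed by "\n". The helper build and the variables prefix, x1_nc, and strict_sep are hypothetical names introduced here, and the revised pattern is only one possible fix, not the answer author's method.

```python
import re

# Pieces copied from the question.
prefix = r'(?:[^\n\d]*\n)?(?:[^\n\d]*)'
x1 = r'(?!(1-888-555|\(888\)))'   # the inner (...) is a capturing group
x2 = r'(?![\n\/])\W{0,2}'         # lookahead checks only the next character


def build(lookahead, sep):
    """Assemble the question's full pattern from a lookahead and a separator."""
    id_re = rf'({lookahead}\d(?:{sep}\d){{16}}\d)'
    return re.compile('(' + prefix + id_re + ')')


sample = 'L1 (061510)\n1009671-1000'

original = build(x1, x2)
print(original.groups)                        # number of capturing groups
print(repr(original.search(sample).group(2)))

# The separator accepts ")" then "\n": only ")" is tested by the lookahead.
print(re.fullmatch(x2, ')\n') is not None)

# Hypothetical revision: a non-capturing alternation inside the lookahead,
# and a character class that excludes newline and "/" at every position.
x1_nc = r'(?!(?:1-888-555|\(888\)))'
strict_sep = r'[^\w\n/]{0,2}'
revised = build(x1_nc, strict_sep)
print(revised.groups)
print(revised.search(sample))
```

With the revised pieces, the pattern has two groups and no longer matches the sample that spans a line break, while an input such as 'ID 676170114397739293' still yields the ID in group 2.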

Regarding "python - Regex fails to exclude matches that contain a newline", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64199179/
