
python - Regex: Finding alphanumeric substrings that are not strictly alphabetic and might be fully numeric

Reposted. Author: 行者123. Updated: 2023-12-01 07:11:01

Note: this question may be nearly identical to this question, but none of the solutions there work for me.

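I have a list of strings cust_order_no = [' ORDER 4509910882', ' 4509910882']. I want to find all alphanumeric substrings in each string of this list, where a substring must contain at least one numeric character and may consist entirely of digits; in other words, purely alphabetic substrings should be rejected. If the sample text is "Order n0. AA1uu67756", the result I want is ["n0.", "AA1uu67756"]. Without a regex, I can do this: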

poss_cust_nums = [[j for j in i.split() if j.isalnum() and not j.isalpha()] for i in cust_order_no]

This gives me the correct, desired output:

[['4509910882'], ['4509910882']]
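For reference, here is the split-based approach above wrapped in a small function (the name `alnum_tokens_with_digit` is only illustrative) and run on other examples from this question. It shows the limits that motivate a regex: `str.isalnum()` returns False for 'n0.' because of the trailing period and for 'Code:50010' because of the colon, so those whitespace-separated tokens are dropped entirely instead of yielding 'n0.' and '50010'.

```python
def alnum_tokens_with_digit(text):
    """Split on whitespace; keep tokens that are alphanumeric but not purely alphabetic."""
    return [tok for tok in text.split() if tok.isalnum() and not tok.isalpha()]

cust_order_no = [' ORDER 4509910882', ' 4509910882']
print([alnum_tokens_with_digit(s) for s in cust_order_no])  # [['4509910882'], ['4509910882']]

# 'n0.' is dropped: the trailing '.' makes str.isalnum() return False
print(alnum_tokens_with_digit('Order n0. AA1uu67756'))  # ['AA1uu67756']

# 'Code:50010' is dropped for the same reason (the colon), so '50010' is lost
print(alnum_tokens_with_digit(' NOI Code:50010 by 49 CFR 4509910882 PER DUPONT'))  # ['49', '4509910882']
```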

I want to do the same for every string in cust_order_no using a regular expression. From the linked question, I tried these:

>>> p1 = r"/^(?=.*\d)[a-z\d]*$/i"
>>> [re.findall(p1, i) for i in cust_order_no]
[[], []]
>>> p2 = r"/^([0-9]|([0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+)[0-9a-zA-Z]*)$/;"
>>> [re.findall(p2, i) for i in cust_order_no]
[[], []]
>>> p3 = r"^([a-zA-Z+]+[0-9+]+)|([0-9+]+[a-zA-Z+]+)$"
>>> [re.findall(p3, i) for i in cust_order_no]
[[], []]
>>> p4 = r"/^([0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+)[0-9a-zA-Z]*$/"
>>> [re.findall(p4, i) for i in cust_order_no]
[[], []]
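A likely reason p1, p2 and p4 return nothing: they are written as JavaScript regex literals. Python's `re` module has no `/pattern/flags` delimiter syntax, so the slashes (plus p1's trailing `i` and p2's trailing `;`) are treated as literal characters, and `^` placed after a literal `/` can never match. Even without the delimiters, `^` and `$` anchor the pattern to the whole string, which here starts with a space and contains more than one word. p3 has no delimiters, but its first alternative is anchored at the start (the strings begin with a space) and its second requires letters at the very end (the strings end in digits). A minimal sketch, assuming whitespace-separated tokens are what should be validated (the helper name `tokens_with_digit` is hypothetical), translates p1 to Python and applies it to each token:

```python
import re

# The linked question's JavaScript literal /^(?=.*\d)[a-z\d]*$/i, translated to
# Python: no slash delimiters, and the trailing "i" flag becomes re.IGNORECASE.
token_re = re.compile(r'(?=.*\d)[a-z\d]*', re.IGNORECASE)

def tokens_with_digit(text):
    # fullmatch() anchors at both ends of each token, taking the place of ^ and $
    return [tok for tok in text.split() if token_re.fullmatch(tok)]

cust_order_no = [' ORDER 4509910882', ' 4509910882']
print([tokens_with_digit(s) for s in cust_order_no])  # [['4509910882'], ['4509910882']]
```

Note also that p2, p3 and p4 contain capturing groups. When a pattern has groups, `re.findall` returns the captured group text (a tuple per match when there are several groups) rather than the whole match, so even a working version of them would not return the full substrings unless the groups were made non-capturing with `(?:...)`.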

I also tried these regex variants, but none of them worked:

>>> [re.findall(r'[a-zA-Z].?[0-9]+', i) for i in cust_order_no]
[['R 4509910882'], []]
>>> [re.findall(r'[a-zA-Z]\.?[0-9]+', i) for i in cust_order_no]
[[], []]
>>> [re.findall(r'[a-zA-Z]\.?[0-9]', i) for i in cust_order_no]
[[], []]

What is the correct regex pattern for this kind of search?

Sample input 1:

[' NOI Code:50010 by 49 CFR 4509910882 PER DUPONT']

Sample output 1:

[['50010', '49', '4509910882']]

Sample input 2:

[' (SID) number must be shown on all f bills and ', 'correspondence 7800275358']

Sample output 2:

[[], ['7800275358']]

Best answer

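This regex should do what you want. It looks for a word boundary, then uses a positive lookahead to check that there are zero or more alphabetic characters followed by a digit, and then captures characters up to the next word boundary: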

\b(?=[a-zA-Z]*\d)[A-Za-z0-9]+\b

Demo on regex101
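Why the lookahead is enough: in an ASCII word made of letters and digits, everything before the first digit is a letter, so `[a-zA-Z]*\d` succeeds exactly when the word contains at least one digit. Because a lookahead consumes no characters, `[A-Za-z0-9]+` then matches the whole word from the same starting position. The same pattern can be written with Python's `re.VERBOSE` flag so each piece carries a comment; this is only a restatement of the answer's regex, not a different one:

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece can be commented.
pattern = re.compile(r"""
    \b                # start at a word boundary
    (?=[a-zA-Z]*\d)   # lookahead: zero or more letters, then a digit (consumes nothing)
    [A-Za-z0-9]+      # consume the whole alphanumeric run
    \b                # stop at the next word boundary
""", re.VERBOSE)

print(pattern.findall('Order n0. AA1uu67756'))  # ['n0', 'AA1uu67756']
```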

In Python:

import re
# Raw string: \b and \d reach the regex engine without doubling the backslashes
pattern = re.compile(r'\b(?=[a-zA-Z]*\d)[A-Za-z0-9]+\b')
# "strings" instead of "str", so the built-in str type is not shadowed
strings = [' NOI Code:50010 by 49 CFR 4509910882 PER DUPONT']
print([pattern.findall(s) for s in strings])
strings = [' (SID) number must be shown on all f bills and ', 'correspondence 7800275358']
print([pattern.findall(s) for s in strings])
strings = [' ORDER 4509910882', ' 4509910882']
print([pattern.findall(s) for s in strings])
text = 'Order n0. AA1uu67756'
print(pattern.findall(text))

Output:

[['50010', '49', '4509910882']]
[[], ['7800275358']]
[['4509910882'], ['4509910882']]
['n0', 'AA1uu67756']
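Note that the answer returns 'n0' rather than the 'n0.' the question asked for: '.' is not in `[A-Za-z0-9]`, and the closing `\b` sits between '0' and '.'. If a single trailing period should be kept (an assumption; the question does not say how other punctuation should be handled), one option is to append an optional `\.?` after the final `\b`. The name `pattern_with_dot` is illustrative:

```python
import re

# Variant (assumption): same pattern, but also keep one trailing period after the run.
pattern_with_dot = re.compile(r'\b(?=[a-zA-Z]*\d)[A-Za-z0-9]+\b\.?')

print(pattern_with_dot.findall('Order n0. AA1uu67756'))  # ['n0.', 'AA1uu67756']
print(pattern_with_dot.findall(' NOI Code:50010 by 49 CFR 4509910882 PER DUPONT'))
# ['50010', '49', '4509910882']
```

The trade-off is that a sentence-ending period is also attached, so text such as "code 49." would yield '49.'.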

A similar question about "python - Regex: Finding alphanumeric substrings that are not strictly alphabetic and might be fully numeric" can be found on Stack Overflow: https://stackoverflow.com/questions/58220876/
