
python - Regular expression for extracting month and year combinations from dates

Reposted. Author: 行者123. Updated: 2023-12-05 05:35:41

I am using a regular expression to extract the month and year of date pairs in text:

import re

regex = (
r"((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?(t)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)"
r"\s?[\.\s\’\’\,\/\'\,\‘\-\–\—]?\s?(\d{4}|\d{2})?\s?\s?((to)|[\|\-\–\—])\s?\s?"
r"((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?(t)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)"
r"\s?[\.\s\’\’\,\/\'\,\‘\-\–\—]?\s?(\d{4}|\d{2})|(Present|Now|till\s?(now|date|today)?|current)))"
)

When I test the regex with a few inputs, some of which include the day of the month and some of which do not:

lst = [
'July 2014 - 28th August 2014',
'Jan 2012 - 3rd sep 2014',
'Jan 2008 - May 2012',
'Jan 2008 and May 2012'
]
for i in lst:
    word = re.finditer(regex, i, re.IGNORECASE)
    for match in word:
        print(match.group())

I get the following output:

Jan 2008 - May 2012

But my expected output is:

July 2014 - August 2014
Jan 2012 - sep 2014
Jan 2008 - May 2012

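What do I need to change so that the regex also matches an optional day of the month? When a date string contains a day, it is always an ordinal with an st, nd, rd, or th suffix.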

Best answer

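You cannot "skip" part of a string during a single match operation, so if you have 26th August, you cannot match or capture just 26 August. In such cases you either need to capture the parts you want and then concatenate them, or remove the parts you do not need as a post-processing step.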
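To illustrate the point: a match always covers one contiguous stretch of the input, and only capture groups let you pick pieces out of it. The snippet below is a small sketch with a deliberately simplified pattern (an assumption for illustration, not the pattern from the question):

```python
import re

text = "July 2014 - 28th August 2014"

# Simplified, illustrative pattern: a dash, an optional ordinal day,
# then a month name and a four-digit year.
m = re.search(r"-\s*(?:\d+(?:st|nd|rd|th)\s+)?([A-Za-z]+)\s+(\d{4})", text)

# The full match is contiguous, so it necessarily includes "28th".
full_match = m.group(0)

# The capture groups hold only the month and the year,
# which can be joined after the match.
joined = f"{m.group(1)} {m.group(2)}"

print(full_match)  # - 28th August 2014
print(joined)      # August 2014
```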

So here I would use the post-processing replacement approach:

import re


day = r'(?:((?:0?[1-9]|[12]\d|3[01])(?:\s*(?:st|[rn]d|th))?)\s*)?'
month = r'(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
year = r'(\d{2}(?:\d{2})?)'
rx_valid = re.compile( fr'\b{day}{month}\s*{year}\s*[-—–]\s*{day}{month}\s*{year}(?!\d)', re.IGNORECASE )
rx_ordinal = re.compile( r'\s*\d+\s*(?:st|[rn]d|th)', re.IGNORECASE )

lst = [
'July 2014 - 28th August 2014',
'Jan 2012 - 3rd sep 2014',
'Jan 2008 - May 2012',
'Jan 2008 and May 2012'
]
for i in lst:
    word = rx_valid.finditer(i)
    for match in word:
        # Strip the ordinal day (e.g. " 28th") from the matched text
        print(rx_ordinal.sub("", match.group()))

Output:

July 2014 - August 2014
Jan 2012 - sep 2014
Jan 2008 - May 2012
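The answer also mentions the other option: capturing the wanted parts and joining them. Here is a minimal sketch of that variant, built from the same sub-patterns. The group names `start` and `end`, the name `rx_pair`, and the helper `extract_pairs` are illustrative assumptions, not part of the original answer:

```python
import re

# Same building blocks as the answer, but the day is matched without being
# captured, and each month/year part is wrapped in a named group.
day = r'(?:(?:0?[1-9]|[12]\d|3[01])(?:\s*(?:st|[rn]d|th))?\s*)?'
month = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?'
         r'|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)')
year = r'\d{2}(?:\d{2})?'

rx_pair = re.compile(
    fr'\b{day}(?P<start>{month}\s*{year})\s*[-—–]\s*{day}(?P<end>{month}\s*{year})(?!\d)',
    re.IGNORECASE,
)


def extract_pairs(text):
    """Return 'start - end' strings rebuilt from the captured month/year parts."""
    return [f"{m['start']} - {m['end']}" for m in rx_pair.finditer(text)]


lst = [
    'July 2014 - 28th August 2014',
    'Jan 2012 - 3rd sep 2014',
    'Jan 2008 - May 2012',
    'Jan 2008 and May 2012',
]
for i in lst:
    for pair in extract_pairs(i):
        print(pair)
```

Unlike the replacement approach, this rebuilds each result from the captured pieces, so the separator is always written as " - " regardless of which dash appeared in the input.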

See the Python demo and the regex demo.

Regarding "python - Regular expression for extracting month and year combinations from dates", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73443644/
