
python - Extracting multiple date formats with regular expressions in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:35:35

I am trying to extract dates from text in Python. These are the possible texts and the date patterns they contain.

"Auction details: 14 December 2016, Pukekohe Park"
"Auction details: 17 Feb 2017, Gold Sacs Road"
"Auction details: Wednesday 27 Apr 1:00 p.m. (On site)(2016)"
"Auction details: Wednesday 27 Apr 1:00 p.m. (In Rooms - 923 Whangaa Rd, Man)(2016)"
"Auction details: Wed 27 Apr 2:00 p.m., 48 Viaduct Harbour Ave, Auckland, (2016)"
"Auction details: November 16 Wednesday 2:00pm at 48 Viaduct Harbour Ave, Auckland(2016)"
"Auction details: Thursday, 28th February '19"
"Auction details: Friday, 1st February '19"

This is what I have written so far:

import re

mon = ' (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?) '
day1 = r'\d{1,2}'
day_test = r'\d{1,2}(?:th)|\d{1,2}(?:st)'
year1 = r'\d{4}'
year2 = r'\(\d{4}\)'
dummy = r'.*'

This captures cases 1 and 2.

match = re.search(day1 + mon + year1, "Auction details: 14 December 2016, Pukekohe Park")
print(match.group())

This sort of captures cases 3, 4 and 5, but it prints everything in between. In the example below I want 25 Nov 2016, but the regex pattern gives me 25 Nov 3:00 p.m. (On Site)(2016).

So question 1: how do I get only the date here?

match = re.search(day1 + mon + dummy + year2, "Friday 25 Nov 3:00 p.m. (On Site)(2016)")
print(match.group())

Question 2: similarly, how do I capture cases 6, 7 and 8? What should the regex be?

If that is not possible, is there a better way to capture dates from these formats?

Best answer

You can try

((?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s+\d{1,2}(?:st|nd|rd|th)?|\d{1,2}(?:st|nd|rd|th)?\s+(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)))(?:.*(\b\d{2}(?:\d{2})?\b))?

See the regex demo.

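Note that I made all groups inside the regex block non-capturing ((Nov|Dec) -> (?:Nov|Dec)), added an optional (?:st|nd|rd|th)? group after the day digit pattern, changed the year pattern to \b\d{2}(?:\d{2})?\b so that it only matches 4- or 2-digit chunks as whole words, and created an alternation group to handle dates where the day comes before the month and vice versa.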

The day and month are captured into Group 1 and the year is captured into Group 2, so the result is the concatenation of the two.
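As a minimal sketch of how the two groups can be combined, the helper below joins Group 1 and Group 2 and simply leaves the year out when Group 2 did not participate in the match. The name `extract_date` and the second sample string are made up for illustration; they are not part of the original answer.

```python
import re

MON = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
       r'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)')
DAY = r'\d{1,2}(?:st|nd|rd|th)?'
YEAR = r'\b\d{2}(?:\d{2})?\b'

# Group 1: day and month in either order; group 2 (optional): the year.
RX = re.compile(r'((?:{m}\s+{d}|{d}\s+{m}))(?:.*({y}))?'.format(m=MON, d=DAY, y=YEAR))


def extract_date(text):
    """Hypothetical helper: return 'day month [year]', or None if nothing matches."""
    m = RX.search(text)
    if m is None:
        return None
    # Group 2 is None when no year follows the day/month part, so skip it then.
    return " ".join(g for g in m.groups() if g)


print(extract_date("Friday 25 Nov 3:00 p.m. (On Site)(2016)"))  # 25 Nov 2016
print(extract_date("Auction details: Wed 27 Apr"))               # 27 Apr
```

Because the greedy `.*` sits outside both capturing groups, the time and location text between the day/month and the year is consumed but never returned, which addresses question 1.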

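NOTE: If you need to match the year in a safer way, you may want a more precise year pattern. For example, if you want to avoid matching 4- or 2-digit whole words that come right after a :, add a negative lookbehind: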

year1 = r'\b(?<!:)\d{2}(?:\d{2})?\b'
            ^^^^^^
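To see what the lookbehind changes, here is a small comparison, assuming a made-up input in which a time follows the year (none of the question's samples has that order). The helper `build` and the names `loose` and `strict` are illustrative only.

```python
import re

MON = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
       r'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)')
DAY = r'\d{1,2}(?:st|nd|rd|th)?'


def build(year):
    """Hypothetical helper: assemble the full pattern around a given year pattern."""
    return re.compile(r'((?:{m}\s+{d}|{d}\s+{m}))(?:.*({y}))?'.format(m=MON, d=DAY, y=year))


loose = build(r'\b\d{2}(?:\d{2})?\b')         # year pattern without the lookbehind
strict = build(r'\b(?<!:)\d{2}(?:\d{2})?\b')  # year pattern with the lookbehind

# Made-up input: the greedy .* makes the last 2- or 4-digit word win.
text = "Auction details: 27 Apr 2016 from 1:00 p.m."
print(loose.search(text).groups())   # ('27 Apr', '00')
print(strict.search(text).groups())  # ('27 Apr', '2016')
```

The lookbehind only rejects digits that directly follow a colon. In a 24-hour time such as 14:00 the hour part would still qualify as a 2-digit year, so a stricter year pattern may be needed for that kind of input.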

Also, you may add word boundaries around the whole pattern to make sure only whole words are matched.
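The effect of a leading word boundary can be sketched as follows, assuming a made-up input where a longer number sits directly before the month name. Only the leading `\b` is shown; the names `plain` and `bounded` are illustrative.

```python
import re

MON = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
       r'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)')
DAY = r'\d{1,2}(?:st|nd|rd|th)?'
YEAR = r'\b\d{2}(?:\d{2})?\b'

BODY = r'((?:{m}\s+{d}|{d}\s+{m}))(?:.*({y}))?'.format(m=MON, d=DAY, y=YEAR)
plain = re.compile(BODY)
bounded = re.compile(r'\b' + BODY)  # the match must start at a word boundary

# Made-up input: "114" is a lot number, not a day of the month.
text = "Lot 114 Dec, Pukekohe Park"
print(plain.search(text).group(1))  # 14 Dec
print(bounded.search(text))         # None
```

Without the boundary, the last two digits of 114 are read as the day. With it, the day has to start a word, so this input yields no match at all.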

Here is the Python demo:

import re

# Month names, abbreviated or in full
mon = r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)'
# Day number with an optional ordinal suffix
day1 = r'\d{1,2}(?:st|nd|rd|th)?'
# 2- or 4-digit year matched as a whole word
year1 = r'\b\d{2}(?:\d{2})?\b'
dummy = r'.*'

# Group 1: day and month in either order; group 2 (optional): the year
rx = r"((?:{smon}\s+{sday1}|{sday1}\s+{smon}))(?:{sdummy}({syear1}))?".format(smon=mon, sday1=day1, sdummy=dummy, syear1=year1)
# Or, try this if a partial number before a date is parsed as day:
# rx = r"\b((?:{smon}\s+{sday1}|{sday1}\s+{smon}))(?:{sdummy}({syear1}))?".format(smon=mon, sday1=day1, sdummy=dummy, syear1=year1)
strs = [
    "Auction details: 14 December 2016, Pukekohe Park",
    "Auction details: 17 Feb 2017, Gold Sacs Road",
    "Auction details: Wednesday 27 Apr 1:00 p.m. (On site)(2016)",
    "Auction details: Wednesday 27 Apr 1:00 p.m. (In Rooms - 923 Whangaa Rd, Man)(2016)",
    "Auction details: Wed 27 Apr 2:00 p.m., 48 Viaduct Harbour Ave, Auckland, (2016)",
    "Auction details: November 16 Wednesday 2:00pm at 48 Viaduct Harbour Ave, Auckland(2016)",
    "Auction details: Thursday, 28th February '19",
    "Auction details: Friday, 1st February '19",
    "Friday 25 Nov 3:00 p.m. (On Site)(2016)",
]
for s in strs:
    print(s)
    m = re.search(rx, s)
    if m:
        print("{} {}".format(m.group(1), m.group(2)))
    else:
        print("NO MATCH")

Output:

Auction details: 14 December 2016, Pukekohe Park
14 December 2016
Auction details: 17 Feb 2017, Gold Sacs Road
17 Feb 2017
Auction details: Wednesday 27 Apr 1:00 p.m. (On site)(2016)
27 Apr 2016
Auction details: Wednesday 27 Apr 1:00 p.m. (In Rooms - 923 Whangaa Rd, Man)(2016)
27 Apr 2016
Auction details: Wed 27 Apr 2:00 p.m., 48 Viaduct Harbour Ave, Auckland, (2016)
27 Apr 2016
Auction details: November 16 Wednesday 2:00pm at 48 Viaduct Harbour Ave, Auckland(2016)
November 16 2016
Auction details: Thursday, 28th February '19
28th February 19
Auction details: Friday, 1st February '19
1st February 19
Friday 25 Nov 3:00 p.m. (On Site)(2016)
25 Nov 2016
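If the extracted strings need to be compared or sorted, one possible follow-up step (not part of the original answer) is to turn the two groups into a `datetime.date`. The sketch below assumes that 2-digit years belong to the 2000s and that a year was captured; `to_date` and `MONTHS` are hypothetical names.

```python
import datetime
import re

# Month lookup keyed by the first three letters, lower-cased.
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def to_date(day_month, year):
    """Hypothetical helper: ('28th February', '19') -> datetime.date(2019, 2, 28)."""
    day = month = None
    for token in day_month.split():
        if token[0].isdigit():
            # Keep only the leading digits, dropping st/nd/rd/th.
            day = int(re.match(r'\d+', token).group())
        else:
            month = MONTHS[token[:3].lower()]
    year = int(year)
    if year < 100:
        year += 2000  # assumption: '19 means 2019
    return datetime.date(year, month, day)


print(to_date("28th February", "19"))  # 2019-02-28
print(to_date("November 16", "2016"))  # 2016-11-16
```

The arguments correspond to `m.group(1)` and `m.group(2)` from the demo. When Group 2 is None, the caller has to supply a year from elsewhere before calling the helper.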

Regarding python - Extracting multiple date formats with regular expressions in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55017645/
