
Python - Finding dates in a string

Reposted. Author: 太空狗. Updated: 2023-10-29 18:20:44

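I want to be able to read a string and return the first date that appears in it. Is there a ready-made module I can use? I tried writing regular expressions for all possible date formats, but the result is very long. Is there a better way?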

Best answer

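You can run a date parser on every subtext of the text and pick the first date. Of course, such a solution will either catch things that are not dates, or miss things that are, or most likely both.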

Let me provide an example that uses dateutil.parser to catch anything that looks like a date:

import dateutil.parser
from itertools import chain
import re

# Add more strings that confuse the parser in the list
UNINTERESTING = set(chain(dateutil.parser.parserinfo.JUMP,
                          dateutil.parser.parserinfo.PERTAIN,
                          ['a']))

def _get_date(tokens):
    # Try the longest prefix of tokens first, then shrink it one token at a time
    for end in range(len(tokens), 0, -1):
        region = tokens[:end]
        # Skip prefixes made up only of whitespace and filler tokens
        if all(token.isspace() or token in UNINTERESTING
               for token in region):
            continue
        text = ''.join(region)
        try:
            date = dateutil.parser.parse(text)
            return end, date
        except (ValueError, OverflowError):
            pass

def find_dates(text, max_tokens=50, allow_overlapping=False):
    # list() is needed on Python 3, where filter() returns an iterator
    tokens = list(filter(None, re.split(r'(\S+|\W+)', text)))
    skip_dates_ending_before = 0
    for start in range(len(tokens)):
        region = tokens[start:start + max_tokens]
        result = _get_date(region)
        if result is not None:
            # end is an index into region, not into the full token list
            end, date = result
            if allow_overlapping or end > skip_dates_ending_before:
                skip_dates_ending_before = end
                yield date


test = """Adelaide was born in Finchley, North London on 12 May 1999. She was a
child during the Daleks' abduction and invasion of Earth in 2009.
On 1st July 2058, Bowie Base One became the first Human colony on Mars. It
was commanded by Captain Adelaide Brooke, and initially seemed to prove that
it was possible for Humans to live long term on Mars."""

print("With no overlapping:")
for date in find_dates(test, allow_overlapping=False):
    print(date)


print("With overlapping:")
for date in find_dates(test, allow_overlapping=True):
    print(date)

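Unsurprisingly, the output of this code is garbage whether or not you allow overlapping. If overlapping is allowed, you get a lot of dates that appear nowhere in the text; if it is not allowed, you miss important dates that are in the text.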

With no overlapping:
1999-05-12 00:00:00
2009-07-01 20:58:00
With overlapping:
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-03 00:00:00
1999-05-03 00:00:00
1999-07-03 00:00:00
1999-07-03 00:00:00
2009-07-01 20:58:00
2009-07-01 20:58:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00

Essentially, if overlapping is allowed:

  1. "12 May 1999" is parsed as 1999-05-12 00:00:00
  2. "May 1999" is parsed as 1999-05-03 00:00:00 (because today is the 3rd day of the month)
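The second item follows from how dateutil fills in fields that are missing from the input. They are copied from the `default` argument of `dateutil.parser.parse`, which falls back to the current date at midnight when it is not given. Here is a minimal sketch of that behaviour; the two default dates are made-up values chosen only for illustration.

```python
from datetime import datetime
from dateutil import parser

# Hypothetical "today" values, chosen only for illustration.
third = datetime(2011, 7, 3)
twentieth = datetime(2011, 7, 20)

# "May 1999" has no day, so the day is copied from `default`.
print(parser.parse("May 1999", default=third))      # 1999-05-03 00:00:00
print(parser.parse("May 1999", default=twentieth))  # 1999-05-20 00:00:00

# A complete date does not need the default's day, month or year.
print(parser.parse("12 May 1999", default=twentieth))  # 1999-05-12 00:00:00
```

Passing a fixed `default` makes the results reproducible, but it does not stop partial matches such as "May 1999" from being reported as dates.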

If, however, overlapping is not allowed, "2009. On 1st July 2058" is parsed as 2009-07-01 20:58:00, and no attempt is made to parse the date after the period.
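One way to reduce both kinds of error is to keep the "try every subtext" idea but swap the permissive parser for a short list of explicit formats. The sketch below is not part of the original answer. It uses the standard library's `datetime.strptime` instead of dateutil, and `FORMATS`, `first_date` and `max_words` are hypothetical names introduced here. It assumes English month names (the default C locale) and will miss any date whose format is not listed.

```python
from datetime import datetime
import re

# Hypothetical format list; extend it with the formats your data really uses.
FORMATS = ("%d %B %Y", "%B %d %Y", "%Y-%m-%d")

def first_date(text, max_words=3):
    """Return the first window of words that matches one of FORMATS, or None."""
    # Drop ordinal suffixes ("1st" -> "1"), then split on whitespace.
    words = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text).split()
    # Remove punctuation attached to the start or end of each word.
    words = [w.strip(".,;:!?()\"'") for w in words]
    for start in range(len(words)):
        # Try the longest window first, like _get_date above.
        for size in range(max_words, 0, -1):
            candidate = " ".join(words[start:start + size])
            for fmt in FORMATS:
                try:
                    return datetime.strptime(candidate, fmt)
                except ValueError:
                    continue
    return None

print(first_date("She was born in Finchley on 12 May 1999."))  # 1999-05-12 00:00:00
print(first_date("On 1st July 2058, Bowie Base One opened."))  # 2058-07-01 00:00:00
```

The trade-off runs the other way from the dateutil version: a bare year such as "2009" is ignored because no listed format matches it, so the sketch reports fewer false positives at the cost of missing unlisted formats.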

Regarding Python - finding dates in a string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6562148/
