python - Find date instances in links; regex; Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:15:21

I have a huge list of links that roughly follow this structure:

http://www.website.com/2016/2/25/11118290/story
http://www.website.com/authors/author
http://www.website.com/2016/1/25/11118290/story
http://www.website.com/authors/author
http://www.website.com/2015/12/15/11118290/story
http://www.website.com/authors/author
http://www.website.com/2010/01/01/11118290/story
http://www.website.com/authors/author

I only need the links that contain a date, i.e.:

http://www.website.com/YYYY/MM/DD/11118290/story

But the date can also be written as YYYY/M/D, YYYY/MM/D, or YYYY/M/DD.

I can't work out a regular expression that extracts only the links containing a date, given that the date format varies slightly.
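For reference, the variable format described above can be captured by a regular expression that allows one or two digits for the month and day. This is a minimal sketch using Python's `re` module; it assumes the date is always the first three path segments right after the host, followed by a slash, and `is_dated_link` is a hypothetical helper name. It only checks the shape of the segments, not whether they form a real calendar date.

```python
import re

# Assumption: the date is the first three path segments after the host,
# written as a 4-digit year, then a 1-2 digit month and a 1-2 digit day,
# and is followed by another "/".
DATED_LINK = re.compile(r"^https?://[^/]+/(\d{4})/(\d{1,2})/(\d{1,2})/")


def is_dated_link(url):
    """Return True if the URL path starts with YYYY/M(M)/D(D)/."""
    return DATED_LINK.match(url) is not None


links = [
    "http://www.website.com/2016/2/25/11118290/story",
    "http://www.website.com/authors/author",
    "http://www.website.com/2015/12/15/11118290/story",
    "http://www.website.com/2010/01/1/11118290/story",
]

# Keep only the links whose path begins with a date-shaped prefix.
dated = [url for url in links if is_dated_link(url)]
for url in dated:
    print(url)
```

Because `\d{1,2}` accepts either width, a single pattern covers YYYY/MM/DD, YYYY/M/D, YYYY/MM/D and YYYY/M/DD.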

Best answer

Using dateutil (the third-party python-dateutil package, not part of the standard library) seems to work for me:

from dateutil.parser import parse  # third-party: pip install python-dateutil

test_set = [
    'http://www.website.com/2016/2/25/11118290/story',
    'http://www.website.com/authors/author',
    'http://www.website.com/2016/1/25/11118290/story',
    'http://www.website.com/authors/author',
    'http://www.website.com/2015/12/15/11118290/story',
    'http://www.website.com/authors/author',
    'http://www.website.com/2010/01/01/11118290/story',
    'http://www.website.com/2010/1/1/11118290/story',
    'http://www.website.com/2010/01/1/11118290/story',
    'http://www.website.com/authors/author',
]

for lnk in test_set:
    # Strip the host, then join the first three path segments as "YYYY-M-D".
    dt = lnk.replace("http://www.website.com/", "").split("/")
    dt_str = "-".join(dt[:3])
    try:
        parse(dt_str)
        print("Date: %s" % lnk)
    except ValueError:
        # dateutil raises ValueError (ParserError in newer versions)
        # for strings such as "authors-author".
        print("Not a date: %s" % lnk)

Output:
Date: http://www.website.com/2016/2/25/11118290/story
Not a date: http://www.website.com/authors/author
Date: http://www.website.com/2016/1/25/11118290/story
Not a date: http://www.website.com/authors/author
Date: http://www.website.com/2015/12/15/11118290/story
Not a date: http://www.website.com/authors/author
Date: http://www.website.com/2010/01/01/11118290/story
Date: http://www.website.com/2010/1/1/11118290/story
Date: http://www.website.com/2010/01/1/11118290/story
Not a date: http://www.website.com/authors/author
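A pattern that only checks digit counts will also accept values such as 2016/13/45. If that matters, one option that stays entirely within the standard library is to match the date-shaped prefix with `re` and then let `datetime.date` reject impossible values. This is a minimal sketch assuming the same URL layout as in the question (date in the first three path segments); `link_date` is a hypothetical helper name.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Assumption: the date occupies the first three path segments of the URL.
DATE_PREFIX = re.compile(r"^/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def link_date(url):
    """Return a datetime.date for a dated link, or None otherwise."""
    # urlparse isolates the path, so the host name does not matter.
    match = DATE_PREFIX.match(urlparse(url).path)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for impossible values (e.g. month 13).
        return date(year, month, day)
    except ValueError:
        return None


for url in [
    "http://www.website.com/2016/2/25/11118290/story",
    "http://www.website.com/authors/author",
    "http://www.website.com/2016/13/45/11118290/story",
]:
    print(url, "->", link_date(url))
```

Returning a `date` instead of a boolean also makes it easy to sort or filter the links by date afterwards.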

Regarding "python - Find date instances in links; regex; Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35698326/