
python - How to remove dates from a list in Python

Reposted. Author: 太空狗. Updated: 2023-10-29 21:20:40

I have a list of tokenized text (list_of_words) that looks like this:

list_of_words = 
['08/20/2014',
'10:04:27',
'pm',
'complet',
'vendor',
'per',
'mfg/recommend',
'08/20/2014',
'10:04:27',
'pm',
'complet',
...]

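I am trying to remove every date and time from this list. I tried the .remove() function, to no avail. I tried adding wildcard entries (such as "../../....") to the stop-word list I filter with, but that did not work either. Finally, I tried writing the following code: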

for line in list_of_words:
    if re.search('[0-9]{2}/[09]{2}/[0-9]{4}',line):
        list_of_words.remove(line)

But this does not work either. How can I remove everything formatted like a date or a time from my list?

Best answer

Description

^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$


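This regular expression does the following:

  • Finds strings that look like dates (12/23/2016) or times (12:34:56)
  • Finds strings that are exactly am or pm, which are probably part of the preceding time in the source list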
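
As a quick check of the behaviour described above, here is a minimal sketch that tests the pattern against a few tokens. The names DATE_TIME_RE and is_date_or_time are hypothetical and not part of the original answer. Note that the pattern requires two-digit fields, so a token such as 8/20/2014 is not matched, and the separators may be mixed.

```python
import re

# Same pattern as above, written as a raw string. The backslash before "/"
# is not required in Python, so it is omitted here.
DATE_TIME_RE = re.compile(r'^(?:(?:[0-9]{2}[:/,]){2}[0-9]{2,4}|am|pm)$')


def is_date_or_time(token):
    """Return True if the whole token looks like a date, a time, or am/pm."""
    return DATE_TIME_RE.match(token) is not None


for token in ['08/20/2014', '10:04:27', 'pm', 'vendor', 'mfg/recommend', '8/20/2014']:
    print(token, is_date_or_time(token))
```

Only the first three tokens are reported as matches; ordinary words, tokens that merely contain a slash, and dates with single-digit fields are left alone.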

Example

Sample list

08/20/2014
10:04:27
pm
complete
vendor
per
mfg/recommend
08/20/2014
10:04:27
pm
complete

List after processing

complete
vendor
per
mfg/recommend
complete

Sample Python script

import re

SourceList = ['08/20/2014',
'10:04:27',
'pm',
'complete',
'vendor',
'per',
'mfg/recommend',
'08/20/2014',
'10:04:27',
'pm',
'complete']

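# Keep only the words that do NOT match the date/time/am/pm pattern.
# A raw string avoids the invalid "\/" escape warning in Python 3.
OutputList = filter(
    lambda ThisWord: not re.match(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', ThisWord),
    SourceList)

# filter() returns a lazy iterator in Python 3; looping over it prints each kept word.
for ThisValue in OutputList:
    print(ThisValue)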
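
The loop in the question calls list_of_words.remove() while iterating over the same list, which makes the iterator skip the element that follows each removed item. Its pattern also contains [09] where [0-9] was probably intended. A minimal sketch of one alternative, assuming a new list is acceptable, is a list comprehension over the question's own data. The names pattern and cleaned are hypothetical.

```python
import re

pattern = re.compile(r'^(?:(?:[0-9]{2}[:/,]){2}[0-9]{2,4}|am|pm)$')

list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor',
                 'per', 'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet']

# Build a new list instead of removing items during iteration,
# so no element is skipped.
cleaned = [word for word in list_of_words if not pattern.match(word)]

print(cleaned)
# ['complet', 'vendor', 'per', 'mfg/recommend', 'complet']
```

If the original list object has to be updated in place, assigning to the full slice (list_of_words[:] = cleaned) is one way to do it.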

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
  (?:                      group, but do not capture (2 times):
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    [:\/,]                   any character of: ':', '\/', ','
----------------------------------------------------------------------
  ){2}                     end of grouping
----------------------------------------------------------------------
  [0-9]{2,4}               any character of: '0' to '9' (between 2
                           and 4 times (matching the most amount
                           possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  am                       'am'
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  pm                       'pm'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
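
The breakdown above can also be kept next to the pattern itself. The following is a sketch, not part of the original answer, that writes the same expression with Python's re.VERBOSE flag so each node carries its own comment. The name VERBOSE_RE is hypothetical.

```python
import re

VERBOSE_RE = re.compile(r'''
    ^                     # beginning of the string
    (?:                   # group, but do not capture
        (?:
            [0-9]{2}      # two digits
            [:/,]         # one separator: ':', '/' or ','
        ){2}              # digits plus separator, two times
        [0-9]{2,4}        # two to four digits
      | am                # OR the literal 'am'
      | pm                # OR the literal 'pm'
    )
    $                     # end of the string (or just before a trailing newline)
''', re.VERBOSE)

print(bool(VERBOSE_RE.match('10:04:27')))  # True
print(bool(VERBOSE_RE.match('vendor')))    # False
```

In verbose mode, whitespace and comments outside character classes are ignored, so this should behave the same as the single-line pattern.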

Regarding "python - How to remove dates from a list in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37473219/
