
python - How do I make a regular expression return a string (rather than a regex object)?

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 18:19:36

I have read the regular expression documentation, but for a beginner programmer like me it is very confusing, so posting here is my last resort.

# Tivo Notifier
import os, re

WATCH_DIR = "D:/tivo"
TO_FIND = [".*big.brother.uk.s15.*", ".*mock.the.week.*", ".*family.guy.*"]

# open history log file
history = open("history.txt", "w+")

# get downloaded files
files = os.listdir(WATCH_DIR)

# compare each file to regex patterns
for pattern in TO_FIND:
    regex = re.compile(pattern)
    match = [m.group(0) for file in files for m in [regex.search(file)] if m]

    for filename in match:
        if filename not in history.read():  # if a new match is found
            print "new:", filename  # display new match file name
            history.write(filename)  # add file name to history file
history.close()

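The problem is that it writes a lot of garbage to the history file: http://pastebin.com/3C5iVbU7

I assume this is because filename is not a string but some kind of regex object. I can't see in the documentation how to get a string back.

I just want the file names added to the history file, not the garbage text this script actually writes.

Can someone tell me how to do this?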

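Best answer

Here is a more direct approach that uses glob instead of regular expressions. It also uses sets to keep track of the history and the new files.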

import os, glob

WATCH_DIR = 'D:/tivo'
TO_FIND = ['*big.brother.uk.s15*', '*mock.the.week*', '*family.guy*']

# load previously seen file names, one per line
history = set(open('history.txt').read().splitlines())

# collect every file in WATCH_DIR that matches one of the patterns
new_files = set()
for pattern in TO_FIND:
    files = glob.glob(os.path.join(WATCH_DIR, pattern))
    # optionally strip directories from file names
    files = [os.path.basename(f) for f in files]
    new_files.update(files)

# keep only the names that are not in the history yet
new_files = new_files.difference(history)
for f in sorted(new_files):
    print("new: %s" % f)

# write the merged history back, one name per line
history.update(new_files)
open('history.txt', 'w').write('%s\n' % '\n'.join(sorted(history)))
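
For readers who would rather keep the regular expressions from the question, the following is a minimal sketch and not part of the original answer. It assumes Python 3, and `find_new_files` is a hypothetical helper name introduced only for illustration. It shows that `m.group(0)` is already a plain string, which suggests the garbage is more likely caused by the file handling in the question (opening with "w+" truncates the file, and the same handle is used for both read() and write()) than by the regex result.

```python
import re

# Patterns copied from the question.
TO_FIND = [".*big.brother.uk.s15.*", ".*mock.the.week.*", ".*family.guy.*"]


def find_new_files(filenames, history, patterns=TO_FIND):
    """Return sorted names that match any pattern and are not in history.

    Hypothetical helper for illustration; not from the original answer.
    """
    compiled = [re.compile(p) for p in patterns]
    matched = set()
    for name in filenames:
        for regex in compiled:
            m = regex.search(name)  # a match object, or None
            if m:
                matched.add(m.group(0))  # group(0) is the matched text, a str
                break
    # set difference drops everything already recorded in the history
    return sorted(matched - set(history))


# Made-up file names, used only to exercise the helper.
files = ["family.guy.s12e01.mkv", "mock.the.week.s13e02.avi", "notes.txt"]
seen = ["mock.the.week.s13e02.avi"]
print(find_new_files(files, seen))  # ['family.guy.s12e01.mkv']
```

Keeping the history in a set and writing the file once at the end, as the answer does, avoids reading and writing through the same open handle.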

Regarding "python - How do I make a regular expression return a string (rather than a regex object)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24376363/
