python - Why doesn't this string.punctuation code work for stripping punctuation?

Reposted. Author: 行者123. Updated: 2023-12-01 00:03:17

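I'm confused about why this code doesn't work the way I want. I'm reading a txt file and printing each (comma-separated) item on a new line. Each item is surrounded by “” quotation marks and also contains punctuation, which I'm trying to strip. I'm familiar with string.punctuation and got it working on my test example, but it fails on the items I loop over. See below: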

def read_word_lists(path):
    import string
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
    for line in lines[0].split(','):
        line = str(line)
        line = line.strip().lower()
        print(''.join(word.strip(string.punctuation) for word in line))
        print(line)
        print(''.join(word.strip(string.punctuation) for word in '"why, does this work?! and not above?"'))


read_word_lists('file.txt')

The output looks like this:

trying to strip punctuation:  “you never”
originial: “you never”
test: why does this work and not above
trying to strip punctuation: “you always
originial: “you always"
test: why does this work and not above
trying to strip punctuation: ” “your problem is”
originial: ” “your problem is”
test: why does this work and not above
trying to strip punctuation: “the trouble with you is”
originial: “the trouble with you is”
test: why does this work and not above

Any ideas why the "trying to strip punctuation" output isn't working?

Edit

In case it helps, the original file looks like this:

“you never”, “you always”, ” “your problem is”, “the trouble with you is”

Best answer

You are trying to strip Unicode punctuation (the curly quotes “ and ”), whereas string.punctuation contains only ASCII punctuation.
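A quick way to confirm this is to check the characters directly. The snippet below is only an illustrative sketch; the variable names (curly_open, curly_close, item) are made up for the example and do not come from the question.

```python
import string
import unicodedata

curly_open, curly_close = "\u201c", "\u201d"  # the curly quotes “ and ”

# string.punctuation is a fixed ASCII-only string, so curly quotes are absent.
print(curly_open in string.punctuation)   # False
print('"' in string.punctuation)          # True

# Unicode still classifies them as punctuation (general category starts with "P").
print(unicodedata.category(curly_open))   # Pi (initial quote)
print(unicodedata.category(curly_close))  # Pf (final quote)

# Hence str.strip(string.punctuation) leaves the curly quotes in place.
item = "\u201cyou never\u201d"
print(item.strip(string.punctuation) == item)  # True
```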

Instead of string.punctuation, you can use the code below to build a string containing all Unicode punctuation characters:

import sys
import unicodedata

# Collect every code point whose Unicode general category starts with "P" (punctuation).
punctuation = "".join(
    chr(i)
    for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)
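As a rough sketch of how such a string might be applied to the data from the question: the helper name clean_items and the constant STRIP_CHARS are hypothetical, and the sample line is reconstructed from the printed output rather than read from the real file. Two assumptions are worth stating. First, several ASCII characters in string.punctuation (such as $, + and ~) are classified by Unicode as symbols (category "S"), not punctuation, so the sketch combines both sets. Second, whitespace is included in the strip set so that an item like ” “your problem is” is trimmed past the space between the two quotes.

```python
import string
import sys
import unicodedata

# Every code point whose general category starts with "P" (punctuation).
unicode_punctuation = "".join(
    chr(i)
    for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)

# Characters trimmed from both ends of each item (hypothetical name).
STRIP_CHARS = unicode_punctuation + string.punctuation + string.whitespace


def clean_items(line):
    """Split a comma-separated line and trim punctuation/whitespace from each item."""
    return [item.strip(STRIP_CHARS).lower() for item in line.split(",")]


# Sample line reconstructed from the question's output (an assumption).
sample = "“you never”, “you always”, ” “your problem is”, “the trouble with you is”"
print(clean_items(sample))
# ['you never', 'you always', 'your problem is', 'the trouble with you is']
```

Note that str.strip only trims the ends of each item. The question's code iterates character by character, which removes punctuation everywhere in the text. If that is what you want, filter the characters instead, for example `"".join(ch for ch in text if ch not in STRIP_CHARS or ch.isspace())`.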

Good luck!

Regarding "python - Why doesn't this string.punctuation code work for stripping punctuation?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60173745/
