python-2.7 - A better way to exclude punctuation/non-alphabetic characters from a string?

Reposted. Author: 行者123. Updated: 2023-12-01 00:49:17

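I wrote this program to sort and enumerate the words in a text document. The code would be much cleaner if I didn't have to pick out every possible punctuation mark with string.translate(). Instead of excluding specific cases, is it possible to allow only alphabetic (and perhaps numeric) characters?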

from sys import argv

script_, filename = argv

bang = open(filename, 'r+')
words = bang.read()
words = words.translate(None, ',')
words = words.translate(None, '"')
words = words.translate(None, '.')
words = words.translate(None, '...')
words = words.translate(None, '?')
words = words.translate(None, '!')
words = words.translate(None, ';')
words = words.translate(None, '-')
words = words.translate(None, '\'')
words = words.translate(None, '.\'')
words = words.translate(None, '(')
words = words.translate(None, ')')
words = words.translate(None, ':')
words = str(words)
words = words.lower()
liste = words.split()
sorte = sorted(liste)

i = 0
f = 'nullooosdfgkjlkjasdihaiwuehlfkj898'
z = 1
w = 0

for wordss in sorte:
    if f == wordss:
        z += 1
        w += 1
    elif f != wordss:
        w += 1
        print "-", z
        z = 1
        i += 1
        print "%d. %s" % (i, wordss),
        f = wordss

print "\n\n word count - %d\n" % w

Best answer

I want to list words in a text document

How about this algorithm: split the text on whitespace, then strip the punctuation from each piece.

>>> text = "'I wonder how many miles I've fallen by this time?' she said aloud."
>>> import string
>>> words = [x.strip(string.punctuation) for x in text.split()]
>>> words
['I', 'wonder', 'how', 'many', 'miles', "I've", 'fallen', 'by', 'this', 'time', 'she', 'said', 'aloud']

Note that this keeps contractions such as don't intact, so you can still tell we're apart from were.
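To tie this back to the original goal of sorting and counting words, here is a minimal sketch. The function names (strip_words, alnum_words, count_words) are hypothetical and do not come from the original post. strip_words applies the split-then-strip idea above. alnum_words is an assumed regex whitelist that keeps only ASCII letters and digits, which is closer to what the question asked for. The code uses no print statements, so it should run unchanged on Python 2.7 and Python 3.

```python
import re
import string
from collections import Counter


def strip_words(text):
    """Split on whitespace, then trim punctuation from both ends of each token."""
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    # A token made only of punctuation (e.g. "...") strips down to "", so drop it.
    return [tok for tok in tokens if tok]


def alnum_words(text):
    """Whitelist variant (an assumption, not from the answer): keep only
    runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_words(words):
    """Return (word, count) pairs sorted alphabetically by word."""
    return sorted(Counter(words).items())


sample = "'I wonder how many miles I've fallen by this time?' she said aloud."
stripped = strip_words(sample)   # keeps "i've" as one word
whitelisted = alnum_words(sample)  # splits it into "i" and "ve"
```

The two variants differ on internal punctuation. Stripping only touches the ends of each token, so contractions survive. The whitelist treats the apostrophe as a separator, so "I've" becomes two words. Which behavior is preferable depends on the text being counted.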

Regarding "python-2.7 - A better way to exclude punctuation/non-alphabetic characters from a string?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16724979/
