
python - Split a string at whitespace, but do not remove the whitespace

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:15:23


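I want to split a string on whitespace and punctuation, but the whitespace and punctuation should still appear in the result.

For example: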

Input: text = "This is a text; this is another   text.,."
Output: ['This', ' ', 'is', ' ', 'a', ' ', 'text', '; ', 'this', ' ', 'is', ' ', 'another', '   ', 'text', '.,.']

This is what I am doing at the moment:

import string

def classify(b):
    """
    Classify a character.
    """
    separators = string.whitespace + string.punctuation
    if b in separators:
        return "separator"
    else:
        return "letter"

def tokenize(text):
    """
    Split strings to words, but do not remove white space.
    The input must be of type str, not bytes
    """
    if len(text) == 0:
        return []

    current_word = "" + text[0]
    # Classify the first character, not the whole string.
    previous_mode = classify(text[0])
    offset = 1
    results = []
    while offset < len(text):
        current_mode = classify(text[offset])
        if current_mode == previous_mode:
            # Same kind of character: extend the current token.
            current_word += text[offset]
        else:
            # Kind changed: emit the finished token and start a new one.
            results.append(current_word)
            current_word = text[offset]
            previous_mode = current_mode
        offset += 1

    # Emit the last token.
    results.append(current_word)
    return results

It works, but it feels too C-like. Is there a better way to do this in Python?
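
One possible direction, offered as a minimal sketch rather than a definitive answer: `itertools.groupby` can group consecutive characters that share the same classification, which is what the loop above does by hand. The names `SEPARATORS` and `tokenize_grouped` are hypothetical and introduced only for this illustration; the separator set is assumed to be the same `string.whitespace + string.punctuation` used in the question.

```python
import string
from itertools import groupby

# Assumption: same separator set as in the question's classify().
SEPARATORS = frozenset(string.whitespace + string.punctuation)

def tokenize_grouped(text):
    """Split text into runs of separators and runs of non-separators.

    Nothing is dropped, so "".join(result) reproduces the input.
    """
    # groupby starts a new group each time the key (is this character
    # a separator?) changes between adjacent characters.
    return ["".join(run) for _, run in groupby(text, key=SEPARATORS.__contains__)]

text = "This is a text; this is another   text.,."
tokens = tokenize_grouped(text)
print(tokens)
```

Because every character lands in exactly one group, joining the tokens gives back the original string, and an empty input yields an empty list without a special case.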
