
python - Unexpected behavior of a function built to replace split()

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:03:17

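I wrote a function that I hoped would behave better than the built-in split() (I know it's not idiomatic Python, but I did my best), so when I pass these arguments: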

better_split("After  the flood   ...  all the colors came out."," .")

I expected this result:

['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

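Surprisingly, however, the function behaves in a way that I cannot understand. When it reaches the last two words, it stops splitting: instead of appending 'came' and 'out' to the result list, it appends 'came out', so I get this: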

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']

Does anyone with more experience understand why this happens? Thanks in advance for your help!

def better_split(text,markersString):
    markers = []
    splited = []
    for e in markersString:
        markers.append(e)
    for character in text:
        if character in markers:
            point = text.find(character)
            if text[:point] not in character:
                word = text[:point]
                splited.append(word)
            while text[point] in markers and point+1 < len(text):
                point = point + 1
            text = text[point:]
    print 'final splited = ', splited

better_split("This is a test-of the,string separation-code!", ",!-")

better_split("After  the flood   ...  all the colors came out."," .")

split() with multiple separators: if you are looking for a split() that accepts multiple separators, see: Split Strings with Multiple Delimiters?

The best answer I found that does not use import re is:

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res
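Note that my_split keeps the empty strings that str.split produces between adjacent separators, so it does not by itself give the expected list above. A minimal sketch of one way to drop them follows; the name split_nonempty is hypothetical and not part of the linked answer.

```python
def split_nonempty(s, seps):
    """Split s on every single-character separator in seps, dropping empty pieces.

    Hypothetical helper: same loop as my_split, plus a final filter.
    """
    res = [s]
    for sep in seps:
        pieces, res = res, []
        for piece in pieces:
            # str.split(sep) yields '' between two adjacent separators
            res += piece.split(sep)
    # Keep only the non-empty pieces
    return [piece for piece in res if piece]


words = split_nonempty("After  the flood   ...  all the colors came out.", " .")
print(words)
```

The filter is applied once at the end rather than inside the loop, which keeps the splitting logic identical to my_split.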

Best answer

The problem is that this:

    for character in text:

iterates over the characters of the initial string (the original value of text), whereas this:

        point = text.find(character)

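searches for the delimiter in the current string (the current value of text). So that part of your function operates under the assumption that you handle one delimiter at a time; that is, it assumes that whenever the loop over the original text reaches a delimiter, that delimiter is the first delimiter in the current text.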
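The distinction between the original and the current value of text can be seen in isolation. The sketch below is only an illustration (the function name is hypothetical): a for loop obtains its iterator from the string object once, so rebinding the name text inside the loop body does not change what the loop visits.

```python
def iterate_while_rebinding(text):
    """Show that `for ... in text` keeps walking the original string object."""
    seen = []
    for character in text:   # the iterator is created from the original string
        seen.append(character)
        text = ""            # rebinding the name does not affect the running loop
    return seen, text


seen, final_text = iterate_while_rebinding("a b")
print(seen)        # every character of the original string was visited
print(final_text)  # the name now refers to the empty string
```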

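Meanwhile, this:

    while text[point] in markers and point+1 < len(text):
        point = point + 1
    text = text[point:]

is meant to remove several delimiters at once; its goal is to strip a whole run of consecutive delimiters. That violates the assumption in the code above that only one delimiter is handled at a time.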

So the processing goes like this:

  [After  the flood   ...  all the colors came out.]
handling first space after "After":
  [After] [the flood   ...  all the colors came out.]
handling second space after "After":
  [After] [the] [flood   ...  all the colors came out.]
handling space after "the":
  [After] [the] [flood] [all the colors came out.]
handling first space after "flood":
  [After] [the] [flood] [all] [the colors came out.]
handling second space after "flood":
  [After] [the] [flood] [all] [the] [colors came out.]
handling third space after "flood":
  [After] [the] [flood] [all] [the] [colors] [came out.]
handling first period of the "...":
  [After] [the] [flood] [all] [the] [colors] [came out] [.]
-- text is now just ".", so no more splitting happens

As you can see, the delimiter you are handling does not end up being the delimiter you split on.
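To check the walkthrough, the question's function can be instrumented to record the remaining text after each delimiter it handles. This is a sketch, not the asker's code: it is a Python 3 port with the same logic, and the name better_split_traced and the steps list are additions for illustration.

```python
def better_split_traced(text, markers_string):
    """Python 3 port of the question's function that also records each step."""
    markers = list(markers_string)
    splited = []
    steps = []  # (delimiter being handled, text remaining afterwards)
    for character in text:  # walks the ORIGINAL string
        if character in markers:
            point = text.find(character)  # searches the CURRENT string
            if text[:point] not in character:
                splited.append(text[:point])
            # Skip a run of consecutive delimiters (the problematic part)
            while text[point] in markers and point + 1 < len(text):
                point = point + 1
            text = text[point:]
            steps.append((character, text))
    return splited, steps


result, steps = better_split_traced(
    "After  the flood   ...  all the colors came out.", " .")
for delimiter, remaining in steps:
    print(repr(delimiter), '->', repr(remaining))
print(result)
```

Each printed line pairs the delimiter taken from the original string with what is left of the current string, which makes the mismatch between the two visible.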

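The solution is simply to remove the logic that lets you skip several delimiters at once; that is, change this:

    while text[point] in markers and point+1 < len(text):
        point = point + 1
    text = text[point:]

to just this:

    text = text[(point + 1):]

and instead, right before you add word to splited, make sure that it is non-empty:

    if len(word) > 0:
        splited.append(word)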
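Putting both changes together, the function might look like the sketch below. It is written for Python 3 and returns the list instead of printing it. The final `if text:` step is an assumption that goes beyond the answer: it keeps a trailing word when the input does not end with a delimiter.

```python
def better_split(text, markers_string):
    """Split text on any single character in markers_string, skipping empty words."""
    markers = list(markers_string)
    splited = []
    for character in text:  # walks the original string
        if character in markers:
            # Exactly one delimiter is consumed per step, so this delimiter
            # is always the first delimiter in the current text.
            point = text.find(character)
            word = text[:point]
            if len(word) > 0:
                splited.append(word)
            text = text[(point + 1):]
    # Assumption beyond the answer: keep a final word with no trailing delimiter
    if text:
        splited.append(text)
    return splited


print(better_split("After  the flood   ...  all the colors came out.", " ."))
print(better_split("This is a test-of the,string separation-code!", ",!-"))
```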

Regarding python - Unexpected behavior of a function built to replace split(), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9747059/
