python - Unexpected behavior of a function built to replace split()


Reposted by 太空宇宙; updated 2023-11-04 07:03:17


I wrote a function that behaves better than the built-in split() (I know it isn't idiomatic Python, but I did my best), so when I pass it these arguments:

better_split("After  the flood   ...  all the colors came out."," .")

I expected this result:

['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

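Surprisingly, though, the function behaves in a way I can't understand. When it reaches the last two words, it stops suppressing the extra '' pieces, and instead of adding "came" and "out" to the result list as separate items, it adds "came out" as one item, so I get this: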

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']

Does anyone with more experience understand why this happens? Thanks in advance for your help!

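def better_split(text, markersString):
    markers = []
    splited = []
    for e in markersString:
        markers.append(e)
    # The for loop walks the original string; rebinding `text` below does not change that
    for character in text:
        if character in markers:
            # ...but find() searches the current, already-shortened `text`
            point = text.find(character)
            # '' is "in" every string, so empty prefixes are skipped here
            if text[:point] not in character:
                word = text[:point]
                splited.append(word)
                # Skip a whole run of consecutive delimiters
                while text[point] in markers and point+1 < len(text):
                    point = point + 1
                text = text[point:]
    print('final splited = ', splited)

better_split("This is a test-of the,string separation-code!", ",!-")

better_split("After  the flood   ...  all the colors came out.", ".")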

split() with multiple separators: if you are looking for a split() that handles multiple separators, see: Split Strings with Multiple Delimiters?
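For comparison, the built-in str.split takes at most one separator string: with no argument it splits on runs of whitespace, and with an argument such as " ." it treats the whole argument as a single two-character separator rather than as "space or period". A quick check on the question's sample sentence (the variable name sample is only for illustration):

```python
sample = "After  the flood   ...  all the colors came out."

# No argument: split on runs of whitespace; punctuation stays attached to words
print(sample.split())
# ['After', 'the', 'flood', '...', 'all', 'the', 'colors', 'came', 'out.']

# " ." is one separator (a space followed by a period), not a set of alternatives
print(sample.split(" ."))
# ['After  the flood  ', '..  all the colors came out.']
```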

The best answer I found that doesn't use import re is:

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res
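my_split keeps the empty strings produced by consecutive separators, so you still have to drop them to get the list the question expects. A minimal usage sketch: my_split is repeated so the snippet runs on its own, while the filtering comprehension and the names sample, pieces and words are additions for illustration, not part of the quoted answer.

```python
def my_split(s, seps):
    res = [s]
    for sep in seps:
        # Re-split every piece produced so far on the next separator
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res


sample = "After  the flood   ...  all the colors came out."
pieces = my_split(sample, " .")   # contains '' wherever separators are adjacent
words = [w for w in pieces if w]  # keep only the non-empty pieces
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```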

Best answer

The problem is that this:

    for character in text:

iterates over the characters of the initial string (the original value of text), while this:

        point = text.find(character)

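searches for the delimiter in the current string (the current value of text). So that part of your function works on the assumption that you handle one delimiter at a time; that is, it assumes that whenever you encounter a delimiter in the loop over the original text, it is the first delimiter in the current text.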
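The mismatch comes from how a for loop works: the expression after in is evaluated once, and the loop keeps iterating over that original string object even after the name text is rebound to a shorter string. A small sketch (the helper name is hypothetical) that records each loop character next to the value text has at that moment:

```python
def chars_vs_current_text(text):
    """Hypothetical helper: pair each loop character with the current value of `text`."""
    seen = []
    for character in text:          # iterates over the string bound when the loop started
        seen.append((character, text))
        text = text[1:]             # rebinding the name does not affect the running loop
    return seen


print(chars_vs_current_text("abc"))
# [('a', 'abc'), ('b', 'bc'), ('c', 'c')]
```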

Meanwhile, this:

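    while text[point] in markers and point+1 < len(text):
        point = point + 1
    text = text[point:]

is there to remove several delimiters at once; its goal is to strip a whole run of consecutive delimiters. That violates the assumption above that the code handles only one delimiter at a time.

So the processing goes like this: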

    [After  the flood   ...  all the colors came out.]
    handling first space after "After":
        [After] [the flood   ...  all the colors came out.]
    handling second space after "After":
        [After] [the] [flood   ...  all the colors came out.]
    handling space after "the":
        [After] [the] [flood] [all the colors came out.]
    handling first space after "flood":
        [After] [the] [flood] [all] [the colors came out.]
    handling second space after "flood":
        [After] [the] [flood] [all] [the] [colors came out.]
    handling third space after "flood":
        [After] [the] [flood] [all] [the] [colors] [came out.]
    handling first period of the "...":
        [After] [the] [flood] [all] [the] [colors] [came out] [.]
        -- only "." is left; every remaining delimiter now yields an empty prefix,
           so no more splitting happens

As you can see, the delimiter you are handling ends up not being the delimiter you actually split on.
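To check the walkthrough above, the question's loop can be instrumented to record the remaining text after each delimiter is handled. This is a sketch that keeps the question's logic unchanged but, as an assumption, returns its results instead of printing them; traced_split is a hypothetical name.

```python
def traced_split(text, markersString):
    """The question's better_split logic, returning (words, states).

    states holds one (delimiter, remaining_text) pair for every delimiter
    found while looping over the ORIGINAL string.
    """
    markers = list(markersString)
    splited = []
    states = []
    for character in text:
        if character in markers:
            point = text.find(character)
            if text[:point] not in character:
                word = text[:point]
                splited.append(word)
                while text[point] in markers and point + 1 < len(text):
                    point = point + 1
                text = text[point:]
            states.append((character, text))
    return splited, states


words, states = traced_split("After  the flood   ...  all the colors came out.", " .")
for delimiter, remaining in states[:8]:
    print(repr(delimiter), repr(remaining))
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']
```

From the seventh delimiter on, the remaining text is just ".", which is why "came out" is never split.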
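The solution is simply to remove the logic that lets you skip several delimiters at once; that is, change this:

    while text[point] in markers and point+1 < len(text):
        point = point + 1
    text = text[point:]

to this:

    text = text[(point + 1):]

and instead make sure that word is non-empty before adding it to splited. This check takes the place of the original "if text[:point] not in character:" test, so that text is advanced on every delimiter:

    if len(word) > 0:
        splited.append(word)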
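Putting both changes together gives something like the following sketch. Two assumptions go beyond the answer: the function returns the list instead of printing it, and any text left after the last delimiter is appended as a final word (without that, an input such as "a b" would lose "b").

```python
def better_split(text, markersString):
    markers = set(markersString)
    splited = []
    for character in text:                 # still walks the original string
        if character in markers:
            # Exactly one delimiter is consumed per iteration, so this is
            # always the first delimiter left in the current text.
            point = text.find(character)
            word = text[:point]
            if len(word) > 0:              # replaces the old "not in character" guard
                splited.append(word)
            text = text[(point + 1):]      # consume this one delimiter only
    # Assumption (not in the answer): keep a trailing word with no delimiter after it
    if len(text) > 0:
        splited.append(text)
    return splited


print(better_split("After  the flood   ...  all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
print(better_split("This is a test-of the,string separation-code!", ",!-"))
# ['This is a test', 'of the', 'string separation', 'code']
```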

Regarding python - Unexpected behavior of a function built to replace split(), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9747059/
