
python - Splitting with pyparsing

Reposted. Author: 行者123. Updated: 2023-11-28 22:12:30

I want to do the following (but using pyparsing):

Package:numpy11 Package:scipy
will be split into
[["Package:", "numpy11"], ["Package:", "scipy"]]

My code so far is:

import pyparsing as pp

package_header = pp.Literal("Package:")
single_package = pp.Word(pp.printables + " ") + ~pp.Literal("Package:")
full_parser = pp.OneOrMore( pp.Group( package_header + single_package ) )

The current output is:

([(['Package:', 'numpy11 Package:scipy'], {})], {})

I would like to get something like this instead:

([(['Package:', 'numpy11'], {})], [(['Package:', 'scipy'], {})], {})

Basically, the rest of the text should match pp.printables.

I know I could use Word, but what I want to express is:

all printables but not the Literal

How can I achieve this? Thanks.

Best answer

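You shouldn't need the negative lookahead at all. Word(printables) stops at whitespace, so each package name already ends before the next "Package:". That is, this: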

from pyparsing import *

package_header = Literal("Package:")
single_package = Word(printables)
full_parser = OneOrMore( Group( package_header + single_package ) )

print(full_parser.parseString("Package:numpy11 Package:scipy"))

prints:

[['Package:', 'numpy11'], ['Package:', 'scipy']]
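
If the goal really is "all printables but not the Literal", including names that contain spaces, one possible sketch is to apply the negative lookahead to each word of the name rather than after the whole name. This is not part of the original answer; the names name_word and package_name and the space-joining parse action are assumptions made for illustration.

```python
from pyparsing import Group, Literal, OneOrMore, Word, printables

package_header = Literal("Package:")

# One word of a package name: a run of printables that does not begin a new header.
name_word = ~package_header + Word(printables)

# Assumption: the words of a multi-word name are rejoined with single spaces.
package_name = OneOrMore(name_word).setParseAction(lambda tokens: " ".join(tokens))

full_parser = OneOrMore(Group(package_header + package_name))

result = full_parser.parseString("Package:numpy11 foo Package:scipy").asList()
print(result)
```

Here the lookahead is checked before every word, so the name stops as soon as the next "Package:" header begins. Note that runs of whitespace inside a name are collapsed to a single space by the join.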

Update: to parse packages separated by |, you can use the delimitedList() function (package names can now also contain spaces):

from pyparsing import *

package_header = Literal("Package:")
package_name = Regex(r'[^|]+') # | is a printable, so create a regex that excludes it.
package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|" )

print(full_parser.parseString("Package:numpy11 foo|Package:scipy"))

prints:

[['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
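
Because the regex [^|]+ matches everything up to the next |, any spaces written around the delimiter would end up inside the package name. A minimal sketch of one way to handle that, assuming such padding should be discarded, is to strip the name in a parse action. The strip step and the names variable are illustrative additions, not part of the original answer.

```python
from pyparsing import Group, Literal, Regex, delimitedList

package_header = Literal("Package:")

# Assumption: whitespace around the "|" delimiter is not part of the name, so strip it.
package_name = Regex(r"[^|]+").setParseAction(lambda tokens: tokens[0].strip())

package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|")

parsed = full_parser.parseString("Package:numpy11 foo | Package:scipy").asList()

# Pull out just the package names from the [header, name] pairs.
names = [name for _header, name in parsed]
print(names)
```

Calling asList() converts the ParseResults into plain nested lists, which makes unpacking each [header, name] pair straightforward.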

Regarding "python - Splitting with pyparsing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54795385/
