
python - Split a text file into sections with special delimiter lines - python

Reposted. Author: 太空狗. Updated: 2023-10-30 02:45:17

I have an input file like this:

This is a text block start
This is the end

And this is another
with more than one line
and another line.

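The task is to read the file section by section, where sections are separated by some special line. In this case the special line is an empty line, e.g. [out]: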

[['This is a text block start', 'This is the end'],
 ['And this is another', 'with more than one line', 'and another line.']]

I got the desired output by doing this:

def per_section(it):
    """Read a file and yield sections, using an empty line as the delimiter."""
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
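
Note that the function above yields each section as one joined string, while the desired output shown earlier is a list of lists of lines. A minimal sketch of a variant that yields lists instead might look like the following. The name `per_section_lists` and the in-memory `sample` are assumptions made for illustration and are not part of the original question.

```python
import io


def per_section_lists(lines):
    """Yield each blank-line-delimited section as a list of lines (hypothetical helper)."""
    section = []
    for line in lines:
        stripped = line.rstrip('\n')
        if stripped:
            # Non-empty line: it belongs to the current section.
            section.append(stripped)
        elif section:
            # Empty line: close the current section, ignoring repeated blank lines.
            yield section
            section = []
    # Yield any remaining lines as a final section.
    if section:
        yield section


# In-memory stand-in for the input file shown in the question.
sample = io.StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
sections = list(per_section_lists(sample))
print(sections)
```

Unlike the original, this sketch skips empty sections, so several consecutive blank lines do not produce empty entries.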

But what if the special line is a line that starts with #, e.g.:

# Some comments, maybe the title of the following section
This is a text block start
This is the end
# Some other comments and also the title
And this is another
with more than one line
and another line.

Then I have to do this:

def per_section(it):
    """Read a file and yield sections, using lines that start with '#' as the delimiter."""
    section = []
    for line in it:
        if line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

If I let per_section() take a delimiter parameter, I could try this:

def per_section(it, delimiter='\n'):
    """Read a file and yield sections, using either empty lines or '#' lines as the delimiter."""
    section = []
    for line in it:
        if line.strip('\n') and delimiter == '\n':
            section.append(line)
        elif delimiter == '#' and line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
But is there a way to do this without hard-coding every possible delimiter?

Best answer

How about passing a predicate?

def per_section(it, is_delimiter=lambda x: x.isspace()):
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret  # OR ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())  # OR ret.append(line)
    if ret:
        yield ret

Usage:

with open('/path/to/file.txt') as f:
    sections = list(per_section(f))  # default delimiter

with open('/path/to/file.txt') as f:
    sections = list(per_section(f, lambda line: line.startswith('#')))  # comment
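
To try the predicate approach without a file on disk, the sketch below runs the answer's function on the two sample inputs from the question, held in memory with `io.StringIO`. The variable names `blank_text`, `hash_text`, `blank_sections` and `hash_sections` are assumptions made for this illustration.

```python
import io


def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Yield lists of lines, splitting wherever is_delimiter(line) is true."""
    ret = []
    for line in it:
        if is_delimiter(line):
            # A delimiter line closes the current section (if it has any lines).
            if ret:
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    if ret:
        yield ret


# Sample input separated by an empty line.
blank_text = (
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

# Sample input separated by lines that start with '#'.
hash_text = (
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

blank_sections = list(per_section(io.StringIO(blank_text)))
hash_sections = list(
    per_section(io.StringIO(hash_text), lambda line: line.startswith('#'))
)
print(blank_sections)
print(hash_sections)
```

Both calls produce the same two sections. The delimiter lines themselves are dropped, so the `#` titles do not appear in the result.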

About "python - Split a text file into sections with special delimiter lines": we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25226871/
