Python: splitting text into blocks of x characters

Reposted. Author: 行者123. Updated: 2023-11-30 23:24:59

I use this code to parse a text file and format it so that every sentence is on its own line:

import re

# open the file to be formatted
with open('inputfile.txt', 'r') as infile:
    f = infile.read()

# put every sentence on a new line
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'
lines = re.sub(pat, '.\n', f)
print(lines)

# write the formatted text
# into a new txt file
with open('outputfile.txt', 'w') as outfile:
    outfile.write(lines)

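What I really need, though, is to split sentences after 110 characters. If a sentence on a line is longer than 110 characters, it should be split with '...' appended at the end; the next line should then start with '...' followed by the rest of the split sentence, and so on.

Any suggestions? I'm somewhat lost.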

Best answer

# Open the input file and read it; the with block closes the file automatically.
with open('inputfile.txt') as f:
    lines = f.read().splitlines()  # drop newlines so they don't count toward the length

output = []

for line in lines:
    if len(line) > 110:
        # First piece: 107 characters of text plus a trailing '...'.
        output.append(line[:107] + '...')
        line = line[107:]
        # Middle pieces: '...' + 104 characters + '...' (110 in total).
        while len(line) > 107:
            output.append('...' + line[:104] + '...')
            line = line[104:]
        # Last piece: '...' plus the remainder (at most 107 characters).
        output.append('...' + line)
    else:
        output.append(line)

# Open the output file and write one piece per line; closed automatically.
with open('outputfile.txt', 'w') as f:
    for line in output:
        f.write(line + '\n')
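
The splitting rule from the question can also be packaged as a function that works on strings in memory, which makes it easier to try out without any files. This is only a sketch under a few assumptions: the name `split_long_line` and its `width` and `marker` parameters are hypothetical (they are not part of the original answer), every piece including its markers is kept at or under `width` characters, and cuts are made at a fixed character count rather than at word boundaries.

```python
def split_long_line(line, width=110, marker='...'):
    """Split one line into pieces of at most `width` characters.

    Hypothetical helper: every piece except the last ends with `marker`,
    and every piece except the first starts with it.
    """
    if width <= 2 * len(marker):
        raise ValueError('width must leave room for a marker on both sides')
    if len(line) <= width:
        return [line]

    first = width - len(marker)       # text in the first piece (marker at the end only)
    middle = width - 2 * len(marker)  # text in a middle piece (marker on both sides)

    pieces = [line[:first] + marker]
    rest = line[first:]
    while len(rest) > first:          # still too long to be the last piece
        pieces.append(marker + rest[:middle] + marker)
        rest = rest[middle:]
    pieces.append(marker + rest)      # last piece: leading marker only
    return pieces


# Usage: a 250-character line is cut into three pieces.
sentence = 'x' * 250
for piece in split_long_line(sentence):
    print(len(piece), piece[:6], piece[-6:])
```

For the 250-character sample the pieces are 110, 110, and 42 characters long. If breaking in the middle of a word is not acceptable, the standard library's `textwrap.wrap` wraps at whitespace instead, though it does not add the '...' markers by itself.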

Regarding "Python: splitting text into blocks of x characters", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23165111/
