python - Performance - editing a 2GB text file in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:46:11

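I'm running the following code in Python 3 to read a .txt file, edit every other line (each second line is cut to its first 20 characters), and save the edited .txt file. It works fine for small files, but my file is about 2GB and it takes a very long time.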

Does anyone have suggestions on how to change the code to make it more efficient and faster?

```python
import sys

newData = ""
i=0
run=0
j=0
k=1
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()
while i < 14371315:
    sLine = seqData[j]
    editLine = seqData[k]
    tempLine = editLine[0:20]
    newLine = editLine.replace(editLine, tempLine)
    newData = newData + sLine + newLine
    if len(seqData[k]) > 20:
        newData += '\n'
    i=i+1
    j=j+2
    k=k+2
    run=run+1
    print(run)

seqFile.close()

new = open("new_temp100.txt", "w")
sys.stdout = new
print(newData)
```
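
The core of the edit is a per-pair rule: keep the first line of each pair unchanged and cut the second line to its first 20 characters. Because the slice also cuts off the trailing `'\n'` whenever the line is longer than 20 characters, a newline has to be appended in exactly that case. A minimal sketch of that rule as a standalone function, assuming Python 3 (the name `edit_pair` and its `width` parameter are hypothetical, introduced only for illustration):

```python
def edit_pair(s_line: str, edit_line: str, width: int = 20) -> str:
    """Return s_line unchanged plus edit_line cut to `width` characters.

    Hypothetical helper that mirrors the body of the loop in the question.
    """
    truncated = edit_line[:width]
    # A line longer than `width` loses its trailing '\n' in the slice,
    # so add it back; shorter lines keep their own newline.
    if len(edit_line) > width:
        truncated += '\n'
    return s_line + truncated


if __name__ == "__main__":
    print(repr(edit_pair("first\n", "abcdefghijklmnopqrstuvwxyz\n")))
    # 'first\nabcdefghijklmnopqrst\n'
    print(repr(edit_pair("first\n", "short\n")))
    # 'first\nshort\n'
```

Written this way, the same rule can be reused by any loop that walks the file pair by pair.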

Best answer

I'd suggest something like this:

```python
# Python 2.x: from itertools import izip, and use izip instead of zip
# See also the grouper recipe: http://docs.python.org/2/library/itertools.html#recipes
def pairs(iterable):
    "s -> (s0,s1), (s2,s3), (s4,s5), ..."
    # Passing the same iterator to zip() twice yields non-overlapping pairs:
    # each unedited line together with the line that follows it.
    it = iter(iterable)
    return zip(it, it)

new_data = []
with open('temp100.txt', 'r') as seqFile:
    for sLine, edit_line in pairs(seqFile):
        # editLine.replace(editLine, tempLine) is just tempLine,
        # i.e. the first 20 characters of the line being edited
        new_data.append(sLine + edit_line[:20])
        # The slice drops the trailing '\n' from lines longer than 20 chars
        if len(edit_line) > 20:
            new_data.append('\n')

with open("new_temp100.txt", "w") as new:
    new.write(''.join(new_data))
```
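
A note on pairing: the `pairwise` recipe from the itertools documentation, built on `tee()`, yields overlapping pairs `(s0,s1), (s1,s2), ...`, so applied to this file every line would appear in two pairs. The question needs non-overlapping pairs (line 0 with line 1, line 2 with line 3, ...), which you get by passing the same iterator to `zip()` twice; this relies on `zip()` evaluating its iterables left to right, which the Python docs guarantee. A small comparison sketch (the function names `overlapping_pairs` and `chunked_pairs` are hypothetical):

```python
from itertools import tee


def overlapping_pairs(iterable):
    """The tee()-based itertools recipe: s -> (s0,s1), (s1,s2), (s2,s3), ..."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


def chunked_pairs(iterable):
    """Non-overlapping pairs: s -> (s0,s1), (s2,s3), ...

    zip() pulls alternately from the same iterator; a trailing
    unpaired item is silently dropped.
    """
    it = iter(iterable)
    return zip(it, it)


if __name__ == "__main__":
    lines = ["h1\n", "s1\n", "h2\n", "s2\n"]
    print(list(overlapping_pairs(lines)))
    # [('h1\n', 's1\n'), ('s1\n', 'h2\n'), ('h2\n', 's2\n')]
    print(list(chunked_pairs(lines)))
    # [('h1\n', 's1\n'), ('h2\n', 's2\n')]
```

If the file could end with an unpaired line, `itertools.zip_longest` (`izip_longest` in Python 2) keeps it, at the cost of handling a `None` partner.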

You may do even better by streaming directly to disk:

```python
# Python 2.x: from itertools import izip, and use izip instead of zip
def pairs(iterable):
    "s -> (s0,s1), (s2,s3), (s4,s5), ..."
    # Same iterator passed twice -> non-overlapping pairs
    it = iter(iterable)
    return zip(it, it)

with open('temp100.txt', 'r') as seqFile:
    with open("new_temp100.txt", "w") as new:
        for sLine, edit_line in pairs(seqFile):
            tmp_str = sLine + edit_line[:20]
            if len(edit_line) > 20:
                tmp_str += '\n'  # restore the newline cut off by the slice
            new.write(tmp_str)
```

That way you don't have to hold the entire contents of the file in memory.
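
To check the streaming approach without touching a 2GB file, the loop can be wrapped in a function that takes already-open file objects, so `io.StringIO` can stand in for real files. A minimal sketch, assuming Python 3; the name `stream_edit` and its `width` parameter are hypothetical. It feeds a generator to `writelines()`, so only one pair of lines is held in memory at a time:

```python
import io
from typing import Iterable, TextIO


def stream_edit(src: Iterable[str], dst: TextIO, width: int = 20) -> None:
    """Copy src to dst, cutting every second line to `width` characters.

    Hypothetical helper; lines are read and written one pair at a time.
    """
    it = iter(src)
    # zip(it, it) gives non-overlapping pairs (line 0, line 1), (line 2, line 3), ...
    dst.writelines(
        s_line + edit_line[:width] + ('\n' if len(edit_line) > width else '')
        for s_line, edit_line in zip(it, it)
    )


if __name__ == "__main__":
    src = io.StringIO("first\nabcdefghijklmnopqrstuvwxyz\nsecond\nshort\n")
    dst = io.StringIO()
    stream_edit(src, dst)
    print(repr(dst.getvalue()))
    # 'first\nabcdefghijklmnopqrst\nsecond\nshort\n'
```

With real files, pass the handles from `open('temp100.txt', 'r')` and `open('new_temp100.txt', 'w')` instead of the `StringIO` objects; the function body does not change.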

For a similar question about python - Performance - editing a 2GB text file in Python, see Stack Overflow: https://stackoverflow.com/questions/19470733/
