python - Writing parts of one file to two other files

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:26:52

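I have a file containing the combined instances (lists of numbers) for a program I am writing. I copy every line that starts with "@" into both the training file and the testing file. After that, I want to put 28,709 instances into the training file and the remaining instances into the testing file.

When I try this with the following code: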

import itertools

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            linecount += 1

with open('combined.txt', 'r') as f:
    newcount = 0
    for l in f:
        while(newcount < linecount):
            f.next()
            newcount += 1

        if linecount > (linecount + 28709):
            testing.write(l)
        else:
            training.write(l)
        linecount += 1
    '''# Write 28,709 instances to training set
    for l in itertools.islice(f, linecount, linecount + 28709):
        training.write(l)
    # Write rest of instances to testing set
    for i in xrange(linecount + 28710):
        f.next()
    for l in f:
        testing.write(l)'''

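...it does not write all of the instances to the training set, and it writes nothing at all to the testing set. The original combined file (too large to paste here) can be found at: https://gist.githubusercontent.com/ryankshah/618fde939a54c5eb8642135ab1f4514c/raw/a5a11c0fc301a6724b9af4c413d76b96ffa9859c/combined.txt

Edit: all of the "@" lines should be in both files. After that, the first 28,709 lines following the last "@" line should go into the training file, and the rest into the testing file.

Thanks!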

Best answer

This should do what you need. I added comments in the code to explain the changes I made.

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            # increment every time to get position of last '@' symbol
            # can't skip lines in between '@' symbols
            linecount += 1

val = 28709

with open('combined.txt', 'r') as f:
    # skip first n lines up to last '@' symbol
    # (next(f) works on Python 2.6+ and Python 3; f.next() is Python 2 only)
    for _ in range(linecount):
        next(f)

    # write first 28709 lines after last '@' symbol to training file,
    # everything after that to the testing file
    new_linecount = 0
    for l in f:
        if new_linecount >= val:
            testing.write(l)
        else:
            training.write(l)
        new_linecount += 1
    '''# Write 28,709 instances to training set
    for l in itertools.islice(f, linecount, linecount + 28709):
        training.write(l)
    # Write rest of instances to testing set
    for i in xrange(linecount + 28710):
        f.next()
    for l in f:
        testing.write(l)'''

# close the output files so everything is flushed to disk
training.close()
testing.close()
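
For reference, the condition in the question, `linecount > (linecount + 28709)`, compares a number with itself plus a positive constant, so it can never be true; that is why nothing reached the testing file. The same split can also be done in a single pass over the file. The following is only a sketch, not part of the original answer: it assumes Python 3, assumes all "@" lines sit at the top of the file (as the answer above also does), and the function name `split_instances` and its parameters are made up for illustration.

```python
from itertools import chain, islice


def split_instances(src_path, train_path, test_path, train_size=28709):
    """Copy leading '@' lines to both outputs, then split the remaining lines.

    Hypothetical helper: assumes every '@' line appears before the first
    instance line.
    """
    with open(src_path) as src, \
            open(train_path, "w") as training, \
            open(test_path, "w") as testing:
        rest = iter(())  # stays empty if the file holds only '@' lines
        for line in src:
            if not line.startswith("@"):
                # This line was already read from the file, so put it back
                # in front of the lines that are still unread.
                rest = chain([line], src)
                break
            training.write(line)
            testing.write(line)

        # islice takes at most train_size lines and leaves the iterator
        # positioned right after them.
        training.writelines(islice(rest, train_size))
        # Everything still unread goes to the testing file.
        testing.writelines(rest)
```

With the file names from the question, the call would be `split_instances('combined.txt', 'training.txt', 'testing.txt')`. Using `with` for all three files means the outputs are closed, and therefore flushed, as soon as the split finishes.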

This post is based on a similar question found on Stack Overflow, "python - Writing parts of one file to two other files": https://stackoverflow.com/questions/47574672/
