python - Writing parts of one file to two other files


Reposted by: 太空宇宙. Updated: 2023-11-03 14:26:52


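So I have a file that contains the combined instances (lists of numbers) for a program I am writing. I then copy every line that starts with '@' into both the training and the testing file. Now I want to put 28,709 instances into my training file and the rest of the file's instances into the testing file.

When I try to do this with the following code: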

import itertools

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            linecount += 1

with open('combined.txt', 'r') as f:
    newcount = 0
    for l in f:
        while(newcount < linecount):
            f.next()
            newcount += 1

        if linecount > (linecount + 28709):
            testing.write(l)
        else:
            training.write(l)
        linecount += 1
    '''# Write 28,709 instances to training set
    for l in itertools.islice(f, linecount, linecount + 28709):
        training.write(l)
    # Write rest of instances to testing set
    for i in xrange(linecount + 28710):
        f.next()
    for l in f:
        testing.write(l)'''

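...it does not write all of the instances to the training set, and it writes nothing at all to the testing set. The original combined file (too large to paste here) can be found at: https://gist.githubusercontent.com/ryankshah/618fde939a54c5eb8642135ab1f4514c/raw/a5a11c0fc301a6724b9af4c413d76b96ffa9859c/combined.txt

Edit: all of the '@' lines should be in both files. The first 28,709 lines after the last '@' line should then go into the training file, and the rest into the testing file.

Thanks!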

Best answer

This should do what you need. I added comments in the code to explain the changes I made.

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
        # increment for every '@' line read, so linecount ends up as the
        # number of lines up to and including the last '@' line
        linecount += 1

val = 28709

with open('combined.txt', 'r') as f:
    # skip first n lines up to last '@' symbol
    for _ in range(linecount):
        next(f)  # built-in next() works on Python 2.6+ and 3; f.next() is Python 2 only

    # write first 28709 lines after last '@' symbol to training file
    new_linecount = 0
    for l in f:
        if new_linecount >= val:
            testing.write(l)
        else:
            training.write(l)
        new_linecount += 1

# close the output files so that everything is flushed to disk
training.close()
testing.close()
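
For readers on Python 3, where file objects have no .next() method, the same split can be sketched in a single pass with itertools.islice. This is only an illustration, not the answerer's code: the function name split_instances and its parameters are made up here, and it assumes (like the answer above) that all '@' lines form one contiguous block at the top of the file.

```python
import itertools


def split_instances(combined_path, training_path, testing_path, n_training=28709):
    """Copy leading '@' lines to both outputs, then split the remaining lines.

    Hypothetical helper: assumes the '@' lines are contiguous at the top.
    """
    with open(combined_path, 'r') as src, \
         open(training_path, 'w') as training, \
         open(testing_path, 'w') as testing:
        lines = iter(src)

        # Header: every leading line that starts with '@' goes to both files.
        first_instance = None
        for line in lines:
            if not line.startswith('@'):
                first_instance = line  # already consumed, so keep it
                break
            training.write(line)
            testing.write(line)

        if first_instance is None:
            return  # the file held only '@' lines, or was empty

        # Put the consumed line back in front of the remaining ones.
        instances = itertools.chain([first_instance], lines)

        # The first n_training instances go to the training file...
        training.writelines(itertools.islice(instances, n_training))
        # ...and whatever is left goes to the testing file.
        testing.writelines(instances)
```

With the file names from the question, this would be called as split_instances('combined.txt', 'training.txt', 'testing.txt'). Because the with statement closes all three files, no explicit close() calls are needed.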

Regarding python - writing parts of one file to two other files, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47574672/
