
python - Splitting a file based on a matching character

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:22:32

  ATOM    856  CE ALYS A 104       0.809   0.146  26.161  0.54 29.14           C
ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O
ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C

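I have the text file above and need to split it into two text files according to the character at position 21 of each line (line[21:22] in the script below). I wrote a script that prints the desired result, but how can I do this if I don't know in advance which characters appear at that position? Below is the script I tried. Suppose I don't know whether the characters are "A" and "B", or "B" and "G", or any other combination; the lines still need to be separated based on that character. How can I do this?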

import sys

for fn in sys.argv[1:]:
    f = open(fn, 'r')

    while True:
        line = f.readline()
        if not line:
            break
        # keep only the lines whose character at index 21 is 'B'
        if line[21:22] == 'B':
            chns = line[0:80].rstrip('\n')
            print(chns)

    f.close()

Best answer

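  • Store the 21st character of the previous line (index 21 in the code, i.e. line[21:22]), and print a blank line wherever the current line's character differs from it, since a mismatch means a new group begins. This prints the lines grouped by their 21st character.

  • Note that this only groups lines in the order they appear in the file, so unsorted lines will produce several separate groups for the same 21st character.

    The file, modified to show this case: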

    ATOM    856  CE ALYS A 104       0.809   0.146  26.161  0.54 29.14           C
    ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
    ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
    ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
    ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
    ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
    ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C
    ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
    ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
    ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O

    Code that produces this case (without sorting the lines):

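    import sys

    for fn in sys.argv[1:]:
        with open(fn, 'r') as file:
            prev = None
            for line in file:
                # drop only the trailing newline so column positions stay intact
                line = line.rstrip('\n')
                key = line[21:22]
                if prev is not None and key != prev:
                    # blank line as a separator between groups
                    print('')
                print(line)
                prev = key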

    Sample output showing this case:

    ATOM    856  CE ALYS A 104       0.809   0.146  26.161  0.54 29.14           C
    ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C

    ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
    ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
    ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
    ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
    ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C

    ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
    ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
    ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O
  • So if you want only one group per distinct 21st character, put all the lines in a list and sort it with list.sort().

    Code (sorting the lines before grouping):

    import sys

    for fn in sys.argv[1:]:
        with open(fn, 'r') as file:
            # read all lines, dropping only the trailing newline
            lines = [line.rstrip('\n') for line in file]

        # sort on the key (21st char) only; the sort is stable, so lines
        # sharing a key keep their original relative order
        lines.sort(key=lambda line: line[21:22])

        prev = None
        for line in lines:
            key = line[21:22]
            if prev is not None and key != prev:
                # blank line as a separator between groups
                print('')
            print(line)
            prev = key

    Output:

    ATOM    856  CE ALYS A 104       0.809   0.146  26.161  0.54 29.14           C
    ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
    ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
    ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
    ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O

    ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
    ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
    ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
    ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
    ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C
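
The sort-then-group idea above can also be sketched with a dictionary, which needs no prior knowledge of which characters occur and keeps each group's lines in their original file order. This is only a minimal sketch, not the answer's own code: the helper name `group_by_char` and its `index` parameter are hypothetical, and it assumes the grouping character sits at index 21, as in the scripts above.

```python
from collections import OrderedDict


def group_by_char(lines, index=21):
    """Group lines by the character at `index` (hypothetical helper).

    Groups appear in order of first occurrence, and lines keep their
    original relative order inside each group.
    """
    groups = OrderedDict()
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip empty lines
        key = line[index:index + 1]
        groups.setdefault(key, []).append(line)
    return groups


# Small demo on made-up lines whose key character sits at index 21.
sample = [
    "x" * 21 + "A first",
    "x" * 21 + "B second",
    "x" * 21 + "A third",
]
for key, members in group_by_char(sample).items():
    print(key, len(members))
```

Because nothing is sorted, the groups come out in the order their character first appears in the input rather than in alphabetical order.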

Edit:

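Writing the grouped lines to separate files does not actually require checking the previous line's value, because changing the file name according to the 21st character opens a different file and so separates the lines. Here, though, I use prev so that a previously created file with the same name is overwritten rather than appended to, since appending could leave the file contents confusing or inconsistent.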

import sys

for fn in sys.argv[1:]:
    with open(fn, 'r') as infile:
        lines = infile.readlines()

    # sort on the key (21st char) only; the sort is stable, so lines
    # sharing a key keep their original relative order
    lines.sort(key=lambda line: line[21:22])

    filename = 'file'
    prev = None
    for line in lines:
        key = line[21:22]
        # 'w' creates (or truncates) the file for the first line of a group,
        # 'a' appends the remaining lines of that group
        mode = 'w' if key != prev else 'a'
        with open(filename + key + '.txt', mode) as out:
            out.write(line)
        prev = key
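
The same split can be sketched without sorting and without reopening a file for every line, by keeping one open handle per character. This is a sketch under assumptions rather than the answer's method: `split_by_char`, `prefix` and `index` are hypothetical names, the output naming follows the 'file' + character + '.txt' pattern used above, and `contextlib.ExitStack` (Python 3.3+) is used only to close all handles at the end.

```python
import contextlib


def split_by_char(path, prefix='file', index=21):
    """Write each line of `path` to '<prefix><char>.txt' (hypothetical helper).

    Returns the sorted list of output file names that were written.
    """
    handles = {}
    with contextlib.ExitStack() as stack, open(path, 'r') as infile:
        for line in infile:
            text = line.rstrip('\n')
            if not text.strip():
                continue  # skip blank lines
            key = text[index:index + 1]
            if key not in handles:
                # First time this character is seen: 'w' truncates any old file.
                name = prefix + key + '.txt'
                handles[key] = stack.enter_context(open(name, 'w'))
            # Re-add the newline so a final line without one stays separate.
            handles[key].write(text + '\n')
    return sorted(handle.name for handle in handles.values())
```

A call such as `split_by_char('input.pdb')` (the file name is made up) would then write one file per distinct character. The sketch does not check that the character is safe to use in a file name, so a space or a path separator at index 21 would need extra handling.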

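If appending to previously created files is not a problem, the file-writing part can be simplified. However, this risks writing to a file with the same name that was not created by the script, or that the script created during an earlier run or session.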

filename = 'file'
for line in lines:
    # 'a' appends, creating the file if it does not exist yet
    with open(filename + line[21:22] + '.txt', 'a') as out:
        out.write(line)
Regarding "python - Splitting a file based on a matching character", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31454438/
