
python - How do I write the output of a for loop in Python to a CSV file?

Reposted. Author: 行者123. Updated: 2023-11-28 16:31:19

The following Python script checks whether certain words are found in each of a list of different files.

experiment=open('potentiation.txt')
lines=experiment.read().splitlines()
receptors=['crystal_1.txt', 'modeller_1.txt', 'moe_1.txt',
           'nci5_modeller0000_1.txt', 'nci5_modeller0001_1.txt',
           'nci5_modeller0002_1.txt', 'nci5_modeller0003_1.txt',
           'nci5_modeller0004_1.txt', 'nci5_modeller0005_1.txt',
           'nci5_modeller0006_1.txt', 'nci5_modeller0007_1.txt',
           'nci5_modeller0008_1.txt', 'nci5_modeller0009_1.txt',
           'nci5_modeller0010_1.txt', 'nci5_modeller0011_1.txt',
           'nci5_moe0000_1.txt', 'nci5_moe0001_1.txt', 'nci5_moe0002_1.txt',
           'nci5_moe0003_1.txt', 'nci5_moe0004_1.txt', 'nci5_moe0005_1.txt',
           'nci5_moe0006_1.txt', 'nci5_moe0007_1.txt', 'nci5_moe0008_1.txt',
           'nci5_moe0009_1.txt', 'nci5_moe0010_1.txt', 'nci5_moe0011_1.txt',
           'nci5_moe0012_1.txt', 'nci5_moe0013_1.txt', 'nci5_moe0014_1.txt']

for ligand in lines:
    for protein in receptors:
        file1=open(protein,"r")
        read1=file1.read()
        find_hit=read1.find(ligand)
        if find_hit == -1:
            print ligand,protein,"Not Found"
        else:
            print ligand,protein, "Found"

Sample output from this code looks like this:

345647 nci5_moe0012_1.txt Not Found
345647 nci5_moe0013_1.txt Not Found
345647 nci5_moe0014_1.txt Found

My question is: how can I take this output and format it as a CSV file like the example below?

Ligand  nci5_moe0012_1  nci5_moe0013_1  nci5_moe0014_1
345647  Not Found       Not Found       Found

Best answer

I think something like this would do it (assuming your output file is tab-delimited):

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# Note: 'wb' is the Python 2 idiom for csv output files. On Python 3, use
# open('output.csv', 'w', newline='') instead.
with open('potentiation.txt', 'rt') as experiment, \
     open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    for ligand in (line.rstrip() for line in experiment):
        row = [ligand]  # first column holds the ligand itself
        for protein in receptors:
            with open(protein+'.txt', "rt") as file1:
                # index with the boolean: False (0) -> 'Found', True (1) -> 'Not Found'
                found = ['Found', 'Not Found'][file1.read().find(ligand) == -1]
            row.append(found)
        csv_writer.writerow(row)

print('output.csv file written')
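
For readers on Python 3, where the csv module expects a text-mode file opened with newline='' rather than 'wb', the same per-ligand approach might look like the sketch below. The function name write_presence_table, the skipping of blank lines, and the demo file names and contents are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import os
import tempfile


def write_presence_table(ligand_path, receptor_paths, out_path):
    """Write one tab-separated row per ligand: Found / Not Found per receptor file."""
    with open(ligand_path) as experiment:
        # Assumption: blank lines are skipped, since '' is a substring of every file.
        ligands = [line.strip() for line in experiment if line.strip()]

    # Python 3: open csv output in text mode with newline=''.
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='\t')
        names = [os.path.splitext(os.path.basename(p))[0] for p in receptor_paths]
        writer.writerow(['Ligand'] + names)  # header row
        for ligand in ligands:
            row = [ligand]
            for path in receptor_paths:
                with open(path) as receptor_file:
                    hit = ligand in receptor_file.read()
                row.append('Found' if hit else 'Not Found')
            writer.writerow(row)


# Demo with made-up data in a temporary directory (hypothetical file contents).
demo_dir = tempfile.mkdtemp()


def make_file(name, text):
    path = os.path.join(demo_dir, name)
    with open(path, 'w') as handle:
        handle.write(text)
    return path


ligand_file = make_file('potentiation.txt', '345647\n111111\n')
receptor_files = [make_file('moe_a.txt', 'pose 1 345647 score'),
                  make_file('moe_b.txt', 'no matching ligands here')]
output_file = os.path.join(demo_dir, 'output.csv')

write_presence_table(ligand_file, receptor_files, output_file)

# Read the result back to check its shape.
with open(output_file, newline='') as result:
    table = list(csv.reader(result, delimiter='\t'))
print(table)
```

As in the answer's code, this re-reads every receptor file once per ligand, so it is the simple version rather than the fast one.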

Update

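As I said in a comment, this can be done faster by reading each protein file only once. To do that and still format the output the way you want, the result of checking each file for each ligand has to be stored in a data structure. That structure is built up gradually as each file is read and then checked against every ligand, and it can only be written out, all at once, after all the work is done. A simple list of lists is sufficient for this purpose and is what the implementation below uses.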

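The trade-off is using more memory versus reading and re-reading the protein files over and over. Since disk I/O is usually one of the slowest things a computer does, the potentially large performance gain may well be worth the slight increase in code complexity.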
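
To put rough numbers on this trade-off, the short calculation below counts how many times receptor files are opened under each strategy. The ligand count of 1000 is a made-up figure; 30 is the length of the receptors list used in this answer.

```python
num_receptors = 30   # entries in the receptors list
num_ligands = 1000   # hypothetical number of lines in potentiation.txt

# The first script re-reads every receptor file for every ligand.
opens_first_script = num_ligands * num_receptors

# The read-once alternative opens each receptor file a single time,
# but keeps one row per ligand in memory until the end.
opens_read_once = num_receptors
cells_in_memory = num_ligands * (num_receptors + 1)  # ligand + one status per receptor

print(opens_first_script, opens_read_once, cells_in_memory)
```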

Here is the code for this alternative version:

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# initialize list of lists holding each ligand and its presence in each receptor
with open('potentiation.txt') as experiment:
    ligands = [[ligand] for ligand in (line.rstrip() for line in experiment)]

for protein in receptors:
    # read each protein file exactly once
    with open(protein + '.txt') as protein_file:
        protein_file_data = protein_file.read()
        for row in ligands:
            # determine if this ligand (row[0]) appears in protein data
            row.append('Found' if row[0] in protein_file_data else 'Not Found')

# Note: 'wb' is the Python 2 idiom for csv output files. On Python 3, use
# open('output.csv', 'w', newline='') instead.
with open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    csv_writer.writerows(ligands)

print('output.csv file written')
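
The list-of-lists structure can also be seen in isolation, without touching the disk. The sketch below is a Python 3 illustration only: build_presence_rows, the receptor_texts dictionary, and the sample strings are hypothetical stand-ins for the contents of the real files.

```python
import csv
import io


def build_presence_rows(ligands, receptor_texts):
    """Return a list of lists: [ligand, status_in_receptor_1, status_in_receptor_2, ...]."""
    rows = [[ligand] for ligand in ligands]
    # Outer loop over receptors: each text is loaded once and checked against every ligand.
    for text in receptor_texts.values():
        for row in rows:
            row.append('Found' if row[0] in text else 'Not Found')
    return rows


# Made-up receptor contents keyed by receptor name (insertion order is kept).
receptor_texts = {
    'nci5_moe0013_1': 'hits: 100200 555000',
    'nci5_moe0014_1': 'hits: 345647 100200',
}
rows = build_presence_rows(['345647', '100200'], receptor_texts)

# Write header and rows in one go, here into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['Ligand'] + list(receptor_texts))
writer.writerows(rows)
print(buffer.getvalue())
```

Each inner list grows by one column per receptor, which is why nothing can be written until the last receptor has been processed.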

Regarding "python - How do I write the output of a for loop in Python to a CSV file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31629988/
