python - Remove rows from a csv file where certain columns match a specific regex

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:55:45

I have the following csv file:

ID,PDBID,FirstResidue,FirstChain,SecondResidue,SecondChain,ThirdResidue,ThirdChain,FourthResidue,FourthChain,Pattern
RZ_AUTO_505,1hmh,A22L,C,A22L,A,G21L,A,A23L,A,AA/GA Naked ribose
RZ_AUTO_506,1hmh,A22L,C,A22L,A,G114,A,A23L,A,AA/GA Naked ribose
RZ_AUTO_507,1hmh,A130,E,A90,A,G80,A,A130,A,AA/GA Naked ribose
RZ_AUTO_508,1hmh,A140,E,A90,E,G120,A,A90,A,AA/GA Naked ribose
RZ_AUTO_509,1hmh,G102,A,C103,A,G102,E,A90,E,GC/GA Single ribose
RZ_AUTO_510,1hmh,G102,A,C103,A,G120,E,A90,E,GC/GA Single ribose
RZ_AUTO_511,1hmh,G113,C,C112,C,G21L,A,A23L,A,GC/GA Single ribose
RZ_AUTO_512,1hmh,G113,C,C112,C,G114,A,A23L,A,GC/GA Single ribose
RZ_AUTO_513,1hnw,C1496,A,G1497,A,A1518,A,A1519,A,CG/AA Canonical ribose
RZ_AUTO_514,1hnw,C1496,A,G1497,A,A1519,A,A1518,A,CG/AA Canonical ribose
RZ_AUTO_515,1hnw,C221,A,U222,A,A195,A,A196,A,CU/AA Canonical ribose
RZ_AUTO_516,1hnw,C221,A,U222,A,A196,A,A195,A,CU/AA Canonical ribose

I need to delete a csv row if the value of FirstResidue, SecondResidue, ThirdResidue, or FourthResidue matches the regex "[A-Za-z]$". The output should look like this.

RZ_AUTO_507,1hmh,A130,E,A90,A,G80,A,A130,A,AA/GA Naked ribose
RZ_AUTO_508,1hmh,A140,E,A90,E,G120,A,A90,A,AA/GA Naked ribose
RZ_AUTO_509,1hmh,G102,A,C103,A,G102,E,A90,E,GC/GA Single ribose
RZ_AUTO_510,1hmh,G102,A,C103,A,G120,E,A90,E,GC/GA Single ribose
RZ_AUTO_513,1hnw,C1496,A,G1497,A,A1518,A,A1519,A,CG/AA Canonical ribose
RZ_AUTO_514,1hnw,C1496,A,G1497,A,A1519,A,A1518,A,CG/AA Canonical ribose
RZ_AUTO_515,1hnw,C221,A,U222,A,A195,A,A196,A,CU/AA Canonical ribose
RZ_AUTO_516,1hnw,C221,A,U222,A,A196,A,A195,A,CU/AA Canonical ribose

So far I have saved each column as a list, but I'm not sure how to proceed from there. Here is my code:

import csv
import re

rzid = []
pdbid = []
first_residue = []
first_chain = []
second_residue = []
second_chain = []
third_residue = []
third_chain = []
fourth_residue = []
fourth_chain = []
rz_pattern = []

#open csv file rz45.csv
f = open( 'rz45.csv', 'rU' ) #open the file in read universal mode
for line in f:
    cells = line.split( "," )
    rzid.append( (cells[0]) )
    pdbid.append( (cells[1]) )
    first_residue.append( (cells[2]) )
    first_chain.append( (cells[3]) )
    second_residue.append( (cells[4]) )
    second_chain.append( (cells[5]) )
    third_residue.append( (cells[6]) )
    third_chain.append( (cells[7]) )
    fourth_residue.append( (cells[8]) )
    fourth_chain.append( (cells[9]) )
    rz_pattern.append( (cells[10]) )

f.close()

Can anyone help? Thanks.

Update 1

import re
import csv

output = []
regex = '[AUGC]\d{1,4}'

#open csv file test_regex.csv
f = open( 'test_regex.csv', 'rU' ) #open the file in read universal mode
for line in f:
    cells = line.split( "," )
    output.append( [ cells[ 2 ], cells[ 4 ], cells[ 6 ], cells[ 8 ] ] )
    match = re.search(regex, str(output))
    if match:
        print line
f.close()

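I made some changes to the code, but I'm still not sure how to check whether all the values in cells [2, 4, 6, 8] satisfy the given regex. Can anyone suggest how to proceed?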
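A minimal sketch of one way to run that check by column index, assuming the column order shown in the csv above. It tests each of the four residue cells on its own, rather than searching the string form of the accumulated `output` list. The names `ENDS_WITH_LETTER`, `RESIDUE_COLUMNS`, and `has_lettered_residue` are hypothetical and do not come from the original post.

```python
import re

# Matches a value whose last character is a letter, e.g. "A22L" but not "A130".
ENDS_WITH_LETTER = re.compile(r'[A-Za-z]$')

# Assumed positions of FirstResidue, SecondResidue, ThirdResidue, FourthResidue.
RESIDUE_COLUMNS = (2, 4, 6, 8)


def has_lettered_residue(cells):
    """Return True if any residue cell in this row ends with a letter."""
    return any(ENDS_WITH_LETTER.search(cells[i]) for i in RESIDUE_COLUMNS)


line = "RZ_AUTO_512,1hmh,G113,C,C112,C,G114,A,A23L,A,GC/GA Single ribose"
print(has_lettered_residue(line.split(",")))  # True: FourthResidue is A23L
```

A row is then kept only when `has_lettered_residue` returns False for it.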

Best answer

Something like this works (at least on your example):

import csv
import re

# Columns to test; all four residue columns are listed so that
# a row such as RZ_AUTO_512 (FourthResidue = A23L) is also dropped
tgt = ['FirstResidue', 'SecondResidue', 'ThirdResidue', 'FourthResidue']

with open('rz45.csv') as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        # map column name -> value for this row
        di = dict(zip(header, row))
        # skip the row if any target column ends with a letter
        if any(re.search(r'[A-Za-z]$', di[x]) for x in tgt):
            continue
        print(row)

Prints:

['RZ_AUTO_507', '1hmh', 'A130', 'E', 'A90', 'A', 'G80', 'A', 'A130', 'A', 'AA/GA Naked ribose']
['RZ_AUTO_508', '1hmh', 'A140', 'E', 'A90', 'E', 'G120', 'A', 'A90', 'A', 'AA/GA Naked ribose']
['RZ_AUTO_509', '1hmh', 'G102', 'A', 'C103', 'A', 'G102', 'E', 'A90', 'E', 'GC/GA Single ribose']
['RZ_AUTO_510', '1hmh', 'G102', 'A', 'C103', 'A', 'G120', 'E', 'A90', 'E', 'GC/GA Single ribose']
['RZ_AUTO_513', '1hnw', 'C1496', 'A', 'G1497', 'A', 'A1518', 'A', 'A1519', 'A', 'CG/AA Canonical ribose']
['RZ_AUTO_514', '1hnw', 'C1496', 'A', 'G1497', 'A', 'A1519', 'A', 'A1518', 'A', 'CG/AA Canonical ribose']
['RZ_AUTO_515', '1hnw', 'C221', 'A', 'U222', 'A', 'A195', 'A', 'A196', 'A', 'CU/AA Canonical ribose']
['RZ_AUTO_516', '1hnw', 'C221', 'A', 'U222', 'A', 'A196', 'A', 'A195', 'A', 'CU/AA Canonical ribose']

Once the data has been filtered by the regex, you have what you want. Write it to a new csv or do whatever else you like with it.
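Writing the filtered rows to a new csv could look roughly like the sketch below. It assumes Python 3 (for the `newline=''` argument to `open`) and the column names from the question's header. The function name `filter_csv` and the output file name in the usage note are hypothetical.

```python
import csv
import re

# Column names taken from the header row of the csv in the question.
RESIDUE_COLUMNS = ['FirstResidue', 'SecondResidue', 'ThirdResidue', 'FourthResidue']


def filter_csv(src_path, dst_path):
    """Copy src_path to dst_path, dropping rows where a residue ends with a letter.

    Returns the number of data rows written (the header is not counted).
    """
    kept = 0
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Skip the row if any residue column ends with a letter.
            if any(re.search(r'[A-Za-z]$', row[col]) for col in RESIDUE_COLUMNS):
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

Calling `filter_csv('rz45.csv', 'rz45_filtered.csv')` would then write the header plus the surviving rows. The header is kept here as a design choice; drop the `writer.writeheader()` call if the output should contain data rows only, as in the question's expected output.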

Regarding python - Remove rows from a csv file where certain columns match a specific regex, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28471390/
