
Python: iterate and select only the strings that contain specific characters

Reposted. Author: 行者123. Updated: 2023-12-04 08:35:12

I want to iterate over a list of kmers and select only the items that consist solely of the characters A, T, G and C.

kmers=["AL","AT","GC","AA","AP"]
DNA_kmers = []

for kmer in kmers:
    for letter in kmer:
        if letter not in ["A","T","G","C"]:
            pass
        else:
            DNA_kmers.append(kmer)
print("DNA_kmers",DNA_kmers)
Output:
DNA_kmers ['AL', 'AT', 'AT', 'GC', 'GC', 'AA', 'AA', 'AP']
Desired output:
DNA_kmers=["AT","GC","AA"]
The only approach I know is
if "B" in kmer or "D" in kmer or "E" in kmer or "F" in kmer or "H" in kmer or "I" in kmer or "J" in kmer or "K" in kmer or "L" in kmer or "M" in kmer or "N" in kmer or "O" in kmer or "P" in kmer or "Q" in kmer or "R" in kmer or "S" in kmer or "U" in kmer or "V" in kmer or "W" in kmer or "X" in kmer or "Y" in kmer or "Z" in kmer:
    pass

Best answer

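Your code currently appends an item once for each of its characters that matches, so any item with at least one valid character gets added. We can adjust it to add only items in which every character matches: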

kmers=["AL","AT","GC","AA","AP"]
DNA_kmers =[]

for kmer in kmers:
    for letter in kmer:
        if letter not in ["A","T","G","C"]:
            break
    else:
        DNA_kmers.append(kmer)

print("DNA_kmers",DNA_kmers)
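If you are not familiar with Python: I have used an else clause on the for loop, a feature that not every language has. The else block runs if and only if the loop completes all of its iterations, that is, without hitting break.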
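To make the for/else behaviour concrete, here is a minimal sketch. The helper name first_invalid and its allowed parameter are illustrative only and do not come from the original answer.

```python
def first_invalid(kmer, allowed=frozenset("ATGC")):
    """Return the first character of kmer not in allowed, or None if there is none."""
    for letter in kmer:
        if letter not in allowed:
            found = letter
            break           # leaving the loop early skips the else block
    else:
        found = None        # reached only when the loop never hit break
    return found


print(first_invalid("AT"))  # None
print(first_invalid("AL"))  # L
```

An empty string also takes the else branch, because a loop with zero iterations still finishes without a break.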
There are considerably simpler ways to do what you want. For example, the following does the job with nested list comprehensions:
kmers=["AL","AT","GC","AA","AP"]

allowed = set("AGCT")
print([k for k in kmers if all([c in allowed for c in k])])
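The same filter can also be phrased as a set comparison. This is a sketch of an alternative, not part of the original answer; the function name only_allowed and its alphabet parameter are hypothetical.

```python
def only_allowed(kmers, alphabet="ACGT"):
    """Keep the strings whose characters all belong to alphabet."""
    allowed = set(alphabet)
    # set.issuperset accepts any iterable, so each string is checked
    # character by character against the allowed set.
    return [k for k in kmers if allowed.issuperset(k)]


kmers = ["AL", "AT", "GC", "AA", "AP"]
print(only_allowed(kmers))  # ['AT', 'GC', 'AA']
```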
A more performant general solution is to use a regular expression:
import re

kmers=["AL","AT","GC","AA","AP"]
r = re.compile("^[ATGC]*$")
print([k for k in kmers if r.match(k)])
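Two edge cases of the pattern above are worth knowing: `*` also accepts the empty string, and `$` also matches just before a trailing newline. The sketch below assumes that both should be rejected, which the original question does not state; the name is_dna is illustrative.

```python
import re

# fullmatch anchors at both ends, so ^ and $ are not needed;
# + requires at least one character.
DNA_RE = re.compile(r"[ACGT]+")


def is_dna(kmer):
    """True if kmer is non-empty and consists only of A, C, G and T."""
    return DNA_RE.fullmatch(kmer) is not None


kmers = ["AL", "AT", "GC", "AA", "AP"]
print([k for k in kmers if is_dna(k)])  # ['AT', 'GC', 'AA']
```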
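If we restrict the problem to k-mers with k=2, we can optimize performance further. A regular expression that matches a fixed-length string, for example [AGCT]{2}, should perform slightly better. We can also use product to build a set for constant-time lookup: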
import itertools

kmers=["AL","AT","GC","AA","AP"]

allowed = {a+b for a,b in itertools.product("AGCT", repeat=2)}
print([k for k in kmers if k in allowed])
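Both fixed-length ideas mentioned above can be sketched for an arbitrary k. The helper names dna_kmer_set and dna_kmer_regex are hypothetical and only serve this illustration.

```python
import itertools
import re


def dna_kmer_set(k):
    """Build the set of all 4**k strings of length k over A, C, G, T."""
    return {"".join(p) for p in itertools.product("ACGT", repeat=k)}


def dna_kmer_regex(k):
    """Compile a pattern for exactly k characters from A, C, G, T."""
    return re.compile(r"[ACGT]{%d}" % k)


kmers = ["AL", "AT", "GC", "AA", "AP"]
allowed = dna_kmer_set(2)
pattern = dna_kmer_regex(2)

print([k for k in kmers if k in allowed])           # ['AT', 'GC', 'AA']
print([k for k in kmers if pattern.fullmatch(k)])   # ['AT', 'GC', 'AA']
```

The set holds 4**k entries, so this lookup is only practical for small k; the regular expression has no such memory cost.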

Regarding "Python: iterate and select only the strings that contain specific characters", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64835158/
