biopython - Creating an alignment in Biopython without an input file

Reposted. Author: 行者123. Updated: 2023-12-05 07:19:31

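I have a protein sequence alignment stored in a dictionary (id_prot as keys, aligned sequences as values; another format would also be fine), and I want to use this alignment to build an NJ (neighbor-joining) tree with Biopython.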

However, according to the documentation, the only way to load sequences for phylogenetic analysis is from an input file. For example:

aln = AlignIO.read('Tests/TreeConstruction/msa.phy', 'phylip')

Does anyone know how to load the sequences into the aln variable without reading an input file?

Best answer

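Build the required input in Phylip format from your dictionary and load it with StringIO. Note that in strict Phylip format, sequence/protein IDs can be at most 10 characters long: IDs shorter than 10 characters must be padded with spaces to reach the fixed 10-character width, and a longer ID would spill into the sequence column. (Biopython's 'phylip-relaxed' format does not have this limit.)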
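If the 10-character ID limit is a problem, the text round-trip can be skipped entirely. The sketch below is an alternative to the answer's approach, not part of it: it assumes a Biopython installation and wraps each dictionary entry in a `SeqRecord`, then builds a `Bio.Align.MultipleSeqAlignment` directly in memory, which `DistanceCalculator` and `DistanceTreeConstructor` accept just like an alignment read with `AlignIO`. For real protein data, a protein model such as `'blosum62'` (see `DistanceCalculator.protein_models`) may be more appropriate than `'identity'`.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Aligned sequences keyed by ID; all must have the same length (gaps as '-').
dct = {
    "Alpha": "AACGTGGCCACAT",
    "Beta": "AAGGTCGCCACAC",
    "Gamma": "CAGTTCGCCACAA",
    "Delta": "GAGATTTCCGCCT",
    "Epsilon": "GAGATCTCCGCCC",
}

# Build the alignment object directly; no Phylip text is generated,
# so IDs are not restricted to 10 characters.
aln = MultipleSeqAlignment(
    [SeqRecord(Seq(seq), id=prot_id) for prot_id, seq in dct.items()]
)

# Same distance model and NJ construction as in the answer below.
calculator = DistanceCalculator("identity")
constructor = DistanceTreeConstructor(calculator, "nj")
tree = constructor.build_tree(aln)
print(tree)
```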

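from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio import AlignIO
from io import StringIO

# Aligned sequences keyed by ID (all sequences must have the same length)
dct = {'Alpha': 'AACGTGGCCACAT',
       'Beta': 'AAGGTCGCCACAC',
       'Gamma': 'CAGTTCGCCACAA',
       'Delta': 'GAGATTTCCGCCT',
       'Epsilon': 'GAGATCTCCGCCC'}

# Phylip header: number of sequences and alignment length
count = len(dct)
length = max(map(len, dct.values()))

# One line per sequence: ID left-justified and padded to 10 characters, then the sequence
msa = f" {count} {length}\n"
msa += '\n'.join(f"{prot_id:<10} {sequence}" for prot_id, sequence in dct.items())
print(msa)
print()

# Parse the in-memory string exactly as if it were a Phylip file
aln = AlignIO.read(StringIO(msa), 'phylip')
print(aln)
print()

# Pairwise distance matrix (fraction of mismatched positions)
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(aln)
print(dm)
print()

# Neighbor-joining tree built from the same alignment
constructor = DistanceTreeConstructor(calculator, 'nj')
tree = constructor.build_tree(aln)
print(tree)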

Output:

 5 13
Alpha      AACGTGGCCACAT
Beta       AAGGTCGCCACAC
Gamma      CAGTTCGCCACAA
Delta      GAGATTTCCGCCT
Epsilon    GAGATCTCCGCCC

SingleLetterAlphabet() alignment with 5 rows and 13 columns
AACGTGGCCACAT Alpha
AAGGTCGCCACAC Beta
CAGTTCGCCACAA Gamma
GAGATTTCCGCCT Delta
GAGATCTCCGCCC Epsilon

Alpha 0
Beta 0.23076923076923073 0
Gamma 0.3846153846153846 0.23076923076923073 0
Delta 0.5384615384615384 0.5384615384615384 0.5384615384615384 0
Epsilon 0.6153846153846154 0.3846153846153846 0.46153846153846156 0.15384615384615385 0
Alpha Beta Gamma Delta Epsilon

Tree(rooted=False)
    Clade(branch_length=0, name='Inner3')
        Clade(branch_length=0.18269230769230765, name='Alpha')
        Clade(branch_length=0.04807692307692307, name='Beta')
        Clade(branch_length=0.04807692307692307, name='Inner2')
            Clade(branch_length=0.27884615384615385, name='Inner1')
                Clade(branch_length=0.051282051282051266, name='Epsilon')
                Clade(branch_length=0.10256410256410259, name='Delta')
            Clade(branch_length=0.14423076923076922, name='Gamma')
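To see where the numbers in the distance matrix and tree come from, here is a minimal pure-Python sketch (standard library only). It assumes gap-free, equal-length sequences and recomputes the 'identity' distance as 1 - matches/length (e.g. Alpha vs Beta differ at 3 of 13 positions, giving 0.2308), then runs textbook neighbor joining on those distances. The helpers `identity_distance` and `neighbor_joining` are hypothetical names for illustration, not Biopython API or Biopython's source; Biopython may break ties or label inner nodes differently, so compare branch lengths rather than node names.

```python
from itertools import combinations


def identity_distance(a, b):
    """Hypothetical helper: 1 - (matching positions / alignment length)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 1 - matches / len(a)


def neighbor_joining(names, dist):
    """Hypothetical helper: textbook neighbor joining.

    dist maps frozenset({x, y}) -> distance. Returns {node: (parent, branch_length)}.
    """
    nodes = list(names)
    d = dict(dist)
    edges = {}
    inner = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        inner += 1
        u = f"Inner{inner}"
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges[i] = (u, li)
        edges[j] = (u, dij - li)
        # Distances from the new inner node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Attach the last three nodes to one central node.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    center = f"Inner{inner + 1}"
    edges[a] = (center, (dab + dac - dbc) / 2)
    edges[b] = (center, (dab + dbc - dac) / 2)
    edges[c] = (center, (dac + dbc - dab) / 2)
    return edges


seqs = {'Alpha': 'AACGTGGCCACAT', 'Beta': 'AAGGTCGCCACAC', 'Gamma': 'CAGTTCGCCACAA',
        'Delta': 'GAGATTTCCGCCT', 'Epsilon': 'GAGATCTCCGCCC'}
names = list(seqs)
dist = {frozenset((x, y)): identity_distance(seqs[x], seqs[y])
        for x, y in combinations(names, 2)}
tree = neighbor_joining(names, dist)
for node, (parent, length) in tree.items():
    print(f"{node} -> {parent}: {length:.4f}")
```

With this data the leaf branch lengths match the Biopython output shown above (for example Delta 0.1026 and Epsilon 0.0513 under the same inner node). The second joining step has a tie in the Q criterion, but either choice yields the same unrooted tree.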

A similar question about biopython - creating an alignment in Biopython without an input file can be found on Stack Overflow: https://stackoverflow.com/questions/57736532/
