python - How can I get this output from a FASTA file without using Biopython?

Reposted by: 太空宇宙. Updated: 2023-11-03 21:24:36

I need to produce the output shown below from a FASTA file, but without using Biopython. Does anyone have any ideas?

Here is the code that uses Biopython:

from Bio import SeqIO
records = SeqIO.parse("data/assembledSeqs.fa", "fasta")
for i, seq_record in enumerate(records):
    print("Sequence %d:" % i)
    print("Number of A's: %d" % seq_record.seq.count("A"))
    print("Number of C's: %d" % seq_record.seq.count("C"))
    print("Number of G's: %d" % seq_record.seq.count("G"))
    print("Number of T's: %d" % seq_record.seq.count("T"))
    print()

The FASTA file looks like this:

>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K:    20 length: 79
TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC
AGGACAGGCCGCTAAAGTG
>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80
CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG
GCCTGGTAACACGTGCCAGC
>chr1_8969882_-:chr1_568670_-:a1;113 total_counts: 7600 Seed: 225 K: 20 length: 86
CACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAAC
CAAACCACTTTCACCGCCACACGACC
>chr1_8969882_-:chr1_568670_-:a2;69 total_counts: 6987 Seed: 197 K: 20   length: 120
TGAACCTACGACTACACCGACTACGGCGGACTAATCTTCAACTCCTACATACTTCCCCCA
TTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATCGAGTAGTACTCCCG

I need to get the following output:

Sequence 0:
Number of A's: 14
Number of C's: 17
Number of G's: 24
Number of T's: 24

Sequence 1:
Number of A's: 17
Number of C's: 30
Number of G's: 16
Number of T's: 17

Sequence 2:
Number of A's: 27
Number of C's: 31
Number of G's: 12
Number of T's: 16

Sequence 3:
Number of A's: 31
Number of C's: 41
Number of G's: 20
Number of T's: 28

I have tried the following, but I cannot get the same output:

def count_bases (fasta_file_name):
    with open(fasta_file_name) as file_content:
        for seqs in file_content:
            if seqs.startswith('>'):
                for i, seq in enumerate('>'):
                    print("Sequence %d:" % i)
            else:
                print("Number of A's: %d" % seqs.count("A"))
                print("Number of C's: %d" % seqs.count("C"))
                print("Number of G's: %d" % seqs.count("G"))
                print("Number of T's: %d" % seqs.count("T"))
                print()
    return bases

result = count_bases('data/assembledSeqs.fa')

Best answer

This code works:

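def count_bases(fasta_file_name):
    sequence = ''  # bases collected so far for the current record

    def report():
        # Print the base counts for the current record, if it has any bases.
        if sequence:
            print("Number of A's: %d" % sequence.count("A"))
            print("Number of C's: %d" % sequence.count("C"))
            print("Number of G's: %d" % sequence.count("G"))
            print("Number of T's: %d" % sequence.count("T"))
            print()

    with open(fasta_file_name) as file_content:
        i = 0
        for line in file_content:
            if line.startswith('>'):
                report()  # finish the previous record before starting a new one
                print("Sequence %d:" % i)
                i += 1
                sequence = ''
            else:
                # A sequence may span several lines; join them without newlines.
                sequence += line.strip()
        report()  # the last record has no following header to trigger it

count_bases('data/assembledSeqs.fa')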
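
The function above prints its results and returns nothing. If the counts are needed as data, one possible variant is to separate parsing from reporting. The following is only a sketch under that assumption: the names read_fasta, base_counts, and print_report are hypothetical and do not come from the original answer or from Biopython. Counting is case-sensitive, like the str.count calls above.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        # The final record is not followed by another header.
        yield header, "".join(chunks)


def base_counts(lines):
    """Return one dict of A/C/G/T counts per record, in file order."""
    results = []
    for _header, sequence in read_fasta(lines):
        counts = Counter(sequence)  # case-sensitive, uppercase bases only
        results.append({base: counts[base] for base in "ACGT"})
    return results


def print_report(path):
    """Print the counts in the layout requested in the question."""
    with open(path) as handle:
        for i, counts in enumerate(base_counts(handle)):
            print("Sequence %d:" % i)
            for base in "ACGT":
                print("Number of %s's: %d" % (base, counts[base]))
            print()
```

Calling print_report('data/assembledSeqs.fa') should print the same layout as the Biopython version, while base_counts can be reused wherever the numbers themselves are needed. Unlike the answer above, this sketch also reports a record whose sequence is empty, with all four counts at zero.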

Regarding "python - How can I get this output from a FASTA file without using Biopython?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53942135/
