
bioinformatics - Executing the intermediate rules of a checkpoint

Reposted. Author: 行者123. Updated: 2023-12-05 06:26:20

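I'm currently having trouble getting Snakemake to run the intermediate rules that a checkpoint requires. After some troubleshooting, I think the problem lies in the expand call inside the aggregate_input function, but I can't figure out why.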

Here is my current Snakefile, which I modeled on the checkpoint documentation from Snakemake: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution

rule all:
    input:
        expand("string_tie_assembly/{sample}.gtf", sample=sample),
        expand("combined_fasta/{sample}.fa", sample=sample),
        "aggregated_fasta/all_fastas_combined.fa"




checkpoint clustering:
    input:
        "string_tie_assembly_merged/merged_{sample}.gtf"
    output:
        clusters = directory("split_gtf_file/{sample}")
    shell:
        """
        mkdir -p split_gtf_file/{wildcards.sample} ;

        collapse_gtf_file.py -gtf {input} -o split_gtf_file/{wildcards.sample}/{wildcards.sample}
        """

rule gtf_to_fasta:
    input:
        "split_gtf_file/{sample}/{sample}_{i}.gtf"
    output:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    shell:
        "gffread -w {output} -g {reference} {input}"

rule rename_fasta_files:
    input:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    output:
        "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"
    shell:
        # Wildcards must be accessed as {wildcards.<name>} inside shell commands
        "seqtk rename {input} {wildcards.sample}_{wildcards.i} > {output}"

# Gather N number of output files from the GTF split
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x

# Aggregate fasta from split GTF files together
rule combine_fasta_file:
    input:
        aggregate_input
    output:
        "combined_fasta/{sample}.fa"
    shell:
        "cat {input} > {output}"



# Aggregate the per-sample combined fasta files
def gather_files(wildcards):
    files = expand("combined_fasta/{sample}.fa", sample=sample)
    return files

rule aggregate_fasta_files:
    input:
        gather_files
    output:
        "aggregated_fasta/all_fastas_combined.fa"
    shell:
        "cat {input} > {output}"

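The problem I keep running into is that when I run this Snakefile, the combine_fasta_file rule never runs. After spending more time on this error, I realized that the aggregate_input function is not expanding: it returns an empty list [] instead of the expansion I expected over all files in the directory, i.e. lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa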

This is strange, especially given that checkpoint clustering does run correctly and the downstream output files are listed in rule all.

Does anyone know why this happens, or what might be causing it?

Command used to run Snakemake: snakemake -rs Assemble_regions.snake --configfile snake_config_files/annotated_group_config.yaml

Best answer

Just figured it out. The problem was that my aggregate function was targeting the wrong files. Previously I had written it as

def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x

However, this globs for the wrong files. Instead of globbing for {i}.fa, it should glob for the files that checkpoint clustering actually generates. So changing the code to

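import os  # needed for os.path.join; harmless if already imported in the Snakefile

def aggregate_input(wildcards):
    # Directory produced by the checkpoint for this sample
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    print(checkpoint_output)
    # Glob the .gtf files the checkpoint wrote, then map each index to its final fasta
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=wildcards.sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{sample}_{i}.gtf")).i)
    print(x)
    return x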

solved the problem.
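To see why the first version returned an empty list, here is a rough standalone sketch in plain Python. It does not use Snakemake: wildcard_values and expand_paths are hypothetical stand-ins that only approximate glob_wildcards and expand, and the sample names "A" and "B" and the three split files are made up for illustration. It assumes, as the gtf_to_fasta rule implies, that the checkpoint directory holds files named {sample}_{i}.gtf.

```python
import itertools
import os
import re
import tempfile

TEMPLATE = "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"


def wildcard_values(directory, pattern, name):
    """Hypothetical stand-in for glob_wildcards(...).<name>.

    Matches every file name in `directory` against `pattern`, where each
    {placeholder} may appear once, and returns the values captured for `name`.
    """
    # "{sample}_{i}.gtf" -> ['', 'sample', '_', 'i', '.gtf']; odd items are names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if pos % 2 else re.escape(part)
        for pos, part in enumerate(parts)
    )
    values = []
    for filename in sorted(os.listdir(directory)):
        match = re.fullmatch(regex, filename)
        if match:
            values.append(match.group(name))
    return values


def expand_paths(template, **values):
    """Hypothetical stand-in for expand(): format over the product of all values."""
    names = list(values)
    choices = [v if isinstance(v, list) else [v] for v in values.values()]
    return [
        template.format(**dict(zip(names, combo)))
        for combo in itertools.product(*choices)
    ]


with tempfile.TemporaryDirectory() as root:
    # Mimic what the checkpoint leaves behind for a made-up sample "A".
    checkpoint_dir = os.path.join(root, "split_gtf_file", "A")
    os.makedirs(checkpoint_dir)
    for index in range(3):
        open(os.path.join(checkpoint_dir, f"A_{index}.gtf"), "w").close()

    # Original pattern: no .fa files exist in the checkpoint directory.
    wrong = wildcard_values(checkpoint_dir, "{i}.fa", "i")
    # Corrected pattern: matches the .gtf files the checkpoint wrote.
    right = wildcard_values(checkpoint_dir, "{sample}_{i}.gtf", "i")

# One sample (like sample=wildcards.sample) versus a list of all samples.
per_sample = expand_paths(TEMPLATE, sample="A", i=right)
all_samples = expand_paths(TEMPLATE, sample=["A", "B"], i=right)

print(wrong)
print(right)
print(per_sample)
print(len(all_samples))
```

The checkpoint directory only ever contains .gtf files, because the .fa files are written later by gtf_to_fasta and rename_fasta_files into other directories, so a {i}.fa pattern has nothing to match. The last two lines also suggest why the fix passes sample=wildcards.sample: expanding over the whole sample list would pull other samples' files into each combined_fasta/{sample}.fa.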

Regarding "bioinformatics - Executing the intermediate rules of a checkpoint", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56280274/
