
python - Snakemake - Wildcards in input files cannot be determined from output files

Reposted. Author: 行者123. Last updated: 2023-12-05 02:16:36

I am new to Snakemake and not very fluent in Python either (so apologies, this may be a very basic question):

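I am currently building a pipeline to analyze a set of BAM files with atlas. These BAM files are located in different folders and should not be moved to a common folder. I therefore decided to provide a sample list that looks like this (this is just an example; in reality the samples might sit on entirely different drives):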

Sample     Path
Sample1 /some/path/to/my/sample/
Sample2 /some/different/path/

and load it in my config.yaml:

sample_file: /path/to/samplelist/samplslist.txt

Now to my Snakefile:

import pandas as pd

#define configfile with paths etc.
configfile: "config.yaml"

#read-in dataframe and define Sample and Path
SAMPLES = pd.read_table(config["sample_file"])
BAMFILE = SAMPLES["Sample"]
PATH = SAMPLES["Path"]

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=PATH, sample=BAMFILE)

#this works like a charm as long as I give the zip-function in the rules 'all' and 'summary':

rule indexBam:
    input:
        "{path}{sample}.bam"
    output:
        "{path}{sample}.bam.bai"
    shell:
        "samtools index {input}"

#this following command works as long as I give the specific folder for a sample instead of {path}.
rule bamdiagnostics:
    input:
        bam="{path}{sample}.bam",
        bai=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE)
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        "{config[atlas]} task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        index=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE),
        bamd=expand("analysis/BAMDiagnostics/{sample}_approximateDepth.txt", sample=BAMFILE)
    output:
        "{path}{sample}.summary.txt"
    shell:
        "echo -e '{input.index} {input.bamd}"

I get the error

WildcardError in line 28 of path/to/my/Snakefile: Wildcards in input files cannot be determined from output files: 'path'

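Can anyone help me?
- I tried to solve this with join or by writing input functions, but I think I am not skilled enough to see my mistake...
- I guess the problem is that my summary rule does not contain the tuple with the {path} for the bamdiagnostics output (since that output lives elsewhere), so it cannot make the connection to the input file, or something like that...
- Expanding the input of my bamdiagnostics rule makes the code run, but of course it feeds every sample into every sample's output and creates a big mess: in this case, both BAM files are used to create each output file. This is wrong, as the samples AND the outputs are to be treated independently.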
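
A note on the last point: expand() builds its complete list of file names once, when the Snakefile is parsed, so the same list is handed to every job regardless of that job's wildcards. The plain-Python sketch below only mimics the lists expand() produces with and without zip. The values come from the example sample list, and the list comprehensions are a stand-in for illustration, not Snakemake's implementation.

```python
from itertools import product

# Values taken from the example sample list above
paths = ["/some/path/to/my/sample/", "/some/different/path/"]
samples = ["Sample1", "Sample2"]
pattern = "{path}{sample}.bam.bai"

# Roughly what expand(pattern, zip, path=..., sample=...) yields: one entry per pair
zipped = [pattern.format(path=p, sample=s) for p, s in zip(paths, samples)]

# Roughly what expand(pattern, path=..., sample=...) yields: every combination
crossed = [pattern.format(path=p, sample=s) for p, s in product(paths, samples)]

print(zipped)        # two file names, one per sample
print(len(crossed))  # four file names, including mismatched path/sample pairs
```

Either way the result is a fixed list, which is why a rule whose input is written with expand() receives the files of all samples at once.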

Best answer

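According to the atlas documentation, it looks like you need to run each rule separately for each sample. The complication here is that each sample lives in a separate path.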
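
The error itself follows from how wildcard values are obtained: they are read off the requested output file name, so a wildcard that appears only in the input has no value to take. The sketch below is a simplified stand-in for that check, not Snakemake's actual code. `wildcard_names` is a hypothetical helper, and the patterns are those of the bamdiagnostics rule in the question.

```python
import re

def wildcard_names(patterns):
    """Collect the {name} placeholders used in a list of file patterns (hypothetical helper)."""
    names = set()
    for pattern in patterns:
        names.update(re.findall(r"\{(\w+)\}", pattern))
    return names

# Patterns from the bamdiagnostics rule in the question
inputs = ["{path}{sample}.bam"]
outputs = ["analysis/BAMDiagnostics/{sample}_approximateDepth.txt"]

# Wildcards used in the input that the output cannot supply
undetermined = wildcard_names(inputs) - wildcard_names(outputs)
print(undetermined)
```

Here only `path` is left over, which matches the wildcard named in the error message.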

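I modified your script to work for that case (see the DAG). I renamed the variables at the top of the script so they make more sense, removed config for demonstration purposes, and used the pathlib library (instead of os.path.join). pathlib is not required, but it helps me keep my sanity. I also modified the shell command to avoid config.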

import pandas as pd
from pathlib import Path

df = pd.read_csv('sample.tsv', sep='\t', index_col='Sample')
SAMPLES = df.index
BAM_PATH = df["Path"]
# print (BAM_PATH['sample1'])

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=BAM_PATH, sample=SAMPLES)

rule indexBam:
    input:
        str( Path("{path}") / "{sample}.bam")
    output:
        str( Path("{path}") / "{sample}.bam.bai")
    shell:
        "samtools index {input}"

# {path} is not part of the output here, so input functions look up each sample's folder in BAM_PATH.
rule bamdiagnostics:
    input:
        bam = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam"),
        bai = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        ".atlas task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        bamd = "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        index = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    output:
        str( Path("{path}") / "{sample}.summary.txt")
    shell:
        # closing quote added, and the text is redirected so the declared output file is actually created
        "echo -e '{input.index} {input.bamd}' > {output}"
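
To see the lookup behind those input functions in isolation, here is a minimal plain-Python sketch. It assumes a tab-separated sample sheet with Sample and Path columns, as in the question. `SAMPLE_SHEET`, `bam_path`, `bam_for` and the `SimpleNamespace` standing in for Snakemake's wildcards object are illustrative names, not part of the script above, and `PurePosixPath` is used only so the result does not depend on the operating system.

```python
import csv
import io
from pathlib import PurePosixPath
from types import SimpleNamespace

# Assumed sample sheet contents (tab-separated), mirroring the example in the question
SAMPLE_SHEET = (
    "Sample\tPath\n"
    "Sample1\t/some/path/to/my/sample/\n"
    "Sample2\t/some/different/path/\n"
)

# Map each sample name to its folder
reader = csv.DictReader(io.StringIO(SAMPLE_SHEET), delimiter="\t")
bam_path = {row["Sample"]: row["Path"] for row in reader}

def bam_for(wildcards):
    """Return the BAM file for the sample named in the wildcards (hypothetical input function)."""
    return str(PurePosixPath(bam_path[wildcards.sample]) / f"{wildcards.sample}.bam")

# Stand-in for the wildcards object Snakemake would pass to an input function
print(bam_for(SimpleNamespace(sample="Sample2")))
```

Because the folder is looked up from the sample name, the rule needs only the {sample} wildcard, which its output can supply.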

Regarding python - Snakemake - Wildcards in input files cannot be determined from output files, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49390202/
