python-3.x - Multiple inputs and outputs in a single-rule Snakemake file


Reposted by: 行者123  Updated: 2023-12-02 12:40:54


I am getting started with Snakemake and have a very basic question that I could not find answered in the Snakemake tutorial.

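I want to create a single-rule Snakefile that downloads multiple files one by one on Linux. "expand" cannot be used in the output because the files need to be downloaded one at a time, and wildcards cannot be used because it is the target rule.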

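The only approach I could come up with is something like the following, which does not work properly. I cannot figure out how to use {output} to send the downloaded items to a specific directory under specific names, such as "downloaded_files.dwn", so that they can be used in later steps: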

links=[link1,link2,link3,....]
rule download:    
output: 
    "outdir/{downloaded_file}.dwn"
params: 
    shellCallFile='callscript',
run: 
    callString=''
    for item in links:
        callString+='wget str(item) -O '+{output}+'\n'
    call('echo "' + callString + '\n" >> ' + params.shellCallFile, shell=True)
    call(callString, shell=True)

I would appreciate any hints on how to solve this, and on which part of Snakemake I have not understood properly.
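
One concrete reason the attempt above fails can be reproduced in plain Python, outside Snakemake. Inside a `run:` block, `output` is an ordinary Python object, so `{output}` is a set literal rather than a placeholder, and `str(item)` inside the quotes is just literal text. The sketch below uses a plain list as a stand-in for Snakemake's `output` object and a hypothetical URL; it illustrates the Python behaviour only, not Snakemake internals.

```python
# Stand-ins (assumptions for illustration only): in a real run block,
# Snakemake provides `output`; here a plain list plays that role.
output = ["outdir/file1.dwn"]
item = "http://example.org/file1"  # hypothetical link


def build_broken():
    # Mirrors the attempt: {output} is a Python set literal here,
    # so this raises TypeError before any command string is built.
    return 'wget str(item) -O ' + {output} + '\n'


def build_fixed():
    # Index the list and let str.format substitute the real values.
    return "wget {link} -O {target}\n".format(link=item, target=output[0])


try:
    build_broken()
    broken_error = None
except TypeError as err:
    broken_error = err

fixed_command = build_fixed()
print(repr(broken_error))
print(fixed_command, end="")
```

In a `shell:` directive, or in a string passed to Snakemake's `shell()` function, `{output}` is substituted by Snakemake itself, which is the form used in the answer.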

Best answer

Here is a commented example that should help you solve your problem:

# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys 
links = {
    "1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
        # Note that we don't need to use {output} in the "run" or "shell" part.
        # This list will be used if we later add rules
        # that use the files generated by the present rule.
        expand("outdir/chain_{n}.gz", n=links.keys())
    run:
        # The sort is there to ensure the files are in the 1, 2, 3 order.
        # We could use an OrderedDict if we wanted an arbitrary order.
        for link_num in sorted(links.keys()):
            shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))
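
To see what the `output` declaration and the loop above amount to, the following plain-Python sketch rebuilds the file list and the `wget` command strings without running Snakemake or downloading anything. The list comprehension is only a stand-in for `expand` with a single wildcard, and `planned_commands` is a hypothetical helper name introduced for this illustration.

```python
links = {
    "1": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz",
}

# Stand-in (assumption) for expand("outdir/chain_{n}.gz", n=links.keys()):
# one output path per key.
expected_outputs = ["outdir/chain_{n}.gz".format(n=n) for n in links.keys()]


def planned_commands(links):
    """Return the wget commands the run block would pass to shell()."""
    return [
        "wget {link} -O outdir/chain_{n}.gz".format(link=links[n], n=n)
        for n in sorted(links.keys())
    ]


commands = planned_commands(links)
for command in commands:
    print(command)
```

Each command writes to a path that also appears in `expected_outputs`, which is what lets Snakemake consider the rule complete once the loop has finished.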

Here is another approach, which uses arbitrary names for the downloaded files and makes use of output (albeit somewhat artificially):

links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
        ["outdir/{f}".format(f=filename) for (filename, _) in links]
    run:
        for i in range(len(links)):
            # output is a list, so we can access its items by index
            shell("wget {link} -O {chain_file}".format(
                link=links[i][1], chain_file=output[i]))
        # using a direct loop over the pairs (filename, link)
        # could be considered "cleaner"
        # for (filename, link) in links:
        #     shell("wget {link} -O outdir/{filename}".format(
        #         link=link, filename=filename))
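
The index-based loop relies on `output` listing the files in the same order as `links`. As a minimal sketch of the same pairing without explicit indices, the snippet below uses `zip` in plain Python, with an ordinary list standing in for the rule's `output` (an assumption made so the snippet runs outside Snakemake).

```python
links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"),
]

# Stand-in for the rule's `output`, built exactly as in the output section.
output = ["outdir/{f}".format(f=filename) for (filename, _) in links]

# Pair each link with its output path; both lists share the same order.
commands = [
    "wget {link} -O {chain_file}".format(link=link, chain_file=chain_file)
    for (_, link), chain_file in zip(links, output)
]

for command in commands:
    print(command)
```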

An example where the three downloads can be done in parallel using snakemake -j 3:

# To use os.path.join,
# which is more robust than manually writing the separator.
import os

# Association between output files and source links
links = {
    "foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
    return links[wildcards.chainfile]


# First rule will drive the rest of the workflow
rule all:
    input:
        # expand generates the list of the final files we want
        expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())


rule download:
    output:
        # We inform snakemake what this rule will generate
        os.path.join("outdir", "{chainfile}")
    params:
        # using a function of wildcards in params
        link = chainfile2link,
    shell:
        """
        wget {params.link} -O {output}
        """

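The reason this version can run in parallel is that each file requested by `rule all` is matched separately against the output pattern of `rule download`, which yields one independent job per file. The sketch below imitates that matching in plain Python. The `match_wildcards` helper, the regular expression, and the use of `SimpleNamespace` in place of Snakemake's wildcards object are all simplifying assumptions for illustration, not Snakemake's actual implementation.

```python
import os
import re
from types import SimpleNamespace

links = {
    "foo_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz",
}


def chainfile2link(wildcards):
    return links[wildcards.chainfile]


def match_wildcards(pattern, target):
    """Hypothetical helper: extract {chainfile} from a target path."""
    prefix, suffix = pattern.split("{chainfile}")
    regex = re.compile(
        "^" + re.escape(prefix) + "(?P<chainfile>.+)" + re.escape(suffix) + "$"
    )
    match = regex.match(target)
    return SimpleNamespace(**match.groupdict()) if match else None


pattern = os.path.join("outdir", "{chainfile}")
# The targets that rule all asks for.
targets = [os.path.join("outdir", name) for name in links]

# One independent command per target, as one job per file.
jobs = {}
for target in targets:
    wildcards = match_wildcards(pattern, target)
    jobs[target] = "wget {link} -O {output}".format(
        link=chainfile2link(wildcards), output=target
    )

for command in jobs.values():
    print(command)
```

Because no job depends on another, `snakemake -j 3` is free to start all three at once.
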
Regarding python-3.x - Multiple inputs and outputs in a single-rule Snakemake file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44561183/
