python-3.x - Multiple inputs and outputs in a single-rule Snakemake file

Reposted. Author: 行者123. Updated: 2023-12-02 12:40:54

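I am getting started with Snakemake, and I have a very basic question that I could not find answered in the Snakemake tutorial.

I want to create a single-rule Snakefile that downloads multiple files one by one on Linux. "expand" cannot be used in the output, because the files need to be downloaded one at a time, and wildcards cannot be used, because this is the target rule.

The only approach I could come up with is something like the following, but it does not work properly. I cannot figure out how to use {output} to send the downloaded items to a specific directory under a specific name, such as "downloaded_files.dwn", so that they can be used in later steps: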

links=[link1,link2,link3,....]
rule download:
    output:
        "outdir/{downloaded_file}.dwn"
    params:
        shellCallFile='callscript',
    run:
        callString=''
        for item in links:
            callString+='wget str(item) -O '+{output}+'\n'
        call('echo "' + callString + '\n" >> ' + params.shellCallFile, shell=True)
        call(callString, shell=True)

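I would appreciate any hint on how to solve this, and on which part of Snakemake I have not understood well.

Best answer

Here is a commented example that may help you solve your problem: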

# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys
links = {
    "1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
        # Note that we don't need to use {output} in the "run" or "shell" part.
        # This list will be used if we later add rules
        # that use the files generated by the present rule.
        expand("outdir/chain_{n}.gz", n=links.keys())
    run:
        # The sort is there to ensure the files are in the 1, 2, 3 order.
        # We could use an OrderedDict if we wanted an arbitrary order.
        for link_num in sorted(links.keys()):
            shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))
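
To see what the rule above declares and runs without invoking Snakemake, the following plain-Python sketch may help. It is only an illustration: expand_chain_names is a hypothetical helper that imitates the expand() call for this one pattern (it is not Snakemake's implementation), and the wget commands are built as strings rather than executed.

```python
# Same key -> link association as in the Snakefile above.
links = {
    "1": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz",
}


def expand_chain_names(keys):
    """Hypothetical stand-in for expand("outdir/chain_{n}.gz", n=keys)."""
    return ["outdir/chain_{n}.gz".format(n=key) for key in keys]


# The list of files the rule promises to create.
outputs = expand_chain_names(sorted(links.keys()))

# The commands the "run" block would pass to shell(), in sorted key order.
commands = [
    "wget {link} -O outdir/chain_{n}.gz".format(link=links[n], n=n)
    for n in sorted(links.keys())
]

for command in commands:
    print(command)
```

Because the output file names and the commands are both derived from the same keys, the declared outputs and the files actually written stay in agreement.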

Here is another way, which uses arbitrary names for the downloaded files and makes use of output (although somewhat artificially):

links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
        ["outdir/{f}".format(f=filename) for (filename, _) in links]
    run:
        for i in range(len(links)):
            # output is a list, so we can access its items by index
            shell("wget {link} -O {chain_file}".format(
                link=links[i][1], chain_file=output[i]))
        # using a direct loop over the pairs (filename, link)
        # could be considered "cleaner"
        # for (filename, link) in links:
        #     shell("wget {link} -O outdir/{filename}".format(
        #         link=link, filename=filename))
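
The index-based loop above relies on the output list being in the same order as links. A minimal plain-Python sketch of that pairing, using zip instead of indices, is shown below. Here output is an ordinary list standing in for Snakemake's output object (an assumption made for illustration), and the commands are only built, not run.

```python
# Same (filename, link) pairs as in the Snakefile above.
links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"),
]

# Plain list standing in for the rule's "output" (built the same way as in the rule).
output = ["outdir/{f}".format(f=filename) for (filename, _) in links]

# Pair each link with its output path; zip keeps the two lists aligned by position.
commands = [
    "wget {link} -O {chain_file}".format(link=link, chain_file=chain_file)
    for (_, link), chain_file in zip(links, output)
]

for command in commands:
    print(command)
```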

An example in which the three downloads can be done in parallel using snakemake -j 3:

# To use os.path.join,
# which is more robust than manually writing the separator.
import os

# Association between output files and source links
links = {
    "foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
    return links[wildcards.chainfile]


# First rule will drive the rest of the workflow
rule all:
    input:
        # expand generates the list of the final files we want
        expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())


rule download:
    output:
        # We inform snakemake what this rule will generate
        os.path.join("outdir", "{chainfile}")
    params:
        # using a function of wildcards in params
        link = chainfile2link,
    shell:
        """
        wget {params.link} -O {output}
        """
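
How the wildcard version arrives at one job per file can be sketched in plain Python. This is a simplified illustration, not Snakemake's actual matching code: match_output is a hypothetical helper that extracts the chainfile wildcard from a requested path with a regular expression, and SimpleNamespace stands in for the wildcards object passed to chainfile2link.

```python
import os
import re
from types import SimpleNamespace

# Same output file -> link association as in the Snakefile above.
links = {
    "foo_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz",
}


def chainfile2link(wildcards):
    """Same lookup as in the Snakefile: wildcard value -> source link."""
    return links[wildcards.chainfile]


def match_output(path):
    """Hypothetical stand-in for matching a path against "outdir/{chainfile}"."""
    prefix = re.escape(os.path.join("outdir", ""))
    match = re.fullmatch(prefix + r"(?P<chainfile>.+)", path)
    return SimpleNamespace(**match.groupdict()) if match else None


# The targets requested by "rule all".
targets = [os.path.join("outdir", name) for name in links]

# Each target resolves to its own independent download job.
jobs = {target: chainfile2link(match_output(target)) for target in targets}

for target, link in jobs.items():
    print("wget {link} -O {output}".format(link=link, output=target))
```

Since each target maps to a separate job with no dependency on the others, a scheduler is free to run them concurrently, which is what the -j 3 option allows.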

Regarding "python-3.x - Multiple inputs and outputs in a single-rule Snakemake file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44561183/
