python - Manipulating variables between processes in Nextflow

Reposted. Author: 行者123. Last updated: 2023-12-04 03:43:16

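I am redesigning a workflow that basically starts with one process that spawns multiple other processes. Originally I had the variables before the workflow started, so I made a tuple of those variables and passed it to the process as input. The process took each value and spawned one process per value in the tuple.

In my new architecture, however, the "tuple" is only produced inside processA. processB then needs to take each value as input and spawn one process per input.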

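My tuple looks like: {"002--002": some_params, "004--004": some_params, etc.}

I currently have these values as a list in Python: ['052--052', '054--054', '055--055', '059--059', '060--060', '066--066']

How can I parse this Python list so that each value is passed on as a parameter and spawns its own process?

ProcessA also creates files such as somefile_052--052.someextension - I basically want to pass the right variable along with the right files.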

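Any help would be appreciated.

Here is some code:

These are the files I need to work with. I need to send all files that share the same code, together with the variable.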

> ls
out.barcoded.subreads.bam out.subreads.060--060.bam.pbi out.subreads.090--090.subreadset.xml out.subreads.149--149.bam out.subreads.192--192.bam.pbi out.subreads.249--249.subreadset.xml out.subreads.285--285.bam out.subreads.321--321.bam.pbi out.subreads.479--479.subreadset.xml
out.barcoded.subreads.bam.pbi out.subreads.060--060.subreadset.xml out.subreads.091--091.bam out.subreads.149--149.bam.pbi out.subreads.192--192.subreadset.xml out.subreads.252--252.bam out.subreads.285--285.bam.pbi out.subreads.321--321.subreadset.xml out.subreads.482--482.bam
out.barcoded.subreads.lima.counts out.subreads.066--066.bam out.subreads.091--091.bam.pbi out.subreads.149--149.subreadset.xml out.subreads.227--227.bam out.subreads.252--252.bam.pbi out.subreads.285--285.subreadset.xml out.subreads.454--454.bam out.subreads.482--482.bam.pbi
out.barcoded.subreads.lima.guess out.subreads.066--066.bam.pbi out.subreads.091--091.subreadset.xml out.subreads.172--172.bam out.subreads.227--227.bam.pbi out.subreads.252--252.subreadset.xml out.subreads.303--303.bam out.subreads.454--454.bam.pbi out.subreads.482--482.subreadset.xml
out.barcoded.subreads.lima.report out.subreads.066--066.subreadset.xml out.subreads.107--107.bam out.subreads.172--172.bam.pbi out.subreads.227--227.subreadset.xml out.subreads.259--259.bam out.subreads.303--303.bam.pbi out.subreads.454--454.subreadset.xml out.subreads.489--489.bam
out.barcoded.subreads.lima.summary out.subreads.071--071.bam out.subreads.107--107.bam.pbi out.subreads.172--172.subreadset.xml out.subreads.233--233.bam out.subreads.259--259.bam.pbi out.subreads.303--303.subreadset.xml out.subreads.464--464.bam out.subreads.489--489.bam.pbi
out.barcoded.subreads.subreadset.xml out.subreads.071--071.bam.pbi out.subreads.107--107.subreadset.xml out.subreads.175--175.bam out.subreads.233--233.bam.pbi out.subreads.259--259.subreadset.xml out.subreads.307--307.bam out.subreads.464--464.bam.pbi out.subreads.489--489.subreadset.xml
out.subreads.052--052.bam out.subreads.071--071.subreadset.xml out.subreads.112--112.bam out.subreads.175--175.bam.pbi out.subreads.233--233.subreadset.xml out.subreads.261--261.bam out.subreads.307--307.bam.pbi out.subreads.464--464.subreadset.xml out.subreads.494--494.bam
out.subreads.052--052.bam.pbi out.subreads.082--082.bam out.subreads.112--112.bam.pbi out.subreads.175--175.subreadset.xml out.subreads.235--235.bam out.subreads.261--261.bam.pbi out.subreads.307--307.subreadset.xml out.subreads.468--468.bam out.subreads.494--494.bam.pbi
out.subreads.052--052.subreadset.xml out.subreads.082--082.bam.pbi out.subreads.112--112.subreadset.xml out.subreads.185--185.bam out.subreads.235--235.bam.pbi out.subreads.261--261.subreadset.xml out.subreads.313--313.bam out.subreads.468--468.bam.pbi out.subreads.494--494.subreadset.xml
out.subreads.054--054.bam.pbi out.subreads.082--082.subreadset.xml out.subreads.113--113.bam out.subreads.185--185.bam.pbi out.subreads.235--235.subreadset.xml out.subreads.264--264.bam out.subreads.313--313.bam.pbi out.subreads.468--468.subreadset.xml out.subreads.bam
out.subreads.054--054.subreadset.xml out.subreads.085--085.bam out.subreads.113--113.bam.pbi out.subreads.185--185.subreadset.xml out.subreads.241--241.bam out.subreads.264--264.bam.pbi out.subreads.313--313.subreadset.xml out.subreads.471--471.bam out.subreads.bam.pbi
out.subreads.055--055.bam out.subreads.085--085.bam.pbi out.subreads.113--113.subreadset.xml out.subreads.187--187.bam out.subreads.241--241.bam.pbi out.subreads.264--264.subreadset.xml out.subreads.316--316.bam out.subreads.471--471.bam.pbi out.subreads.json
out.subreads.055--055.bam.pbi out.subreads.085--085.subreadset.xml out.subreads.125--125.bam out.subreads.187--187.bam.pbi out.subreads.241--241.subreadset.xml out.subreads.265--265.bam out.subreads.316--316.bam.pbi out.subreads.471--471.subreadset.xml out.subreads.lima.counts
out.subreads.055--055.subreadset.xml out.subreads.088--088.bam out.subreads.125--125.bam.pbi out.subreads.187--187.subreadset.xml out.subreads.245--245.bam out.subreads.265--265.bam.pbi out.subreads.316--316.subreadset.xml out.subreads.473--473.bam out.subreads.lima.guess
out.subreads.059--059.bam out.subreads.088--088.bam.pbi out.subreads.125--125.subreadset.xml out.subreads.188--188.bam out.subreads.245--245.bam.pbi out.subreads.265--265.subreadset.xml out.subreads.317--317.bam out.subreads.473--473.bam.pbi out.subreads.lima.report
out.subreads.059--059.bam.pbi out.subreads.088--088.subreadset.xml out.subreads.143--143.bam out.subreads.188--188.bam.pbi out.subreads.245--245.subreadset.xml out.subreads.273--273.bam out.subreads.317--317.bam.pbi out.subreads.473--473.subreadset.xml out.subreads.lima.summary
out.subreads.059--059.subreadset.xml out.subreads.090--090.bam out.subreads.143--143.bam.pbi out.subreads.188--188.subreadset.xml out.subreads.249--249.bam out.subreads.273--273.bam.pbi out.subreads.317--317.subreadset.xml out.subreads.479--479.bam out.subreads.subreadset.xml
out.subreads.060--060.bam out.subreads.090--090.bam.pbi out.subreads.143--143.subreadset.xml out.subreads.192--192.bam out.subreads.249--249.bam.pbi out.subreads.273--273.subreadset.xml out.subreads.321--321.bam out.subreads.479--479.bam.pbi

So I want to send these files together with this variable: 059--059

out.subreads.059--059.bam
out.subreads.059--059.bam.pbi
out.subreads.059--059.subreadset.xml

The code I currently have in the workflow is:

process procA {
    input:
    file bc_fasta from bc_fasta_chan

    output:
    set file("$analysis_config.cell/bam/out.subreads.*"), val("$analysis_config.cell/bam/out.subreads.*") into lima_out

    script:
    """
    # run the script that generates the files listed above
    """
}

process procB {
    input:
    set file(bc_bam_file), val(bc_name) from lima_out.flatten()

    script:
    """
    ls
    echo ${bc_bam_file}
    """
}

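Best answer

The trick is to extract the grouping variable from the file names in some way and then call groupTuple. I have just used a simple regular expression to get this variable, but you could implement something more sophisticated if needed: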

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

// Capture the barcode pair (e.g. "059--059") from the file name
subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    .view()

Results:

[489--489, [out.subreads.489--489.bam, out.subreads.489--489.bam.pbi, out.subreads.489--489.subreadset.xml]]
[316--316, [out.subreads.316--316.bam, out.subreads.316--316.bam.pbi, out.subreads.316--316.subreadset.xml]]
...
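
For readers more at home in Python (where the question's barcode list lives), the same extract-and-group idea can be sketched outside Nextflow. This is only an illustration under assumptions: the function name `group_by_barcode` and the sample list are hypothetical, and the code imitates the filter/map/groupTuple steps rather than reproducing what Nextflow does internally.

```python
import re
from collections import defaultdict

# Same pattern as the Nextflow answer: capture the barcode pair, e.g. "059--059".
SUBREADS_PATTERN = re.compile(r"^out\.subreads\.(\d{3}--\d{3})\..*")


def group_by_barcode(file_names):
    """Group file names by the barcode embedded in each name.

    Names that do not match the pattern (e.g. out.subreads.bam) are skipped,
    which mirrors the .filter step in the Nextflow pipeline.
    """
    groups = defaultdict(list)
    for name in file_names:
        match = SUBREADS_PATTERN.match(name)
        if match:
            groups[match.group(1)].append(name)
    # Sort each group's files so the order within a group is predictable.
    return {barcode: sorted(names) for barcode, names in groups.items()}


# Hypothetical sample built from names in the question's directory listing.
files = [
    "out.subreads.059--059.subreadset.xml",
    "out.subreads.bam",
    "out.subreads.059--059.bam",
    "out.subreads.060--060.bam",
    "out.subreads.059--059.bam.pbi",
]
grouped = group_by_barcode(files)
for barcode, names in grouped.items():
    print(barcode, names)
```

Files without a barcode in their name, such as out.subreads.bam, simply drop out of the result, so only per-barcode groups remain.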

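Here is an example of how I might feed these values into a process. My preference for handling companion files (in this case, the files with the ".bam.pbi" extension) is to keep them together with the BAM file. I just use a tuple for this. By calling first() on the tuple, we can get the BAM. This is just my preference, though. You could instead have a separate file/path variable in the input tuple for the pbi companion file, but you may not need to reference it in your script block.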

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

// Capture the barcode pair (e.g. "059--059") from the file name
subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    // After sorting: files[0] is the BAM, files[1] the .bam.pbi, files[2] the subreadset XML
    .map { group_name, files -> tuple( group_name, files[2], files[0..1] ) }
    .set { subreads_ch }

process next_process {

    input:
    tuple val(group), path(subreadset), path(indexed_subreads) from subreads_ch

    """
    echo "subreadset XML: ${subreadset}"
    echo "subreads BAM: ${indexed_subreads.first()}"
    """
}
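
The files[2] and files[0..1] indexing relies on the sorted order inside each group: ".bam" sorts before ".bam.pbi", which sorts before ".subreadset.xml". A minimal Python sketch of that split, assuming exactly three files per barcode (the helper name `split_group` is hypothetical and not part of Nextflow):

```python
def split_group(barcode, names):
    """Return (barcode, subreadset_xml, [bam, bam_pbi]) for one barcode group."""
    ordered = sorted(names)
    if len(ordered) != 3:
        raise ValueError(f"expected 3 files for {barcode}, got {len(ordered)}")
    # Lexical order puts "...bam" first, then "...bam.pbi", then "...subreadset.xml".
    bam, pbi, xml = ordered
    return barcode, xml, [bam, pbi]


group, subreadset, indexed_subreads = split_group(
    "059--059",
    [
        "out.subreads.059--059.subreadset.xml",
        "out.subreads.059--059.bam.pbi",
        "out.subreads.059--059.bam",
    ],
)
print("subreadset XML:", subreadset)
print("subreads BAM:", indexed_subreads[0])
```

Taking the first element of the two-item list here plays the same role as indexed_subreads.first() in the process script: the BAM comes first and its index travels alongside it.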

Regarding "python - Manipulating variables between processes in Nextflow", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65577430/
