Nextflow: use a global variable and use .name in another script (create index)
Question
I have two scripts, main.nf and index_process.nf, plus a third one, dummy_index_process.nf, which I am creating without passing a variable. I initially created the index by passing parameters, but I understood that params can be used in the global scope. When I pass the variable, I use .name on the FASTA file. With the global variable, however, I'm unable to create the index using .name at the output step ({.ann, etc.}).

Please see the three code snippets below: 1) main.nf, 2) passing the FASTA to the index process, 3) using the global variable in the index process.
main.nf
```nextflow
params.outdir_index_temp = "./bwa_index_temp"
params.hg38genome = "/Users/username/Downloads/NM.fasta"

include { bwa_index } from './index_process.nf'
include { bwa_index_dummy } from './dummy_index_process.nf'

workflow {
    bwa_index(params.hg38genome)
    bwa_index_dummy()
}
```
index_process.nf (works fine)
```nextflow
process bwa_index {
    tag { ref_fasta.name }
    publishDir "$params.outdir_index_temp/", mode: "copy"

    input:
    path ref_fasta

    output:
    tuple val(ref_fasta.name), path("${ref_fasta.name}.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${ref_fasta}"
    """
}
```
dummy_index_process.nf (error with output path)
```nextflow
process bwa_index_dummy {
    tag {{params.hg38genome}.name}
    publishDir "$params.outdir_index_temp/", mode: "copy"

    output:
    tuple val(params.hg38genome), path("${{params.hg38genome}.name}.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${params.hg38genome}"
    """
}
```
I am struggling with path("${{params.hg38genome}.name}.{ann,amb,sa,bwt,pac}"); I cannot understand how to use .name there.

The error I get is:
```text
ERROR ~ Error executing process > 'bwa_index (null)'
Caused by:
Missing output file(s) `null.{ann,amb,sa,bwt,pac}` expected by process `bwa_index (null)`
Command executed:
bwa index "/Users/name/Downloads/NM.fasta"
Command exit status:
0
Command output:
(empty)
Command error:
[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.34 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.11 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index /Users/name/Downloads/NM.fasta
[main] Real time: 0.494 sec; CPU: 0.491 sec
Work dir:
/Users/name/Documents/path/nextflow_scripts/pipeline/work/c1/2916e08cb93e92d175520b9db71976
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
```
Answer 1
Score: 1
This is because params.hg38genome is just a regular java.lang.String (i.e. a string). It has no name attribute like file/path (sun.nio.fs.UnixPath) objects do.

Without an input block, Nextflow will not know to stage any input files into the process working directory when the job is run. This means there's no guarantee your FASTA file will be available when the job runs. On a shared or local filesystem, you may not notice any issues with an absolute path. But Nextflow pipelines are intended to be portable: if you later tried to run that code in the cloud, for example, you'd likely get a FileNotFound exception or similar when your tool discovers that the file is missing.

With regards to your first piece of code, you might also prefer:
```nextflow
process bwa_index {
    tag { ref_fasta.name }
    publishDir params.outdir_index_temp, mode: "copy"

    input:
    path ref_fasta

    output:
    tuple val(ref_fasta.name), path("*.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${ref_fasta}"
    """
}
```
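To tie these points together, here is a minimal sketch of a main.nf, assuming the bwa_index process (with its `path ref_fasta` input) lives in index_process.nf and that Nextflow DSL2 and bwa are available. It makes the String-to-Path conversion explicit with Nextflow's built-in file() function, so .name is available, and hands that Path to the process through its input block so the FASTA gets staged. The FASTA path is the placeholder from the question. Note also that, by default, bwa index writes the index files next to the FASTA it is given, so in the global-variable version they would land in the FASTA's own directory rather than in the task work directory, which is where Nextflow looks for declared outputs.

```nextflow
// main.nf (sketch): assumes index_process.nf defines bwa_index with a
// `path ref_fasta` input, as in the process shown above.
params.outdir_index_temp = "./bwa_index_temp"
params.hg38genome = "/Users/username/Downloads/NM.fasta"

include { bwa_index } from './index_process.nf'

workflow {
    // file() turns the String parameter into a Path object; unlike a String,
    // a Path has a .name (here "NM.fasta"). checkIfExists: true stops the run
    // with an error if the FASTA does not exist.
    ref_fasta = file(params.hg38genome, checkIfExists: true)
    log.info "Indexing ${ref_fasta.name}"

    // Passing the Path through the process input block lets Nextflow stage the
    // FASTA into the task work directory, so bwa index writes its outputs there.
    bwa_index(ref_fasta)
}
```

Because the reference is a pipeline parameter, it can be overridden on the command line, e.g. `nextflow run main.nf --hg38genome /path/to/other.fasta`.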