Nextflow: use a global variable and .name in another script (create index)

Question

I have two scripts, main.nf and index_process.nf, and a third, dummy_index_process.nf, that I am writing so that no variable has to be passed to the process. I initially created the index by passing the FASTA path as a process input, but I then learned that params can be used in the global scope.

When I pass the variable as an input, I use .name on the FASTA file. When I use the global variable instead, I cannot get .name to work in the output declaration ({ann,amb,...}), so the index is not created.

Below are three code snippets: 1) main.nf, 2) the process that takes the FASTA file as an input, and 3) the process that uses the global parameter.

main.nf

params.outdir_index_temp = "./bwa_index_temp"
params.hg38genome = "/Users/username/Downloads/NM.fasta"

include { bwa_index } from './index_process.nf'
include { bwa_index_dummy } from './dummy_index_process.nf'

workflow {
    bwa_index(params.hg38genome)
    bwa_index_dummy()
}

index_process.nf (works fine)

process bwa_index {

    tag {ref_fasta.name}

    publishDir "$params.outdir_index_temp/", mode:"copy"

    input:
    path ref_fasta

    output:
    tuple val(ref_fasta.name), path("${ref_fasta.name}.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${ref_fasta}"
    """
}

dummy_index_process.nf (error with output path)

process bwa_index_dummy {

    tag {{params.hg38genome}.name}

    publishDir "$params.outdir_index_temp/", mode:"copy"

    output:
    tuple val(params.hg38genome), path("${{params.hg38genome}.name}.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${params.hg38genome}"
    """
}

I am stuck on path("${{params.hg38genome}.name}.{ann,amb,sa,bwt,pac}"): I cannot work out how to use .name here.

The error I get is:

ERROR ~ Error executing process > 'bwa_index (null)'

Caused by:
  Missing output file(s) `null.{ann,amb,sa,bwt,pac}` expected by process `bwa_index (null)`

Command executed:

  bwa index "/Users/name/Downloads/NM.fasta"

Command exit status:
  0

Command output:
  (empty)

Command error:
  [bwa_index] Pack FASTA... 0.01 sec
  [bwa_index] Construct BWT for the packed sequence...
  [bwa_index] 0.34 seconds elapse.
  [bwa_index] Update BWT... 0.01 sec
  [bwa_index] Pack forward-only FASTA... 0.01 sec
  [bwa_index] Construct SA from BWT and Occ... 0.11 sec
  [main] Version: 0.7.17-r1188
  [main] CMD: bwa index /Users/name/Downloads/NM.fasta
  [main] Real time: 0.494 sec; CPU: 0.491 sec

Work dir:
  /Users/name/Documents/path/nextflow_scripts/pipeline/work/c1/2916e08cb93e92d175520b9db71976

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Answer 1

Score: 1

This is because params.hg38genome is just a regular java.lang.String (i.e. a string). It has no name attribute like file/path (sun.nio.fs.UnixPath) objects do.

Without an input block, Nextflow will not know to stage any input files into the process working directory when the job runs, so there is no guarantee that your FASTA file will be available at that point. On a shared or local filesystem, you may not notice any issues with an absolute path. But Nextflow pipelines are intended to be portable: if you later tried to run that code in the cloud, for example, you would likely get a FileNotFound exception or similar when your tool discovers that the file is missing.

With regard to your first piece of code, you might also prefer:

process bwa_index {

    tag { ref_fasta.name }

    publishDir params.outdir_index_temp, mode:"copy"

    input:
    path ref_fasta

    output:
    tuple val(ref_fasta.name), path("*.{ann,amb,sa,bwt,pac}")

    """
    bwa index "${ref_fasta}"
    """
}
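
For completeness, here is a minimal sketch of how the main.nf from the question could hand the parameter to the process as a file object rather than a plain string. It assumes Nextflow DSL2 and the bwa_index process shown above, unchanged. The workflow-level variable name ref_fasta is only illustrative; file() and its checkIfExists option are Nextflow built-ins.

```nextflow
params.outdir_index_temp = "./bwa_index_temp"
params.hg38genome = "/Users/username/Downloads/NM.fasta"

include { bwa_index } from './index_process.nf'

workflow {
    // file() converts the string parameter into a Path object,
    // so attributes such as .name become available
    ref_fasta = file(params.hg38genome, checkIfExists: true)

    // Prints the file name without its directory part
    println ref_fasta.name

    // Passing the file as an input lets Nextflow stage it
    // into the task work directory before bwa runs
    bwa_index(ref_fasta)
}
```

With this arrangement the process should see the staged FASTA in its own work directory, so bwa index writes the index files there and the output glob can match them, which is not the case when the tool is pointed at an absolute path outside the work directory.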

huangapple
  • Posted on June 19, 2023, 00:26:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76501516.html