June 15, 2023 04:34:56

nextflow: create index - get path

Question

My apologies for another Nextflow post. I'd like to create an index of a reference genome. I have two scripts: main.nf and index.nf

main.nf

params.hg38genome = "/Users/name/Downloads/NM.fasta"
params.outDir = "./output"

include { create_index } from './index.nf'

workflow {
    create_index(params.hg38genome)
}

I have the following code in index.nf:

process create_index {

    tag { sample_id }

    publishDir "${params.outDir}/hg38index/", mode:"copy"
    debug true

    input:
    path( params.hg38genome)

    output:
    path("${params.outDir}_refhg38.fai"), emit: hg38_fasta_index

    script:
    """
    echo "hello $params.hg38genome $params.outDir  \n"
    bwa index $params.hg38genome
    """
}

First, I am unable to get any value in sample_id.
Second, I get the following error:

> Caused by: Missing output file(s) ./output_refhg38.fai expected by
> process create_index (null)

If I run bwa directly as:

bwa index input.fasta

I get the following files in the directory where input.fasta is located:

input.fasta.ann
input.fasta.amb
input.fasta.sa
input.fasta.bwt
input.fasta.pac

How do I get Nextflow to create the output folder and, within it, the files NM.fasta.X, where X is ann, etc.? Also, it doesn't extract the name NM.fasta: I tried ${params.hg38genome}.baseName, but that failed.

Answer 1

Score: 1

You get that error because you declared an output file that could not be found in the process working directory. Note that the FASTA index file (i.e. the .fai file) is not actually an output of bwa index. You might be thinking of samtools faidx, which does create the .fai FASTA index file (samtools index is for alignment files such as BAM, not FASTA). If your next step is to align some reads, you don't even need the FASTA index file; you only need the BWA index files. For example:

Contents of main.nf:

params.reads = '/Users/name/Downloads/tiny/normal/*_R{1,2}_xxx.fastq.gz'
params.hg38genome = '/Users/name/Downloads/NM.fasta'

include { bwa_index } from './bwa.nf'
include { bwa_mem } from './bwa.nf'


workflow {

    reads = Channel.fromFilePairs( params.reads )

    hg38genome = file( params.hg38genome )

    bwa_index( hg38genome, hg38genome.name )
    bwa_mem( reads, bwa_index.out )

    bwa_mem.out.view()
}

Contents of bwa.nf:

process bwa_index {

    input:
    path ref_fasta
    val prefix

    output:
    tuple val(prefix), path("${prefix}.{ann,amb,sa,bwt,pac}")

    """
    bwa index \\
        -p "${prefix}" \\
        "${ref_fasta}"
    """
}

process bwa_mem {

    tag { sample_id }

    input:
    tuple val(sample_id), path(reads)
    tuple val(idxbase), path("bwa_index/*")

    output:
    tuple val(sample_id), path("${sample_id}.aln.bam")

    script:
    def task_cpus = task.cpus > 1 ? task.cpus - 1 : task.cpus

    """
    bwa mem \\
        -t ${task_cpus} \\
        "bwa_index/${idxbase}" \\
        ${reads} |
    samtools view \\
       -1 \\
       -o "${sample_id}.aln.bam" \\
       -
    """
}

Contents of nextflow.config:

params {

    outdir = './results'
}

process {

    withName: bwa_index {

        publishDir = [
            path: "${params.outdir}/bwa_index",
            mode: 'copy',
        ]
        cpus = 1
        conda = 'bwakit=0.7.17-dev1'
    }

    withName: bwa_mem {

        publishDir = [
            path: "${params.outdir}/bwa_mem",
            mode: 'copy',
        ]
        cpus = 8
        conda = 'bwakit=0.7.17-dev1'
    }
}

conda {

    enabled = true
}

Results:

$ nextflow run main.nf -ansi-log false
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [determined_jang] DSL2 - revision: 1f32b172b7
Creating env using conda: bwakit=0.7.17-dev1 [cache /path/to/work/conda/env-c67b42794b99b0cecbbb27e78e7f5fb7]
[f6/10ac19] Submitted process > bwa_index
[7a/76be98] Submitted process > bwa_mem (foo)
[foo, /path/to/work/7a/76be98292ced9ca7e418470227b34a/foo.aln.bam]
[de/9d97f1] Submitted process > bwa_mem (baz)
[baz, /path/to/work/de/9d97f1270464644149ec0b64903a49/baz.aln.bam]
[4a/83079c] Submitted process > bwa_mem (bar)
[bar, /path/to/work/4a/83079c009dd7f28306c1f5108426ff/bar.aln.bam]
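
To tie this back to the process in the question, here is a minimal sketch of how index.nf itself might be adjusted, assuming the goal is only to publish the BWA index files (and, optionally, a .fai file) under the output folder. The input name fasta, the emit name bwa_index and the faidx process are illustrative assumptions, not part of the original code. The tag uses the FASTA file name because the process has no sample_id input to draw on, and the output block declares the files that bwa index actually writes into the task work directory, rather than a path built from params.outDir.

```nextflow
// index.nf (sketch only; `fasta`, `bwa_index` and `faidx` are illustrative names)

process create_index {

    // There is no sample_id input here, so tag with the staged file name instead
    tag { fasta.name }

    // Published files end up as e.g. ./output/hg38index/NM.fasta.ann
    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path fasta

    output:
    // bwa index writes these five files next to the staged FASTA
    path "${fasta}.{amb,ann,bwt,pac,sa}", emit: bwa_index

    script:
    """
    bwa index "${fasta}"
    """
}

// Optional: only needed if a .fai FASTA index is really wanted
process faidx {

    tag { fasta.name }

    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path fasta

    output:
    // samtools faidx writes <name>.fai alongside the input
    path "${fasta}.fai"

    script:
    """
    samtools faidx "${fasta}"
    """
}
```

In main.nf, the call would then look something like create_index( file(params.hg38genome) ), wrapping the parameter in file() so that Nextflow stages it into the task directory as a path rather than treating it as a plain string. Inside the process, fasta.name gives NM.fasta and fasta.baseName gives NM.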
