nextflow: create index - get path

Question

My apologies for another Nextflow post. I'd like to create an index of a reference genome. I have two scripts: main.nf and index.nf.

main.nf

params.hg38genome = "/Users/name/Downloads/NM.fasta"
params.outDir = "./output"

include { create_index } from './index.nf'

workflow {
    create_index(params.hg38genome)
}

I have the following code in index.nf:

process create_index {

    tag { sample_id }

    publishDir "${params.outDir}/hg38index/", mode: "copy"
    debug true

    input:
    path( params.hg38genome )

    output:
    path("${params.outDir}_refhg38.fai"), emit: hg38_fasta_index

    script:
    """
    echo "hello $params.hg38genome $params.outDir  \n"
    bwa index $params.hg38genome
    """
}

First, I am unable to get any value for sample_id. Second, I get this error:

> Caused by: Missing output file(s) ./output_refhg38.fai expected by
> process create_index (null)

If I run bwa directly:

bwa index input.fasta

I get these files in the directory where input.fasta is located:

input.fasta.ann  
input.fasta.amb     
input.fasta.sa  
input.fasta.bwt     
input.fasta.pac  

How do I get Nextflow to create a folder named output and, within it, the files NM.fasta.X, where X is ann, etc.? Also, it doesn't extract the name NM.fasta. I tried ${params.hg38genome}.baseName, but that failed.

Answer 1

Score: 1

You get that error because you've declared an output file that could not be found in the process working directory. Note that the FASTA index file (i.e. the .fai file) is not actually an output of bwa index. You might be thinking of samtools faidx, which does create the FASTA index .fai file (samtools index is for BAM/CRAM files). If your next step is to align some reads, you don't even need the FASTA index file; you only need the BWA index files. For example:

Contents of main.nf:

params.reads = '/Users/name/Downloads/tiny/normal/*_R{1,2}_xxx.fastq.gz'
params.hg38genome = '/Users/name/Downloads/NM.fasta'

include { bwa_index } from './bwa.nf'
include { bwa_mem } from './bwa.nf'


workflow {

    reads = Channel.fromFilePairs( params.reads )

    hg38genome = file( params.hg38genome )

    bwa_index( hg38genome, hg38genome.name )
    bwa_mem( reads, bwa_index.out )

    bwa_mem.out.view()
}

Contents of bwa.nf:

process bwa_index {

    input:
    path ref_fasta
    val prefix

    output:
    tuple val(prefix), path("${prefix}.{ann,amb,sa,bwt,pac}")

    """
    bwa index \\
        -p "${prefix}" \\
        "${ref_fasta}"
    """
}

process bwa_mem {

    tag { sample_id }

    input:
    tuple val(sample_id), path(reads)
    tuple val(idxbase), path("bwa_index/*")

    output:
    tuple val(sample_id), path("${sample_id}.aln.bam")

    script:
    def task_cpus = task.cpus > 1 ? task.cpus - 1 : task.cpus

    """
    bwa mem \\
        -t ${task_cpus} \\
        "bwa_index/${idxbase}" \\
        ${reads} |
    samtools view \\
        -1 \\
        -o "${sample_id}.aln.bam" \\
        -
    """
}

Contents of nextflow.config:

params {

    outdir = './results'
}

process {

    withName: bwa_index {

        publishDir = [
            path: "${params.outdir}/bwa_index",
            mode: 'copy',
        ]
        cpus = 1
        conda = 'bwakit=0.7.17-dev1'
    }

    withName: bwa_mem {

        publishDir = [
            path: "${params.outdir}/bwa_mem",
            mode: 'copy',
        ]
        cpus = 8
        conda = 'bwakit=0.7.17-dev1'
    }
}

conda {

    enabled = true
}

Results:

$ nextflow run main.nf -ansi-log false
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [determined_jang] DSL2 - revision: 1f32b172b7
Creating env using conda: bwakit=0.7.17-dev1 [cache /path/to/work/conda/env-c67b42794b99b0cecbbb27e78e7f5fb7]
[f6/10ac19] Submitted process > bwa_index
[7a/76be98] Submitted process > bwa_mem (foo)
[foo, /path/to/work/7a/76be98292ced9ca7e418470227b34a/foo.aln.bam]
[de/9d97f1] Submitted process > bwa_mem (baz)
[baz, /path/to/work/de/9d97f1270464644149ec0b64903a49/baz.aln.bam]
[4a/83079c] Submitted process > bwa_mem (bar)
[bar, /path/to/work/4a/83079c009dd7f28306c1f5108426ff/bar.aln.bam]
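
If you only want to repair the original process rather than restructure the pipeline, the following is a minimal sketch and not a tested drop-in. It assumes bwa and samtools are on the PATH. The names ref_fasta, bwa_index, fai and the faidx process are illustrative and do not come from the original post. The idea is to declare the FASTA as a path input so Nextflow stages it into the task directory, tag the task with a variable that actually exists, and declare as outputs the files bwa really writes. Note that ${params.hg38genome}.baseName cannot work, because the parameter is a plain string and .baseName sits outside the braces; once the value is a file object, ${ref_fasta.baseName} gives NM and ${ref_fasta.name} gives NM.fasta.

```nextflow
// Sketch only: names other than the params and create_index are illustrative.
params.hg38genome = '/Users/name/Downloads/NM.fasta'
params.outDir     = './output'

process create_index {

    // Tag with a declared input; `sample_id` was never defined in this process.
    tag { ref_fasta.name }

    // Nextflow creates this directory and copies the declared outputs into it.
    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path ref_fasta

    output:
    // bwa index writes <fasta>.amb/.ann/.bwt/.pac/.sa next to the staged input.
    path "${ref_fasta}.{amb,ann,bwt,pac,sa}", emit: bwa_index

    script:
    """
    bwa index "${ref_fasta}"
    """
}

// Only needed if a later step really requires the .fai file.
process faidx {

    tag { ref_fasta.name }

    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path ref_fasta

    output:
    path "${ref_fasta}.fai", emit: fai

    script:
    """
    samtools faidx "${ref_fasta}"
    """
}

workflow {
    // file() turns the string parameter into a path object.
    hg38genome = file( params.hg38genome )

    create_index( hg38genome )
    faidx( hg38genome )
}
```

With this layout, the published files should end up under ./output/hg38index/ as NM.fasta.amb, NM.fasta.ann and so on, since the output names are derived from the staged input name rather than from params.outDir.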

Posted by huangapple on June 15, 2023 at 04:34:56
Original link: https://go.coder-hub.com/76477346.html