nextflow: create index - get path
Question
My apologies for another Nextflow post. I'd like to create an index of a reference genome. I have two scripts: main.nf and index.nf.
main.nf
    params.hg38genome = "/Users/name/Downloads/NM.fasta"
    params.outDir = "./output"

    include {create_index} from './index.nf'

    workflow {
        create_index(params.hg38genome)
    }
I have the following code in index.nf:
    process create_index {
        tag { sample_id }
        publishDir "${params.outDir}/hg38index/", mode:"copy"
        debug true

        input:
        path( params.hg38genome)

        output:
        path("${params.outDir}_refhg38.fai"), emit: hg38_fasta_index

        script:
        """
        echo "hello $params.hg38genome $params.outDir \n"
        bwa index $params.hg38genome
        """
    }
First, I am unable to get any value for sample_id.
Second, I get the following error:
> Caused by: Missing output file(s) ./output_refhg38.fai expected by process create_index (null)
If I run bwa directly as:
bwa index input.fasta
I get the following files in the directory where input.fasta is located:
input.fasta.ann
input.fasta.amb
input.fasta.sa
input.fasta.bwt
input.fasta.pac
How do I get Nextflow to create a folder named output and, within it, the files NM.fasta.X, where X is ann, etc.? Also, it doesn't extract the file name NM.fasta. I tried ${params.hg38genome}.baseName, but that failed.
Answer 1
Score: 1
You get that error because you've declared an output file that cannot be found in the process working directory. Note that the FASTA index file (i.e. the .fai file) is not actually an output of bwa index. You might be thinking of samtools faidx, which does create the FASTA index (.fai) file. If your next step is to align some reads, you don't even need the FASTA index file; you only need the BWA index files. For example:
Contents of main.nf:

    params.reads = '/Users/name/Downloads/tiny/normal/*_R{1,2}_xxx.fastq.gz'
    params.hg38genome = '/Users/name/Downloads/NM.fasta'

    include { bwa_index } from './bwa.nf'
    include { bwa_mem } from './bwa.nf'

    workflow {
        reads = Channel.fromFilePairs( params.reads )
        hg38genome = file( params.hg38genome )

        bwa_index( hg38genome, hg38genome.name )
        bwa_mem( reads, bwa_index.out )

        bwa_mem.out.view()
    }
Contents of bwa.nf:

    process bwa_index {

        input:
        path ref_fasta
        val prefix

        output:
        tuple val(prefix), path("${prefix}.{ann,amb,sa,bwt,pac}")

        """
        bwa index \\
            -p "${prefix}" \\
            "${ref_fasta}"
        """
    }

    process bwa_mem {

        tag { sample_id }

        input:
        tuple val(sample_id), path(reads)
        tuple val(idxbase), path("bwa_index/*")

        output:
        tuple val(sample_id), path("${sample_id}.aln.bam")

        script:
        def task_cpus = task.cpus > 1 ? task.cpus - 1 : task.cpus

        """
        bwa mem \\
            -t ${task_cpus} \\
            "bwa_index/${idxbase}" \\
            ${reads} |
        samtools view \\
            -1 \\
            -o "${sample_id}.aln.bam" \\
            -
        """
    }
Contents of nextflow.config:

    params {
        outdir = './results'
    }

    process {
        withName: bwa_index {
            publishDir = [
                path: "${params.outdir}/bwa_index",
                mode: 'copy',
            ]
            cpus = 1
            conda = 'bwakit=0.7.17-dev1'
        }
        withName: bwa_mem {
            publishDir = [
                path: "${params.outdir}/bwa_mem",
                mode: 'copy',
            ]
            cpus = 8
            conda = 'bwakit=0.7.17-dev1'
        }
    }

    conda {
        enabled = true
    }
Results:
$ nextflow run main.nf -ansi-log false
N E X T F L O W ~ version 23.04.1
Launching `main.nf` [determined_jang] DSL2 - revision: 1f32b172b7
Creating env using conda: bwakit=0.7.17-dev1 [cache /path/to/work/conda/env-c67b42794b99b0cecbbb27e78e7f5fb7]
[f6/10ac19] Submitted process > bwa_index
[7a/76be98] Submitted process > bwa_mem (foo)
[foo, /path/to/work/7a/76be98292ced9ca7e418470227b34a/foo.aln.bam]
[de/9d97f1] Submitted process > bwa_mem (baz)
[baz, /path/to/work/de/9d97f1270464644149ec0b64903a49/baz.aln.bam]
[4a/83079c] Submitted process > bwa_mem (bar)
[bar, /path/to/work/4a/83079c009dd7f28306c1f5108426ff/bar.aln.bam]
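
Applied directly to the process from the question, the same idea might look like the minimal sketch below. It is an illustration under a few assumptions rather than a drop-in fix: everything lives in a single main.nf, and the names ref_fasta, bwa_index_files and the optional fasta_index process are made up for this example. The tag in the question shows (null) because sample_id is never declared as an input, so the sketch tags the task with the staged file name instead. The output glob matches what bwa index actually writes in the task work directory, and file() is used because params.hg38genome is a plain String, which has no .name or .baseName.

```groovy
// main.nf: single-file sketch; paths and names are illustrative assumptions
params.hg38genome = '/Users/name/Downloads/NM.fasta'
params.outDir     = './output'

process create_index {
    // Tag with a variable that is actually declared as an input
    tag { ref_fasta.name }
    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path ref_fasta   // staged into the task work directory under its own name

    output:
    // bwa index writes <fasta>.{amb,ann,bwt,pac,sa} next to the staged input
    path "${ref_fasta}.{amb,ann,bwt,pac,sa}", emit: bwa_index_files

    script:
    """
    bwa index "${ref_fasta}"
    """
}

// Optional: only needed if a .fai file is really required downstream
process fasta_index {
    tag { ref_fasta.name }
    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path ref_fasta

    output:
    path "${ref_fasta}.fai", emit: fai

    script:
    """
    samtools faidx "${ref_fasta}"
    """
}

workflow {
    // file() converts the String parameter into a Path object
    hg38genome = file( params.hg38genome )

    // For NM.fasta: name is 'NM.fasta', baseName is 'NM'
    println "name: ${hg38genome.name}, baseName: ${hg38genome.baseName}"

    create_index( hg38genome )
    fasta_index( hg38genome )

    create_index.out.bwa_index_files.view()
}
```

With this layout, the published files should end up under ./output/hg38index/ as NM.fasta.amb, NM.fasta.ann and so on, since publishDir copies whatever the output block captures from the work directory.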