Lost coordinate file name in process all outputs altogether
Question
In this code, the file column of all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz, and the number 1 appears to be a FIFO serial number rather than the barcode number.
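To make the file names carry the barcode number, you can try the following changes:
In the output block of process cat, declare the file by its barcode name instead of a wildcard:
output:
path("${bc}.fastq.gz")
This makes the files emitted by cat explicitly named barcode01.fastq.gz and barcode02.fastq.gz. On its own it does not remove 1.fastq.gz, because that name is assigned when all_stats stages its inputs under the pattern "*.fastq.gz", so the input declaration of all_stats has to change as well.
After these changes, run the pipeline again; all_stats.txt should then list the files by their barcode names.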
def barcodes = (1..2).collect { String.format("barcode%02d", it) }
params.orifq = barcodes.collect { "fastq_pass/$it/*.fastq.gz" }

Channel
    .fromPath(params.orifq)
    .map { it -> [it.name.split("_")[2], it] }
    .groupTuple()
    .set { orifq_ch }

process cat {
    debug true
    publishDir = [ path: "Run/orifq", mode: 'copy' ]

    input:
    tuple val(bc), path(fq)

    output:
    path("*.fastq.gz")

    """
    cat ${fq} > ${bc}.fastq.gz
    """
}

process all_stats {
    debug true
    publishDir = [ path: "Run/stats", mode: 'copy' ]

    input:
    path ("*.fastq.gz")

    output:
    path ("all_stats.txt"), emit: all_stats

    """
    seqkit stat *.fastq.gz > all_stats.txt
    """
}

workflow {
    cat(orifq_ch) | collect | all_stats | view
}
In this code, process cat generates barcode01.fastq.gz and barcode02.fastq.gz, and all outputs of process cat are then processed together in all_stats.
However, the file column of all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz, and the number 1 seems to be the FIFO serial number, not the barcode number.
How can I fix the code so that the barcode number is correctly assigned?
Answer 1
Score: 0
Nextflow will rewrite input file names when a named pattern is used to declare a collection of files. In this case, the named pattern provided is "*.fastq.gz". Note that the * wildcard is used to control the names of staged files. Otherwise (from multiple input files):
> When the input has a fixed file name and a collection of files is
> received by the process, the file name will be appended with a
> numerical suffix representing its ordinal position in the list.
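
To make the renaming concrete, here is a small Python sketch that mimics what the question observed when a collection of several files is staged under a wildcard pattern. It only illustrates the documented behaviour and is not Nextflow's actual implementation. The function name `staged_names` is made up for this example, and the replacement rule in its docstring is an assumption based on the asker's output.

```python
def staged_names(pattern, source_names):
    """Mimic the renaming of a collection of several staged input files.

    Assumption: each '*' in the pattern is replaced by the 1-based position
    of the file in the collection, while a bare '*' keeps the source names.
    """
    if pattern == "*":
        return list(source_names)
    return [pattern.replace("*", str(i))
            for i, _ in enumerate(source_names, start=1)]


sources = ["barcode01.fastq.gz", "barcode02.fastq.gz"]

# Pattern used by all_stats in the question: the barcode names are lost.
print(staged_names("*.fastq.gz", sources))

# A bare wildcard leaves the original names untouched.
print(staged_names("*", sources))
```

The first call shows why seqkit only ever sees 1.fastq.gz and 2.fastq.gz: the barcode is dropped at staging time, before the script runs.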
However, the rewriting of input file names is completely optional. Instead, you can just use a regular variable to bind the collection of files. This can then be used accordingly in your process script, for example (untested):
params.reads = './fastq_pass/barcode{0[8-9],[1-5][0-9],6[0-4]}/*.fastq.gz'
params.outdir = './results'

process cat {
    publishDir "${params.outdir}/orifq", mode: 'copy'

    input:
    tuple val(bc), path(fq)

    output:
    path "${bc}.fastq.gz"

    """
    cat ${fq} > "${bc}.fastq.gz"
    """
}

process all_stats {
    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path fastq_files

    output:
    path "all_stats.txt"

    """
    seqkit stat ${fastq_files} > all_stats.txt
    """
}

workflow {
    Channel.fromPath( params.reads )
        .map { it -> [it.name.split("_")[2], it] }
        .groupTuple()
        .set { orifq_ch }
    ...
}
The reads pattern above will match barcodes 08 to 64 inclusive. It requires breaking the range down into multiple patterns and uses curly braces for each part.
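
As a quick sanity check of that range, the three alternatives inside the curly braces can be written out by hand and tested against barcode01 to barcode96 in Python. This sketch assumes that Python's `fnmatch` treats these simple character ranges the same way as the glob matcher Nextflow uses. The upper bound of 96 and the helper name `matches` are chosen only for this example.

```python
from fnmatch import fnmatchcase

# The three brace alternatives from params.reads, expanded manually.
alternatives = ["barcode0[8-9]", "barcode[1-5][0-9]", "barcode6[0-4]"]


def matches(dirname):
    """Return True if the directory name matches any of the alternatives."""
    return any(fnmatchcase(dirname, alt) for alt in alternatives)


# Collect the barcode numbers whose directory name is matched.
matched = [n for n in range(1, 97) if matches(f"barcode{n:02d}")]
print(matched[0], matched[-1], len(matched))
```

Under that assumption, the matched numbers run from 8 to 64 with no gaps, which agrees with the description of the pattern.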