Lost coordinate file name in process all outputs altogether.


Question

def barcodes = (1..2).collect { String.format("barcode%02d", it) }
params.orifq = barcodes.collect { "fastq_pass/$it/*.fastq.gz" }

Channel
    .fromPath(params.orifq)
    .map { it -> [it.name.split("_")[2], it] }
    .groupTuple()
    .set { orifq_ch }

process cat {
    debug true

    publishDir = [ path: "Run/orifq", mode: 'copy' ]

    input:
    tuple val(bc), path(fq)

    output:
    path("*.fastq.gz")

    """
    cat ${fq} > ${bc}.fastq.gz
    """
}

process all_stats {
    debug true

    publishDir = [ path: "Run/stats", mode: 'copy' ]

    input:
    path ("*.fastq.gz")

    output:
    path ("all_stats.txt"), emit: all_stats

    """
    seqkit stat *.fastq.gz > all_stats.txt
    """
}

workflow {
    cat(orifq_ch)|collect|all_stats|view
}

In this code, process cat generates barcode01.fastq.gz and barcode02.fastq.gz, and all outputs from process cat are then processed together in all_stats.

However, the file column in all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz. The number 1 seems to be a FIFO serial number rather than the barcode number.

How can I fix the code so that the barcode number is reported correctly?

Answer 1

Score: 0


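Nextflow rewrites input file names when a name pattern is used to declare a collection of files. Here the pattern is "*.fastq.gz": the * wildcard controls the names of the staged files and is replaced by each file's position in the collection, which is why all_stats sees 1.fastq.gz, 2.fastq.gz and so on rather than the barcode names. A fixed file name behaves similarly (from the Nextflow documentation on multiple input files):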

> When the input has a fixed file name and a collection of files is
> received by the process, the file name will be appended with a
> numerical suffix representing its ordinal position in the list.
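
To make the effect concrete, here is a minimal shell sketch that only mimics the renaming with symlinks. It is not Nextflow's own staging code; the temporary directory layout and the variable names (workdir, staged) are assumptions made for illustration.

```shell
#!/bin/sh
# Mimic (not reproduce) how a file collection declared as "*.fastq.gz" ends up
# named in a task directory: each file is linked under its 1-based position.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/inputs" "$workdir/task"

# Stand-ins for the outputs of the cat process (empty files are enough here).
: > "$workdir/inputs/barcode01.fastq.gz"
: > "$workdir/inputs/barcode02.fastq.gz"

i=0
for f in "$workdir"/inputs/*.fastq.gz; do
    i=$((i + 1))
    # The "*" of the pattern becomes the ordinal position, so the original
    # basename (barcode01, barcode02) is no longer visible inside the task.
    ln -s "$f" "$workdir/task/$i.fastq.gz"
done

# List what a tool such as seqkit would see in the task directory.
staged=$(cd "$workdir/task" && echo *)
echo "staged names: $staged"

rm -rf "$workdir"
```

A tool run inside the task directory only ever sees 1.fastq.gz and 2.fastq.gz, which matches the file column reported in the question.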

However, the rewriting of input file names is completely optional. Instead, you can just use a regular variable to bind the collection of files. This can then be used accordingly in your process script, for example (untested):

params.reads = './fastq_pass/barcode{0[8-9],[1-5][0-9],6[0-4]}/*.fastq.gz'
params.outdir = './results'


process cat {

    publishDir "${params.outdir}/orifq", mode: 'copy'

    input:
    tuple val(bc), path(fq)

    output:
    path "${bc}.fastq.gz"

    """
    cat ${fq} > "${bc}.fastq.gz"
    """
}

process all_stats {

    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path fastq_files

    output:
    path "all_stats.txt"

    """
    seqkit stat ${fastq_files} > all_stats.txt
    """
}
workflow {

    Channel.fromPath( params.reads )
        .map { it -> [it.name.split("_")[2], it] }
        .groupTuple()
        .set { orifq_ch }

    ...
}

The reads pattern above matches barcodes 08 to 64 inclusive. Glob syntax has no numeric ranges, so the range is broken down into three alternatives listed inside one pair of curly braces: 0[8-9], [1-5][0-9] and 6[0-4].
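
The three alternatives can be checked in isolation with a short shell sketch. It does not call Nextflow; it rewrites each alternative as an equivalent case pattern and tests the names barcode01 to barcode70 (the upper bound of 70 and the variable names are arbitrary choices for illustration).

```shell
#!/bin/sh
# Check which barcode directory names the three alternatives accept.
# The case patterns are the shell equivalents of the alternatives in
# barcode{0[8-9],[1-5][0-9],6[0-4]}.
set -eu

matched=0
first=""
last=""
n=1
while [ "$n" -le 70 ]; do
    name=$(printf 'barcode%02d' "$n")
    case "$name" in
        barcode0[8-9] | barcode[1-5][0-9] | barcode6[0-4])
            matched=$((matched + 1))
            # Remember the first and the most recent matching name.
            [ -n "$first" ] || first=$name
            last=$name
            ;;
    esac
    n=$((n + 1))
done

echo "matched $matched names, from $first to $last"
# prints: matched 57 names, from barcode08 to barcode64
```

The count of 57 is 2 names from 0[8-9], 50 from [1-5][0-9] and 5 from 6[0-4].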

huangapple
  • Posted on June 15, 2023 at 16:37:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76480639.html