Question

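In this code, the file column of all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz. The number 1 appears to be the FIFO serial number of the file, not the barcode number.

One change to try, so that the barcode number is assigned correctly:

In the output block of process cat, declare the file by its barcode rather than with a wildcard:

output:
path("${bc}.fastq.gz")

This declares the outputs explicitly as barcode01.fastq.gz and barcode02.fastq.gz. On its own it may not be enough, because process all_stats also declares its input as path("*.fastq.gz"), and that name pattern controls the names the files are given when they are staged.

After making the changes, run the workflow again and check that the file column of all_stats.txt shows the barcode names.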

Original question:

def barcodes = (1..2).collect { String.format("barcode%02d", it) }
params.orifq = barcodes.collect { "fastq_pass/$it/*.fastq.gz" }

Channel
    .fromPath(params.orifq)
    .map { it -> [it.name.split("_")[2], it] }
    .groupTuple()
    .set { orifq_ch }

process cat {
    debug true

    publishDir = [ path: "Run/orifq", mode: 'copy' ]

    input:
    tuple val(bc), path(fq)

    output:
    path("*.fastq.gz")

    """
    cat ${fq} > ${bc}.fastq.gz
    """
}

process all_stats {
    debug true

    publishDir = [ path: "Run/stats", mode: 'copy' ]

    input:
    path("*.fastq.gz")

    output:
    path("all_stats.txt"), emit: all_stats

    """
    seqkit stat *.fastq.gz > all_stats.txt
    """
}

workflow {
    cat(orifq_ch) | collect | all_stats | view
}

In this code, process cat generates barcode01.fastq.gz and barcode02.fastq.gz, and all outputs from process cat are then processed together in all_stats.

However, the file column of the resulting all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz, and the number 1 seems to be the FIFO serial number, not the barcode number.

How can I fix the code so that the barcode number is correctly assigned?

Answer 1

Score: 0

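Nextflow rewrites input file names when a name pattern is used to declare a collection of files. Here the pattern is "*.fastq.gz", and the * wildcard controls the names of the staged files: when more than one file is received, it is replaced by each file's position in the collection, which is where 1.fastq.gz comes from. Otherwise (from the "Multiple input files" section of the Nextflow documentation):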

> When the input has a fixed file name and a collection of files is
> received by the process, the file name will be appended with a
> numerical suffix representing its ordinal position in the list.
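
To see why the file column ends up showing 1.fastq.gz, the staging step can be imitated in plain shell. This is only a sketch, not Nextflow itself: it assumes inputs are staged as symbolic links (Nextflow's default staging mode), and the temporary directories and empty placeholder files are made up for the illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for the outputs of process cat.
src=$(mktemp -d)
work=$(mktemp -d)
touch "$src/barcode01.fastq.gz" "$src/barcode02.fastq.gz"

# Imitate staging with the "*.fastq.gz" name pattern: each input is linked
# into the task directory under its 1-based position in the collection.
i=1
for f in "$src"/barcode*.fastq.gz; do
    ln -s "$f" "$work/$i.fastq.gz"
    i=$((i + 1))
done

# A command run inside the task directory only sees the staged names.
staged=$(cd "$work" && ls *.fastq.gz | xargs)
echo "staged names: $staged"      # staged names: 1.fastq.gz 2.fastq.gz

# The original names survive only as the targets of the links.
resolved=$(cd "$work" && for l in *.fastq.gz; do basename "$(readlink "$l")"; done | xargs)
echo "link targets: $resolved"    # link targets: barcode01.fastq.gz barcode02.fastq.gz

rm -rf "$src" "$work"
```

A tool such as seqkit stat, run inside the task directory on *.fastq.gz, reports the staged names and not the link targets.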

However, rewriting input file names is entirely optional. Instead, you can bind the collection of files to a regular variable and use that variable in your process script, for example (untested):

params.reads = './fastq_pass/barcode{0[8-9],[1-5][0-9],6[0-4]}/*.fastq.gz'
params.outdir = './results'

process cat {

    publishDir "${params.outdir}/orifq", mode: 'copy'

    input:
    tuple val(bc), path(fq)

    output:
    path "${bc}.fastq.gz"

    """
    cat ${fq} > "${bc}.fastq.gz"
    """
}

process all_stats {

    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path fastq_files

    output:
    path "all_stats.txt"

    """
    seqkit stat ${fastq_files} > all_stats.txt
    """
}

workflow {

    Channel.fromPath( params.reads )
        .map { it -> [it.name.split("_")[2], it] }
        .groupTuple()
        .set { orifq_ch }

    ...
}

The reads pattern above matches barcodes 08 to 64 inclusive. A glob cannot express a numeric range directly, so the range is broken into three sub-ranges (08-09, 10-59 and 60-64), written as comma-separated alternatives inside a single pair of curly braces.
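
The barcode range covered by the pattern in params.reads can be checked outside Nextflow. The sketch below is an approximation under stated assumptions: it spells out the three brace alternatives as separate shell case patterns and tests them against generated names barcode01 to barcode70, which are hypothetical directory names, not real data. Nextflow does its own glob matching, so this only illustrates the logic of the pattern.

```shell
#!/bin/sh
set -eu

# The brace group {0[8-9],[1-5][0-9],6[0-4]} is a list of three
# alternatives; the case patterns below spell them out one by one.
matched=""
n=1
while [ "$n" -le 70 ]; do
    name=$(printf 'barcode%02d' "$n")
    case "$name" in
        barcode0[8-9] | barcode[1-5][0-9] | barcode6[0-4])
            matched="$matched $name"
            ;;
    esac
    n=$((n + 1))
done

# Report the first and last matching names and the total count.
set -- $matched
first=$1
count=$#
for last in "$@"; do :; done
echo "first=$first last=$last count=$count"
# first=barcode08 last=barcode64 count=57
```

The same approach extends to other ranges: split the range wherever a single character class per digit stops being enough, and add one alternative per piece.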
