Nextflow config file for cpus setting

Question

I am confused about the Nextflow config cpus setting: does this number refer to the actual number of CPU cores, or to the number of threads?

process {

    executor = 'local'

    cpus = 24
    memory = 128.GB
    time = 6.h

    withName: CENTRIFUGE {
        cpus = 32
        memory = 384.GB
        time = 3.h
    }
}

Does this setting mean that, no matter how many parallel jobs are running, the system will never exceed it? Or do I have to set the degree of parallelization myself for a resource-exhausting process?

def barcodes = (1..20).collect { String.format("barcode%02d", it) }
params.infq = barcodes.collect { "../fastq_pass/$it/*.fastq.gz" }
params.outdir = "Analysis"
params.hostref = "/mnt/genomic.fna"
params.targetref = "/mnt/NTM.fasta"
params.db = "/mnt/db"

process CAT {
    debug true
    publishDir "${params.outdir}/orifq", mode: 'copy'

    input:
    tuple val(bc), path(fq)

    output:
    path "${bc}.fastq.gz"

    """
    cat ${fq} > ${bc}.fastq.gz
    """
}

process ALL_STATS {
    debug true
    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path orifq

    output:
    path "all_stats.txt"

    """
    seqkit stat ${orifq} -T > all_stats.txt
    """
}

process NanoLyse {
    debug true
    publishDir "${params.outdir}/outfq", mode: 'copy'

    input:
    path orifq

    output:
    path "${orifq.getSimpleName()}_reads_without_host.fastq.gz"

    """
    gunzip -c ${orifq} | NanoLyse --reference ${params.hostref} | gzip > ${orifq.getSimpleName()}_reads_without_host.fastq.gz
    """
}

process CENTRIFUGE {
    debug true
    publishDir "${params.outdir}/centrifuge", mode: 'copy'

    input:
    path filterfq
    val db

    output:
    path "${filterfq.getSimpleName()}_cenout.csv"
    path "${filterfq.getSimpleName()}_report.tsv"
    path "${filterfq.getSimpleName()}_cenout.kraken"

    """
    centrifuge -x ${db} -U ${filterfq} -S ${filterfq.getSimpleName()}_cenout.csv --report-file ${filterfq.getSimpleName()}_report.tsv
    centrifuge-kreport -x ${db} ${filterfq.getSimpleName()}_cenout.csv > ${filterfq.getSimpleName()}_cenout.kraken
    """
}

process NO_HOST_STATS {
    debug true
    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path filterfq

    output:
    path "no_host_stats.txt"

    """
    seqkit stat ${filterfq} -T > no_host_stats.txt
    """
}

process MAPPING {
    debug true
    publishDir "${params.outdir}/bam", mode: 'copy'

    input:
    path filterfq

    output:
    path ("${filterfq.getSimpleName()}_no_host.bam"), emit: bam
    path ("${filterfq.getSimpleName()}_no_host.bam.bai")

    """
    minimap2 -ax map-ont ${params.targetref} ${filterfq} | samtools view -b | samtools sort > ${filterfq.getSimpleName()}_no_host.bam
    samtools index ${filterfq.getSimpleName()}_no_host.bam
    """
}

process EXTRACT_MAPPED {
    debug true
    publishDir "${params.outdir}/outfq", mode: 'copy'

    input:
    path bam

    output:
    path "${bam.getBaseName()}_mapped.fastq.gz"

    """
    samtools fastq -F 0X904 ${bam} | gzip > ${bam.getBaseName()}_mapped.fastq.gz
    """
}

process MAPPED_STATS {
    debug true
    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path mappedfq

    output:
    path "mapped_stats.txt"

    """
    seqkit stat ${mappedfq} -T > mapped_stats.txt
    """
}


process STATS_SUMMARY {
    debug true
    publishDir "${params.outdir}/stats", mode: 'copy'

    input:
    path all_stats
    path no_host_stats
    path mapped_stats

    output:
    path "Summary.csv"

    """
    python ${projectDir}/summ.py ${all_stats} ${no_host_stats} ${mapped_stats}
    """
}

workflow {
    Channel
    .fromPath(params.infq, checkIfExists: true)
    .map { it -> [it.name.split("_")[2], it] }
    .groupTuple()
    .set{infq_ch}

    CAT(infq_ch) \
    |collect \
    |ALL_STATS

    NanoLyse(CAT.out) \
    |collect \
    |NO_HOST_STATS

    CENTRIFUGE(NanoLyse.out, params.db)

    MAPPING(NanoLyse.out)

    EXTRACT_MAPPED(MAPPING.out.bam) \
    |collect \
    |MAPPED_STATS

    STATS_SUMMARY(ALL_STATS.out, NO_HOST_STATS.out, MAPPED_STATS.out)
}

Answer 1

Score: 1

The cpus directive just lets you set the number of CPUs in your resource request. It is then up to your job scheduler to fulfill this request. Note that this is just a request and not all resource managers will impose CPU or memory limits. This means that it is ultimately your responsibility to ensure that your jobs do not use more (or much less) than what you ask for. Using more than what you ask for, for example, can potentially over-subscribe the node(s) that your jobs land on. To use the number of CPUs defined using the cpus directive, we can use the task.cpus implicit variable in our script block, for example:

params.seqs = '/path/to/seqs/*.fa'

process blastp {

    input:
    path input_sequence

    """
    blastp \\
        -num_threads ${task.cpus} \\
        -query ${input_sequence}
    """
}

workflow {

    seqs = Channel.fromPath( params.seqs )

    blastp( seqs )
}

And override any defaults using one or more process selectors in your nextflow.config:

process {

    executor = 'local'

    cpus = 2
    memory = 128.GB
    time = 6.h

    withName: blastp {
        cpus = 8
        memory = 12.GB
        time = 1.h
    }
}

If I were to run the above workflow on my local workstation, which has an octa-core Xeon processor, Nextflow would run only one blastp job at a time, since each task requests all eight cores. I believe an exception is raised if you try to ask for more CPUs or memory than your system has installed.
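
Note that reserving CPUs with the cpus directive does not by itself make a tool use them; the command line must also be told. Applied to the CENTRIFUGE process from the question, here is a minimal sketch, assuming centrifuge's -p/--threads option and the val db input from the question's process:

process CENTRIFUGE {

    input:
    path filterfq
    val db

    output:
    path "${filterfq.getSimpleName()}_cenout.csv"

    """
    # task.cpus resolves to 32 here, via the withName: CENTRIFUGE selector
    centrifuge -p ${task.cpus} -x ${db} -U ${filterfq} -S ${filterfq.getSimpleName()}_cenout.csv
    """
}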
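
As for the second part of the question: cpus constrains each task, not how many tasks run concurrently. To cap concurrency directly, Nextflow provides the maxForks process directive, and for the local executor the total capacity can be set in the executor scope. A minimal nextflow.config sketch, with illustrative values:

process {

    withName: CENTRIFUGE {
        maxForks = 1    // never run more than one CENTRIFUGE task at a time
    }
}

executor {
    cpus   = 24       // total CPUs the local executor may allocate across all tasks
    memory = 128.GB   // total memory available to the local executor
}

With the local executor, Nextflow already queues tasks whose combined cpus requests would exceed the available cores, so maxForks is mostly useful for throttling I/O- or memory-heavy processes.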
