How to process multiple input (yaml/json) from S3 using Nextflow DSL2

Question
I need to process over 1k samples with a Nextflow (DSL2) pipeline on AWS Batch. The current version of the workflow processes a single input per run. I'm looking for workflow syntax (mapping tuples to iterate) to process multiple inputs in parallel. The inputs are in JSON or YAML format, and the path to the input files is unique to each sample.

To preserve the input file paths ("s3://...") I used `.fromPath` in a channel.

Following is my single-sample input config `input.yaml` (`-params-file`):

```yaml
id: HLA1001
bam: s3://HLA/data/HLA1001.bam
vcf: s3://HLA/data/HLA1001.vcf.gz
```

Workflow to run a single-sample input:

```nextflow
process samtools_stats {
    tag "${id}"
    publishDir "${params.publishdir}/${id}/samtools", mode: "copy"

    input:
    path bam

    output:
    path "${id}.stats"

    script:
    """
    samtools stats ${bam} > ${id}.stats
    """
}

process mosdepth_bam {
    tag "${id}"
    publishDir "${params.publishdir}/${id}/mosdepth", mode: "copy"

    input:
    path bam
    path bam_idx

    output:
    path "${id}.regions.bed.gz"

    script:
    """
    mosdepth --no-per-base --by 1000 --mapq 20 --threads 4 ${id} ${bam}
    """
}

process mosdepth_cram {
    tag "${id}"
    publishDir "${params.publishdir}/${id}/mosdepth", mode: "copy"

    input:
    path bam
    path bam_idx
    path reference
    path reference_idx

    output:
    path "${id}.regions.bed.gz"

    script:
    """
    mosdepth --no-per-base --by 1000 --mapq 20 --threads 4 --fasta ${reference} ${id} ${bam}
    """
}

process bcftools_stats {
    tag "${id}"
    publishDir "${params.publishdir}/${id}/bcftools", mode: "copy"

    input:
    path vcf
    path vcf_idx

    output:
    path "*"

    script:
    """
    bcftools stats -f PASS ${vcf} > ${id}.pass.stats
    """
}

process multiqc {
    tag "${id}"
    publishDir "${params.publishdir}/${id}/multiqc", mode: "copy"

    input:
    path "*"

    output:
    path "multiqc_data/*", emit: multiqc_ch

    script:
    """
    multiqc . --data-format json --enable-npm-plugin
    """
}

process compile_metrics {
    tag "${id}"
    publishDir "${params.publishdir}/${id}", mode: "copy"

    input:
    path multiqc

    output:
    path "${params.id}.metrics.json", emit: compile_metrics_out

    script:
    """
    # parse and calculate all the metrics in the multiqc output to compile
    compile_metrics.py \
        --multiqc_json multiqc_data.json \
        --output_json ${params.id}.metrics.json \
        --biosample_id ${params.id}
    """
}

/*
----------------------------------------------------------------------
    WORKFLOW
----------------------------------------------------------------------
*/

id = params.id
aln_file = file( params.bam )
aln_file_type = aln_file.getExtension()
vcf = channel.fromPath( params.vcf, checkIfExists: true )
vcf_idx = channel.fromPath( params.vcf + ".tbi", checkIfExists: true )

if (aln_file_type == "bam") {
    cbam = channel.fromPath( params.bam, checkIfExists: true )
    cbam_idx = channel.fromPath( params.bam + ".bai", checkIfExists: true )
}
else if (aln_file_type == "cram") {
    cbam = channel.fromPath( params.bam, checkIfExists: true )
    cbam_idx = channel.fromPath( params.bam + ".crai", checkIfExists: true )
}

reference = channel.fromPath( params.reference, checkIfExists: true )
reference_idx = channel.fromPath( params.reference + ".fai", checkIfExists: true )

// main
workflow {
    if (aln_file_type == "bam") {
        samtools_stats( cbam )
        mosdepth_bam( cbam, cbam_idx )
        bcftools_stats( vcf, vcf_idx )
        multiqc( samtools_stats.out.mix( mosdepth_bam.out ).collect() )
        compile_metrics( multiqc.out )
    } else if (aln_file_type == "cram") {
        samtools_stats( cbam )
        mosdepth_cram( cbam, cbam_idx, reference, reference_idx )
        bcftools_stats( vcf, vcf_idx )
        multiqc( samtools_stats.out.mix( mosdepth_cram.out ).collect() )
        compile_metrics( multiqc.out )
    }
}
```

I want to modify the workflow to run on the following multi-sample input in parallel:

```yaml
samples:
  - id: HLA1001
    bam: s3://HLA/data/udf/HLA1001.bam
    vcf: s3://HLA/data/udf/HLA1001.vcf.gz
  - id: NHLA1002
    bam: s3://HLA/data/sdd/HLA1002.bam
    vcf: s3://HLA/data/sdd/HLA1002.vcf.gz
  - id: NHLA1003
    bam: s3://HLA/data/klm/HLA1003.bam
    vcf: s3://HLA/data/klm/HLA1003.vcf.gz
  - id: NHLA2000
    bam: s3://HLA/data/rcb/HLA2000.bam
    vcf: s3://HLA/data/rcb/HLA2000.vcf.gz
```

The expected final output folder structure for the multiple samples:

```
s3://mybucket/results/HLA1001/
    samtools/
    mosdepth/
    bcftools/
    multiqc/
    metrics/HLA1001.metrics.json
s3://mybucket/results/HLA1002/
    samtools/
    mosdepth/
    bcftools/
    multiqc/
    metrics/HLA1002.metrics.json
```

The bam/cram and vcf inputs, as well as the inputs of multiqc and compile_metrics, must all refer to the same sample in every single process.

Appreciate your help! Thanks
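When a process takes several queue channels, elements are paired in whatever order they arrive, so the usual way to guarantee that every process sees files from the same sample is to key each channel element by the sample id and pair channels with `join`. A minimal sketch with hypothetical sample and file names (not taken from the pipeline above):

```nextflow
// Each channel emits tuples whose first element is the sample id.
stats_ch   = Channel.of( ['HLA1001', 'HLA1001.stats'], ['HLA1002', 'HLA1002.stats'] )
regions_ch = Channel.of( ['HLA1001', 'HLA1001.regions.bed.gz'], ['HLA1002', 'HLA1002.regions.bed.gz'] )

// join matches elements on the first tuple element (the sample id),
// so each sample's files stay together regardless of completion order.
stats_ch
    .join( regions_ch )
    .view()  // e.g. [HLA1001, HLA1001.stats, HLA1001.regions.bed.gz]
```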

Following the method answered by @steve:

Contents of `main.nf` (update):

```nextflow
include { compile_metrics } from './modules/compile_metrics'

Channel
    .fromList( params.samples )
    .map { it.biosample_id }
    .set { sample_ids }

compile_metrics( sample_ids, multiqc.out.json_data )
```

Contents of modules/compile_metrics/main.nf:

```nextflow
process compile_metrics {
    tag { sample_ids }

    input:
    val(sample_ids)
    path "multiqc_data.json"

    output:
    tuple val(sample_ids), path("${sample_ids}.metrics.json"), emit: compile_metrics_out

    script:
    """
    compile_metrics.py \\
        --multiqc_json multiqc_data.json \\
        --output_json "${sample_ids}.metrics.json" \\
        --biosample_id "${sample_ids}"
    """
}
```

Update `main.nf`:

```nextflow
include { mosdepth_datamash } from './modules/mosdepth_datamash'

autosomes_non_gap_regions = file( params.autosomes_non_gap_regions )

mosdepth_datamash( autosomes_non_gap_regions, mosdepth_bam.out.regions.mix( mosdepth_cram.out.regions ).collect() )
```

Update `mosdepth_datamash`:

```nextflow
process mosdepth_datamash {
    tag { sample }

    input:
    path autosomes_non_gap_regions
    tuple val(sample), path(regions)

    output:
    tuple val(sample), path("${sample}.mosdepth.csv"), emit: coverage

    script:
    """
    zcat "${sample}.regions.bed.gz" | bedtools intersect -a stdin -b ${autosomes_non_gap_regions} | gzip -9c > "${sample}.regions.autosomes_non_gap_n_bases.bed.gz"
    ...
    """
}
```

Update `main.nf` (fix): use a queue channel instead of `collect`:

```nextflow
mosdepth_datamash( autosomes_non_gap_regions, mosdepth_bam.out.regions.mix( mosdepth_cram.out.regions ) )
```
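The reason the `collect` version misbehaves for multiple samples: `collect()` gathers every emission into a single list and emits it once, so a per-sample process downstream would run only once over all samples' files. A small illustrative sketch (hypothetical sample names, not from the pipeline above):

```nextflow
// A queue channel that emits one keyed tuple per sample:
regions = Channel.of( ['S1', 'S1.regions.bed.gz'], ['S2', 'S2.regions.bed.gz'] )

regions.view()            // two emissions, one per sample -> downstream process runs per sample
regions.collect().view()  // a single emission containing everything -> downstream process runs once
```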

The verifybamid process works with `file()` instead of `channel.fromPath`:

```nextflow
vbi2_ud = file( params.vbi2_ud )
vbi2_bed = file( params.vbi2_bed )
vbi2_mean = file( params.vbi2_mean )
```

How can the channel be modified to stay backward compatible with the previous single-sample input format, which lacks the `samples` key?

```yaml
id: HLA1001
bam: s3://HLA/data/HLA1001.bam
vcf: s3://HLA/data/HLA1001.vcf.gz
```

Input-processing content of `main.nf`:

```nextflow
Channel
    .fromList( params.samples )
    .branch { rec ->
        def aln_file = file( rec.bam )

        bam: aln_file.extension == 'bam'
            def bam_idx = file( "${rec.bam}.bai" )
            return tuple( rec.id, aln_file, bam_idx )

        cram: aln_file.extension == 'cram'
            def cram_idx = file( "${rec.bam}.crai" )
            return tuple( rec.id, aln_file, cram_idx )
    }
    .set { aln_inputs }

Channel
    .fromList( params.samples )
    .map { rec ->
        def vcf_file = file( rec.vcf )
        def vcf_idx = file( "${rec.vcf}.tbi" )

        tuple( rec.id, vcf_file, vcf_idx )
    }
    .set { vcf_inputs }

Channel
    .fromList( params.samples )
    .map { it.biosample_id }
    .set { sample_ids }
```

The updated `main.nf` works well.

Input format A or B:

```yaml
# A)
biosample_id: NA12878-chr14
bam: s3://sample-qc/data/NA12878-chr14.bam

# B)
samples:
  - biosample_id: NA12878-chr14
    bam: s3://sample-qc/data/NA12878-chr14.bam
```

```nextflow
workflow {
    ...
    params.samples = null

    Channel
        .fromList( params.samples )
        .ifEmpty { ['biosample_id': params.biosample_id, 'bam': params.bam] }
        .branch { rec ->
            def aln_file = rec.bam ? file( rec.bam ) : null

            bam: rec.biosample_id && aln_file?.extension == 'bam'
                def bam_idx = file( "${rec.bam}.bai" )
                return tuple( rec.biosample_id, aln_file, bam_idx )

            cram: rec.biosample_id && aln_file?.extension == 'cram'
                def cram_idx = file( "${rec.bam}.crai" )
                return tuple( rec.biosample_id, aln_file, cram_idx )
        }
        .set { aln_inputs }

    Channel
        .fromList( params.samples )
        .ifEmpty { ['biosample_id': params.biosample_id] }
        .map { it.biosample_id }
        .set { sample_ids }

    compile_metrics( sample_ids, multiqc.out.json_data )
    ...
}
```

Trying another way to avoid duplicating the `.ifEmpty` fallback above in every place that parses `params.samples` (two processes here need it):

```yaml
# A)
biosample_id: NA12878-chr14
bam: s3://sample-qc/data/NA12878-chr14.bam

# B)
samples:
  - biosample_id: NA12878-chr14
    bam: s3://sample-qc/data/NA12878-chr14.bam
```

```nextflow
params.samples = ''
// params.samples = null

def get_samples_list() {
    if (params.samples) {
        return params.samples
    }
    else {
        return ['biosample_id': params.biosample_id, 'bam': params.bam]
    }
}

workflow {
    // params.samples = ''
    samples = get_samples_list()
    ...
    Channel
        .fromList( samples )
        .branch { rec ->
            def aln_file = rec.bam ? file( rec.bam ) : null

            bam: rec.biosample_id && aln_file?.extension == 'bam'
                def bam_idx = file( "${rec.bam}.bai" )
                return tuple( rec.biosample_id, aln_file, bam_idx )

            cram: rec.biosample_id && aln_file?.extension == 'cram'
                def cram_idx = file( "${rec.bam}.crai" )
                return tuple( rec.biosample_id, aln_file, cram_idx )
        }
        .set { aln_inputs }

    samtools_stats_bam( aln_inputs.bam, [] )
    samtools_stats_cram( aln_inputs.cram, ref_fasta )

    Channel
        .fromList( params.samples )
        .map { it.biosample_id }
        .set { sample_ids }

    compile_metrics( sample_ids, multiqc.out.json_data )
}
```

ERROR:

```
Workflow execution stopped with the following message:
  Exit status   : null
  Error message : Cannot invoke method branch() on null object
  Error report  : Cannot invoke method branch() on null object

ERROR ~ Cannot invoke method branch() on null object
```
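The error is consistent with `Channel.fromList()` being handed a null (e.g. `params.samples` left as `null` when only the single-sample format is supplied), so the object that `.branch()` is called on never exists. One defensive pattern, sketched here under the assumption that single-sample runs define `params.biosample_id`/`params.bam`, is to normalise the parameters into a non-null list before creating any channels:

```nextflow
// Fall back to a one-element list when params.samples is null or empty,
// so Channel.fromList() always receives a real list.
def samples_list = params.samples ?: [
    ['biosample_id': params.biosample_id, 'bam': params.bam]
]

Channel
    .fromList( samples_list )
    .set { samples }
```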

Calling the samples channel wherever it needs to be reused works well and is a much better approach:

```nextflow
Channel
    .fromList( params.samples )
    .ifEmpty { ['biosample_id': params.biosample_id, 'bam': params.bam] }
    .set { samples }

samples.branch { rec ->
    ...

samples.map { it.biosample_id }
```

How can `input.yaml` be read from an argument `--input_list s3://mybucket/input.yaml` (instead of being passed directly with `-params-file input.yaml`) and still be consumed as a list in the `samples` channel with minimal change? e.g.

```shell
nextflow run main.nf \
    -ansi-log false \
    --input_list s3://mybucket/input.yaml \
    -work-dir 's3://mybucket/work' \
    --publish_dir 's3://mybucket/results' \
    --ref_fasta 's3://mybucket/ref.fa'
```

Current code:

```nextflow
Channel
    // .fromPath( params.input_list )
    .fromList( params.samples )
    .ifEmpty { ['biosample_id': params.biosample_id, 'bam': params.bam] }
    .set { samples }
```

`input.yaml`:

```yaml
samples:
  - id: HLA1001
    bam: s3://HLA/data/udf/HLA1001.bam
  - id: NHLA1002
    bam: s3://HLA/data/sdd/HLA1002.bam
```
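One way to support `--input_list` with minimal change is to parse the YAML yourself before building the channel. This is a sketch, untested here: it assumes the SnakeYAML library available on Nextflow's classpath, the top-level `samples` key shown above, and the `params.input_list` name from the question. `file()` can stage an `s3://` path, so reading `.text` works for remote files too:

```nextflow
import org.yaml.snakeyaml.Yaml

// Build the samples list either from --input_list (a YAML file, possibly on S3)
// or from params.samples when -params-file is used instead.
def load_samples() {
    if( params.input_list ) {
        def parsed = new Yaml().load( file( params.input_list ).text )
        return parsed['samples']
    }
    return params.samples ?: []
}

Channel
    .fromList( load_samples() )
    .ifEmpty { ['biosample_id': params.biosample_id, 'bam': params.bam] }
    .set { samples }
```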

Answer 1 (score: 2)

With [channels](https://www.nextflow.io/docs/latest/channel.html#channels) it is possible to process any number of samples, including just one. Here's one way that uses [modules](https://www.nextflow.io/docs/latest/dsl2.html#modules) to handle both BAM and CRAM inputs. Note that each process below expects an input [`tuple`](https://www.nextflow.io/docs/latest/process.html#input-type-tuple) where the first element is a sample name or key. To greatly assist with merging channels downstream, we should also ensure we output tuples with the same sample name or key. The following is untested on AWS Batch, but it should at least get you started.

Contents of `main.nf`:

```nextflow
include { bcftools_stats } from './modules/bcftools'
include { mosdepth as mosdepth_bam } from './modules/mosdepth'
include { mosdepth as mosdepth_cram } from './modules/mosdepth'
include { multiqc } from './modules/multiqc'
include { samtools_stats as samtools_stats_bam } from './modules/samtools'
include { samtools_stats as samtools_stats_cram } from './modules/samtools'
include { mosdepth_datamash } from './modules/mosdepth_datamash'

workflow {

    ref_fasta = file( params.ref_fasta )
    autosomes_non_gap_regions = file( params.autosomes_non_gap_regions )

    Channel
        .fromList( params.samples )
        .ifEmpty { ['id': params.id, 'bam': params.bam] }
        .branch { rec ->
            def aln_file = rec.bam ? file( rec.bam ) : null

            bam: rec.id && aln_file?.extension == 'bam'
                def bam_idx = file( "${rec.bam}.bai" )
                return tuple( rec.id, aln_file, bam_idx )

            cram: rec.id && aln_file?.extension == 'cram'
                def cram_idx = file( "${rec.bam}.crai" )
                return tuple( rec.id, aln_file, cram_idx )
        }
        .set { aln_inputs }

    Channel
        .fromList( params.samples )
        .ifEmpty { ['id': params.id, 'vcf': params.vcf] }
        .branch { rec ->
            def vcf_file = rec.vcf ? file( rec.vcf ) : null

            output: rec.id && vcf_file
                def vcf_idx = file( "${rec.vcf}.tbi" )
                return tuple( rec.id, vcf_file, vcf_idx )
        }
        .set { vcf_inputs }

    mosdepth_bam( aln_inputs.bam, [] )
    mosdepth_cram( aln_inputs.cram, ref_fasta )

    samtools_stats_bam( aln_inputs.bam, [] )
    samtools_stats_cram( aln_inputs.cram, ref_fasta )

    bcftools_stats( vcf_inputs )

    Channel
        .empty()
        .mix( mosdepth_bam.out.regions )
        .mix( mosdepth_cram.out.regions )
        .set { mosdepth_regions }

    mosdepth_datamash( mosdepth_regions, autosomes_non_gap_regions )

    Channel
        .empty()
        .mix( mosdepth_bam.out.dists )
        .mix( mosdepth_bam.out.summary )
        .mix( mosdepth_cram.out.dists )
        .mix( mosdepth_cram.out.summary )
        .mix( samtools_stats_bam.out )
        .mix( samtools_stats_cram.out )
        .mix( bcftools_stats.out )
        .mix( mosdepth_datamash.out )
        .map { sample, files -> files }
        .collect()
        .set { log_files }

    multiqc( log_files )
}
```

Contents of `modules/samtools/main.nf`:

```nextflow
process samtools_stats {

    tag { sample }

    input:
    tuple val(sample), path(bam), path(bai)
    path ref_fasta

    output:
    tuple val(sample), path("${sample}.stats")

    script:
    def reference = ref_fasta ? /--reference "${ref_fasta}"/ : ''

    """
    samtools stats \\
        ${reference} \\
        "${bam}" \\
        > "${sample}.stats"
    """
}
```

Contents of `modules/mosdepth/main.nf`:

```nextflow
process mosdepth {

    tag { sample }

    input:
    tuple val(sample), path(bam), path(bai)
    path ref_fasta

    output:
    tuple val(sample), path("*.regions.bed.gz"), emit: regions
    tuple val(sample), path("*.dist.txt"), emit: dists
    tuple val(sample), path("*.summary.txt"), emit: summary

    script:
    def fasta = ref_fasta ? /--fasta "${ref_fasta}"/ : ''

    """
    mosdepth \\
        --no-per-base \\
        --by 1000 \\
        --mapq 20 \\
        --threads ${task.cpus} \\
        ${fasta} \\
        "${sample}" \\
        "${bam}"
    """
}
```

Contents of `modules/bcftools/main.nf`:

```nextflow
process bcftools_stats {

    tag { sample }

    input:
    tuple val(sample), path(vcf), path(tbi)

    output:
    tuple val(sample), path("${sample}.pass.stats")

    """
    bcftools stats \\
        -f PASS \\
        "${vcf}" \\
        > "${sample}.pass.stats"
    """
}
```

Contents of `modules/multiqc/main.nf`:

```nextflow
process multiqc {

    input:
    path 'data/*'

    output:
    path "multiqc_report.html", emit: report
    path "multiqc_data", emit: data

    """
    multiqc \\
        --data-format json \\
        .
    """
}
```

Contents of `modules/compile_metrics/main.nf`:

```nextflow
process compile_metrics {

    tag { sample_id }

    input:
    val sample_id
    path multiqc_json

    output:
    tuple val(sample_id), path("${sample_id}.metrics.json")

    """
    compile_metrics.py \\
        --multiqc_json "${multiqc_json}" \\
        --output_json "${sample_id}.metrics.json" \\
        --biosample_id "${sample_id}"
    """
}
```

Contents of `./modules/mosdepth_datamash/main.nf`:

```nextflow
process mosdepth_datamash {

    tag { sample_id }

    input:
    tuple val(sample_id), path(regions_bed)
    path autosomes_non_gap_regions

    output:
    tuple val(sample_id), path("${sample_id}.mosdepth.csv")

    """
    zcat -f "${regions_bed}" |
        bedtools intersect -a stdin -b "${autosomes_non_gap_regions}" |
        gzip -9 > "${sample_id}.regions.autosomes_non_gap_n_bases.bed.gz"

    # do something
    touch "${sample_id}.mosdepth.csv"
    """
}
```

Contents of `nextflow.config`:

```nextflow
plugins {
    id 'nf-amazon'
}

params {
    ref_fasta = null
    autosomes_non_gap_regions = null

    samples = null
    id = null
    bam = null
    vcf = null

    publish_dir = './results'
    publish_mode = 'copy'
}

process {
    executor = 'awsbatch'
    queue = 'test-queue'

    errorStrategy = 'retry'
    maxRetries = 3

    withName: 'samtools_stats' {
        publishDir = [
            path: "${params.publish_dir}/samtools",
            mode: params.publish_mode,
        ]
    }
    withName: 'bcftools_stats' {
        publishDir = [
            path: "${params.publish_dir}/bcftools",
            mode: params.publish_mode,
        ]
    }
    withName: 'mosdepth' {
        cpus = 4
        publishDir = [
            path: "${params.publish_dir}/mosdepth",
            mode: params.publish_mode,
        ]
    }
    withName: 'multiqc' {
        publishDir = [
            path: "${params.publish_dir}/multiqc",
            mode: params.publish_mode,
        ]
    }
}

aws {
    region = 'us-east-1'

    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
```

And run using something like:

```shell
nextflow run main.nf \
    -ansi-log false \
    -params-file input.yaml \
    -work-dir 's3://mybucket/work' \
    --publish_dir 's3://mybucket/results' \
    --ref_fasta 's3://mybucket/ref.fa'
```


huangapple · Published 2023-05-10 19:16:30 · Original link: https://go.coder-hub.com/76217754.html