2023-08-04 06:27:06

nextflow - splitCsv - each element - error: If you need to reuse the same component

Question

For a Nextflow pipeline, I'd like to read in a CSV file with five columns:

sample1,path/normal_R1.fastq,path/normal_R2.fastq,path/tumor_R1.fastq,path/tumor_R2.fastq
sample2,path/normal_R1.fastq,path/normal_R2.fastq,path/tumor_R1.fastq,path/tumor_R2.fastq

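I read in the file and create a LinkedHashMap. For each entry, I'd like to run a few processes. The processes worked fine without the CSV iteration, when they were fed by a channel of tumor files and a channel of normal files.

When I edit the code to use the CSV, I get the following error: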

Process 'FASTP' has been already used -- If you need to reuse the same
component, include it with a different name or include it in a
different workflow context

Below is the code:

include { FASTP} from './fastp_process.nf'
include {bwa_index} from './index_process.nf'
include { align_bwa_mem} from './bwamem_process_already_index.nf'
include { gatk_markduplicates} from './gatk_markduplicates_process.nf'
include {setupnmdtags} from './setupnmdtags_process.nf'
include { recalibrate_bam } from './recalibratebam_process.nf'
include { applybqsr } from './applybqsr_process.nf'
include { mutect2 } from './mutect2_process.nf'
include { lancet } from './lancet_process.nf'
include { manta } from './manta_process.nf'
include { strelka } from './strelka_process.nf'
include { gatk_merge_vcfs } from  './gatk_merge_vcfs.nf'
workflow {
def csvFile = file("input_nextflow_files.csv")
def csvLines = csvFile.text.readLines()
def sampleMap = csvLines.collectEntries { line ->
    def lineCols = line.split(',')
    if (lineCols.size() >= 5) {
        def sampleName = lineCols[0]
        def normalR1 = file(lineCols[1])
        def normalR2 = file(lineCols[2])
        def tumorR1 = file(lineCols[3])
        def tumorR2 = file(lineCols[4])
        [(sampleName): [tuple(normalR1, normalR2), tuple(tumorR1, tumorR2)]]
    } else {
        return [:]
    }
}
sampleMap.each { sampleName, pairList ->
    def normalPair = pairList[0]
    def tumorPair = pairList[1]
	FASTP(tumorPair,normalPair,sampleName)
	align_bwa_mem(FASTP.out.reads_tumor,FASTP.out.reads_normal) //already_created index
	}
}

I believe it is related to the inputs of the FASTP process below:

process FASTP {
	maxForks 3
	debug true
    input:
    path(reads_tumor)  //val outdir  //doesn't work with path (outdir) // we pass multiple reads - for tumor and normal
    path(reads_normal)  //val outdir  //doesn't work with path (outdir)
	val (sample_name)
	output:
	tuple val(sample_name), path("${sample_id_tumor}_trim_{1,2}.fq.gz"), emit: reads_tumor
	path("${sample_id_tumor}.fastp.json"), emit: json_tumor
	path("${sample_id_tumor}.fastp.html"), emit: html_tumor
	
	tuple val(sample_id_normal), path("${sample_id_normal}_trim_{1,2}.fq.gz"), emit: reads_normal
	path("${sample_id_normal}.fastp.json"), emit: json_normal
	path("${sample_id_normal}.fastp.html"), emit: html_normal
	
    script:
	def (r1_normal, r2_normal) = reads_normal
	def (r1_tumor, r2_tumor)=reads_tumor
    """
    
    ml fastp
    fastp  --in1 "${r1_normal}" --in2 "${r2_normal}" -q 20  -u 20 -l 40 --detect_adapter_for_pe --out1 "${sample_id_normal}_trim_1.fq.gz" --out2 "${sample_id_normal}_trim_2.fq.gz" --json "${sample_id_normal}.fastp.json" --html "${sample_id_normal}.fastp.html" --thread 12 
    
    fastp  --in1 "${r1_tumor}" --in2 "${r2_tumor}" -q 20  -u 20 -l 40 --detect_adapter_for_pe --out1 "${sample_id_tumor}_trim_1.fq.gz" --out2 "${sample_id_tumor}_trim_2.fq.gz" --json "${sample_id_tumor}.fastp.json" --html "${sample_id_tumor}.fastp.html" --thread 12
    echo "Exiting fastp"
     
       """
}

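I do not know how to fix this error. I checked several times that I am not including the FASTP process more than once, and that is fine. I also tried removing the include and the FASTP call, but that didn't work either. So I cannot figure out what's going on.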

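Answer 1

Score: 1

When you iterate through your sample map using each, you are effectively trying to re-use the FASTP and align_bwa_mem processes on every iteration. Nextflow is complaining that if the processes need to be re-used, they must be included under a different name (i.e. using a module alias) or in a different workflow context (i.e. using a sub-workflow). A process is meant to be called once, with a channel supplying one task per emitted item. A better way to achieve what you want is therefore to use channels and the splitCsv operator, for example: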

params.samples_csv = 'input_nextflow_files.csv'
include { FASTP } from './fastp_process.nf'
workflow {
    def header = ['sampleName', 'normalR1', 'normalR2', 'tumorR1', 'tumorR2']
    Channel
        .fromPath( params.samples_csv )
        .splitCsv( header: header )
        .multiMap { row ->

            def tumor_reads = tuple( file(row.tumorR1), file(row.tumorR2) )
            def normal_reads = tuple( file(row.normalR1), file(row.normalR2) )

            tumor:
                tuple( row.sampleName, tumor_reads )
            normal:
                tuple( row.sampleName, normal_reads )
        }
        .set { samples }
    FASTP( samples.tumor.mix( samples.normal ) )
    ...
}

Or, if you want more flexibility, another way is to import FASTP using module aliases:

params.samples_csv = 'input_nextflow_files.csv'
include { FASTP as FASTP_TUMOR } from './fastp_process.nf'
include { FASTP as FASTP_NORMAL } from './fastp_process.nf'
workflow {
    ...
    FASTP_TUMOR(samples.tumor)
    FASTP_NORMAL(samples.normal)
    ...
}
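
To make the alias approach concrete, here is a minimal end-to-end sketch. It assumes the single-sample FASTP module from this answer (one `tuple val(sample_id), path(reads)` input and a `reads` output), and it reuses the CSV header names from the first example. The final `join` and `view` steps are illustrative additions that are not part of the original answer; they only show one possible way to pair the trimmed tumor and normal reads by sample name before handing them to a downstream process.

```groovy
params.samples_csv = 'input_nextflow_files.csv'

include { FASTP as FASTP_TUMOR  } from './fastp_process.nf'
include { FASTP as FASTP_NORMAL } from './fastp_process.nf'

workflow {
    def header = ['sampleName', 'normalR1', 'normalR2', 'tumorR1', 'tumorR2']

    // One item per CSV row, split into a tumor channel and a normal channel
    Channel
        .fromPath( params.samples_csv )
        .splitCsv( header: header )
        .multiMap { row ->
            tumor:
                tuple( row.sampleName, [ file(row.tumorR1), file(row.tumorR2) ] )
            normal:
                tuple( row.sampleName, [ file(row.normalR1), file(row.normalR2) ] )
        }
        .set { samples }

    // Each alias is invoked exactly once; the channel supplies one task per row
    FASTP_TUMOR( samples.tumor )
    FASTP_NORMAL( samples.normal )

    // Illustrative only: pair the trimmed reads by sample name (first tuple element)
    FASTP_TUMOR.out.reads
        .join( FASTP_NORMAL.out.reads )
        .view { sample_id, tumor_reads, normal_reads ->
            "${sample_id}: tumor=${tumor_reads} normal=${normal_reads}"
        }
}
```

Note that both aliases receive the same sample name, so the trimmed tumor and normal files end up with identical file names in separate work directories. If a downstream process stages both sets into one task directory, a distinct sample ID per read set (for example, a suffix added in the `multiMap` closure) would likely be needed to avoid a name collision.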

Contents of ./fastp_process.nf:

process FASTP {
    tag { sample_id }
    input:
    tuple val(sample_id), path(reads, stageAs: 'reads/*')
    output:
    tuple val(sample_id), path("${sample_id}_trim_{1,2}.fq.gz"), emit: reads
    path "${sample_id}.fastp.json", emit: json
    path "${sample_id}.fastp.html", emit: html
    script:
    def (r1, r2) = reads
    """
    fastp \\
        --in1 "${r1}" \\
        --in2 "${r2}" \\
        -q 20 \\
        -u 20 \\
        -l 40 \\
        --detect_adapter_for_pe \\
        --out1 "${sample_id}_trim_1.fq.gz" \\
        --out2 "${sample_id}_trim_2.fq.gz" \\
        --json "${sample_id}.fastp.json" \\
        --html "${sample_id}.fastp.html" \\
        --thread ${task.cpus}
    """
}
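
The error message also mentions a "different workflow context". As a rough sketch of that option, the process can be included once and called from two named sub-workflows, since each named workflow has its own scope. The workflow names TRIM_TUMOR and TRIM_NORMAL and the hard-coded input tuples are hypothetical placeholders, and the sketch assumes the single-sample FASTP module shown in this answer.

```groovy
include { FASTP } from './fastp_process.nf'

// Hypothetical sub-workflow: trims tumor reads
workflow TRIM_TUMOR {
    take:
    reads    // channel of tuple( sample_id, [ r1, r2 ] )

    main:
    FASTP( reads )

    emit:
    reads = FASTP.out.reads
}

// Hypothetical sub-workflow: trims normal reads
workflow TRIM_NORMAL {
    take:
    reads    // channel of tuple( sample_id, [ r1, r2 ] )

    main:
    FASTP( reads )

    emit:
    reads = FASTP.out.reads
}

workflow {
    // Placeholder inputs; in practice these would come from the splitCsv channel
    tumor  = Channel.of( tuple( 'sample1', [ file('path/tumor_R1.fastq'),  file('path/tumor_R2.fastq')  ] ) )
    normal = Channel.of( tuple( 'sample1', [ file('path/normal_R1.fastq'), file('path/normal_R2.fastq') ] ) )

    // FASTP is called once inside each sub-workflow scope
    TRIM_TUMOR( tumor )
    TRIM_NORMAL( normal )

    TRIM_TUMOR.out.reads.view()
    TRIM_NORMAL.out.reads.view()
}
```

With this layout the tasks are reported under scoped names such as TRIM_TUMOR:FASTP and TRIM_NORMAL:FASTP. For a single process the module alias is shorter; a sub-workflow tends to pay off when several steps are repeated together.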

