July 11, 2023, 00:24:37

Iterating through a file in a Nextflow process

Question

I am building a pipeline with Nextflow and am having trouble with one of its processes.

The process takes two regular files as input (output.kraken and $sequences) and a string (for example, "Aspergillus").

I also have a file, 'fungal_species.txt', that contains multiple lines. I want to iterate over this file and run the process once for each line.

This is what I tried:

process fungal_reads_extraction {
     publishDir("${params.extraction_output}", mode: 'copy')

     input:
     path namesspecies

     output:
     path "*", emit: reads_extracted_out

     script:
     """
   while read -r species_name; do
//Extract lines from the Kraken file where the third word matches the species name
     awk -F'\t' -v "$species_name" 'BEGIN {OFS="\t"} \$3 ~ "$species_name" {print}' output.kraken > "${species_name}_lines.txt"
//Extract accessions from species lines
     awk -F'\t' '{print \$2}' "${species_name}_lines.txt" > "${species_name}_accessions.txt"
//Add "@" symbol to the beginning of each line in the accession file
     awk '{print "@" \$0}' "${species_name}_accessions.txt" > "${species_name}_full_accessions.txt"
//Extract reads assigned to the species
     cat $sequences | awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' "${species_name}_full_accessions.txt" - > "${species_name}_reads.fastq"
//Cleanup intermediate files
     rm "${species_name}_lines.txt" "${species_name}_accessions.txt" "${species_name}_full_accessions.txt"
   done < fungal_species.txt
     """
}

Using a while loop and naming each line species_name seemed logical to me. But when I run the pipeline, the process fails with an error saying that species_name is unknown. This seems very strange. Can anyone help me, please? Maybe I am overlooking something important.

ERROR ~ Error executing process > 'fungal_reads_extraction (1)'
Caused by:
  No such variable: species_name -- Check script 'pipeline.nf' at line: 193

Thank you in advance!
Have a good day!

Answer 1

Score: 2

The $ in $species_name refers to a shell variable, not a Nextflow variable. It must be escaped so that Nextflow does not try to interpolate it: awk -F'\t' -v "\$species_name" 'BEGIN {..
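
To make the escaping concrete, here is a minimal shell sketch of the loop as bash itself should receive it, that is, after Nextflow has handed the script over. Inside a Nextflow script: block, each $ shown here would be written as \$. Separately from the escaping issue, awk's -v option expects a name=value assignment, so the sketch passes the name in an awk variable called sp (a name chosen here for illustration). The sample Kraken lines and species names are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_accessions KRAKEN_FILE SPECIES_FILE  (hypothetical helper)
# For each name in SPECIES_FILE, write the read IDs (column 2) of Kraken
# lines whose column 3 matches that name to "<name>_accessions.txt".
list_accessions() {
    local kraken=$1 species_file=$2 species_name
    while read -r species_name; do
        # awk -v expects name=value; "sp" is then an ordinary awk variable
        # and is used as a regular expression by the ~ operator.
        awk -F'\t' -v sp="$species_name" '$3 ~ sp {print $2}' "$kraken" \
            > "${species_name}_accessions.txt"
    done < "$species_file"
}

# Tiny made-up inputs, for illustration only.
cd "$(mktemp -d)"
printf 'C\tread1\tAspergillus fumigatus\t100\tx\n' >  output.kraken
printf 'C\tread2\tCandida albicans\t100\tx\n'      >> output.kraken
printf 'Aspergillus\nCandida\n' > fungal_species.txt

list_accessions output.kraken fungal_species.txt
cat Aspergillus_accessions.txt   # prints: read1
```

Because ~ treats sp as a regular expression, a name such as Aspergillus also matches Aspergillus fumigatus.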

Furthermore, a better approach would be to split fungal_species.txt into one item per line and run the process in parallel, once per species. Something like:

species_ch = Channel.fromPath(params.path_to_fungal_species).splitText().map{it.trim()}
(...)
process fungal_reads_extraction {
     input:
     val(one_name)
     (...)
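
With one species per task, the script body no longer needs a loop or intermediate files. The following is a minimal shell sketch of such a per-species body, not the answerer's code: extract_reads is a hypothetical helper, the sample data is made up, and it assumes strict four-line FASTQ records and a tab-separated Kraken file with the read ID in column 2 and the taxon name in column 3, as in the question. In a Nextflow script: block the species would arrive from val(one_name), and every shell or awk $ would need to be written as \$.

```shell
#!/usr/bin/env bash
set -euo pipefail

# extract_reads SPECIES KRAKEN_FILE FASTQ_FILE  (hypothetical helper)
# Print the 4-line FASTQ records whose read ID is assigned to SPECIES.
extract_reads() {
    local species=$1 kraken=$2 fastq=$3
    awk -F'\t' -v sp="$species" '
        # First file (Kraken): remember "@<read id>" for matching lines.
        NR == FNR { if ($3 ~ sp) wanted["@" $2] = 1; next }
        # Second file (FASTQ): on each header line, decide whether to keep
        # the record; the first word of the header is the read ID.
        FNR % 4 == 1 { split($0, h, " "); keep = (h[1] in wanted) }
        keep
    ' "$kraken" "$fastq"
}

# Tiny made-up inputs, for illustration only.
cd "$(mktemp -d)"
printf 'C\tread1\tAspergillus fumigatus\t100\tx\n' >  output.kraken
printf 'C\tread2\tCandida albicans\t100\tx\n'      >> output.kraken
printf '@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n' > reads.fastq

extract_reads Aspergillus output.kraken reads.fastq > Aspergillus_reads.fastq
cat Aspergillus_reads.fastq
```

Deciding on header lines only (every fourth line) avoids mistaking a quality string that happens to start with @ for a read header.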
