iterating through a file in a Nextflow process
Question
I am using Nextflow to build a pipeline, and I am running into a problem in one of the processes.
The process takes two regular files (output.kraken and $sequences) and a string ("Aspergillus", for example) as input.
I also have a file, fungal_species.txt, that contains multiple lines, and I want to iterate over it and run the process on every line.
This is what I tried:
process fungal_reads_extraction {
    publishDir("${params.extraction_output}", mode: 'copy')

    input:
    path namesspecies

    output:
    path "*", emit: reads_extracted_out

    script:
    """
    while read -r species_name; do
        //Extract lines from the Kraken file where the third word matches the species name
        awk -F'\t' -v "$species_name" 'BEGIN {OFS="\t"} \$3 ~ "$species_name" {print}' output.kraken > "${species_name}_lines.txt"
        //Extract accessions from species lines
        awk -F'\t' '{print \$2}' "${species_name}_lines.txt" > "${species_name}_accessions.txt"
        //Add "@" symbol to the beginning of each line in the accession file
        awk '{print "@" \$0}' "${species_name}_accessions.txt" > "${species_name}_full_accessions.txt"
        //Extract reads assigned to the species
        cat $sequences | awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' "${species_name}_full_accessions.txt" - > "${species_name}_reads.fastq"
        //Cleanup intermediate files
        rm "${species_name}_lines.txt" "${species_name}_accessions.txt" "${species_name}_full_accessions.txt"
    done < fungal_species.txt
    """
}
It seemed logical to me to use a while loop and read each line into species_name.
But when I run the pipeline, that process fails with an error saying that species_name is unknown. This seems very strange to me. Can anyone help? Maybe I am overlooking something important.
ERROR ~ Error executing process > 'fungal_reads_extraction (1)'
Caused by:
No such variable: species_name -- Check script 'pipeline.nf' at line: 193
Thank you in advance!
Have a good day!
Answer 1
Score: 2
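The $ in $species_name is not a Nextflow variable but a shell variable. Inside a Nextflow script block it must be escaped so that Nextflow does not try to interpolate it: awk -F'\t' -v "\$species_name" 'BEGIN {... The same applies to every other occurrence in the script, including the ${species_name} in the file names, which becomes \${species_name}.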
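To see what the loop should look like once it reaches the shell, here is a minimal sketch in plain shell, outside Nextflow, so no backslashes appear; inside a Nextflow """ script block each $ below that belongs to the shell or to awk would need a leading backslash. The sample rows and the awk variable name sp are made up for illustration. Two further points it assumes about the original command: awk's -v option expects name=value, and a quoted "$species_name" inside a single-quoted awk program is a literal string rather than the shell variable. Comments also use # here, since // is not a comment marker in the shell.

```shell
set -eu

# Hypothetical sample data, only for illustration.
workdir="$(mktemp -d)"
printf 'C\tread1\tAspergillus fumigatus\nC\tread2\tCandida albicans\nC\tread3\tAspergillus niger\n' > "$workdir/output.kraken"
printf 'Aspergillus\nCandida\n' > "$workdir/fungal_species.txt"

while read -r species_name; do
    # Skip empty lines in the species list.
    [ -n "$species_name" ] || continue
    # Pass the shell variable to awk as the awk variable "sp" (-v name=value),
    # then keep the rows whose third column matches it.
    awk -F'\t' -v sp="$species_name" '$3 ~ sp' "$workdir/output.kraken" \
        > "$workdir/${species_name}_lines.txt"
done < "$workdir/fungal_species.txt"

aspergillus_hits="$(wc -l < "$workdir/Aspergillus_lines.txt" | tr -d ' ')"
echo "Aspergillus rows: $aspergillus_hits"
```

With the sample data above, two of the three rows match Aspergillus and one matches Candida.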
Furthermore, the best approach would be to split your fungal_species file into one item per species and parallelize per species. Something like:
species_ch = Channel.fromPath(params.path_to_fungal_species).splitText().map{it.trim()}
(...)
process fungal_reads_extraction {
input:
val(one_name)
(...)
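With one species name per task, the script body no longer needs a loop. As a rough sketch of what that per-species step could run, here is a plain shell function; extract_reads, the awk variable sp, and the sample files are hypothetical, and it assumes four-line FASTQ records and a Kraken table with the read ID in column 2 and the taxon name in column 3, as in the question. Inside a Nextflow script block the species name would come from the val input and be interpolated by Nextflow, while the awk field references would still need escaping.

```shell
set -eu

# extract_reads NAME KRAKEN FASTQ: print the FASTQ records whose read ID
# Kraken assigned to a taxon name matching NAME (hypothetical helper).
extract_reads() {
    awk -F'\t' -v sp="$1" '
        # First file (Kraken table): remember "@" + read ID for matching rows.
        NR == FNR { if ($3 ~ sp) wanted["@" $2] = 1; next }
        # Second file (FASTQ): at each header line, decide whether to keep the record.
        FNR % 4 == 1 { split($0, header, " "); keep = (header[1] in wanted) }
        keep
    ' "$2" "$3"
}

# Hypothetical sample data, only for illustration.
workdir="$(mktemp -d)"
printf 'C\tread1\tAspergillus fumigatus\nC\tread2\tCandida albicans\n' > "$workdir/output.kraken"
printf '@read1 desc\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n' > "$workdir/reads.fastq"

extract_reads "Aspergillus" "$workdir/output.kraken" "$workdir/reads.fastq" \
    > "$workdir/Aspergillus_reads.fastq"
cat "$workdir/Aspergillus_reads.fastq"
```

Doing the lookup in a single awk pass avoids the intermediate files and the cleanup step, and checking only header lines (every fourth line) avoids matching quality strings that happen to start with @.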