path not being detected by Nextflow
# Question
I'm new to nf-core/Nextflow, and the documentation doesn't always seem to reflect what is actually implemented. I'm defining the basic pipeline below:
```groovy
nextflow.enable.dsl=2

process RUNBLAST {
    input:
    val thr
    path query
    path db
    path output

    output:
    path output

    script:
    """
    blastn -query ${query} -db ${db} -out ${output} -num_threads ${thr}
    """
}

workflow {
    //println "I want to BLAST $params.query to $params.dbDir/$params.dbName using $params.threads CPUs and output it to $params.outdir"
    RUNBLAST(params.threads, params.query, params.dbDir, params.output)
}
```
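Then I run the pipeline with:
```
nextflow run main.nf --query test2.fa --dbDir blast/blastDB
```
Then I get the following error:
```
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [dreamy_hugle] DSL2 - revision: c388cf8f31
Error executing process > 'RUNBLAST'
Error executing process > 'RUNBLAST'
Caused by:
  Not a valid path value: 'test2.fa'
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
```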
I know `test2.fa` exists in the current directory:
```
(nfcore) MN:nf-core-basicblast jraygozagaray$ ls
CHANGELOG.md        conf                  other.nf
CITATIONS.md        docs                  pyproject.toml
CODE_OF_CONDUCT.md  lib                   subworkflows
LICENSE             main.nf               test.fa
README.md           modules               test2.fa
assets              modules.json          work
bin                 nextflow.config       workflows
blast               nextflow_schema.json
```
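I also tried `file` instead of `path`, but that is deprecated and raises other kinds of errors.
Knowing how to fix this would help me get started with building pipelines.
Shouldn't Nextflow copy the file to the execution path?
Thanks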
# Answer 1
**Score**: 1
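You get the above error because `params.query` is not actually a `path` value. It's probably just a simple String or GString. The solution is to supply a `file` object instead, for example:
```groovy
workflow {
    query = file(params.query)
    BLAST( query, ... )
}
```
Note that a value channel is implicitly created by a process when it is invoked with a simple value, like the above `file` object. If you need to be able to BLAST multiple query files, you'll instead need a queue channel, which can be created using the `fromPath` factory method, for example:
```groovy
params.query = "${baseDir}/data/*.fa"
params.db = "${baseDir}/blastdb/nt"
params.outdir = './results'

// Split the database prefix into its directory and its name
db_name = file(params.db).name
db_path = file(params.db).parent

process BLAST {
    publishDir(
        path: "${params.outdir}/blast",
        mode: 'copy',
    )

    input:
    tuple val(query_id), path(query)
    path db

    output:
    tuple val(query_id), path("${query_id}.out")

    script:
    """
    blastn \\
        -num_threads ${task.cpus} \\
        -query "${query}" \\
        -db "${db}/${db_name}" \\
        -out "${query_id}.out"
    """
}

workflow {
    // One (id, file) tuple per query file matched by the glob
    Channel
        .fromPath( params.query )
        .map { file -> tuple(file.baseName, file) }
        .set { query_ch }

    BLAST( query_ch, db_path )
}
```
Note that the usual way to specify the number of threads/CPUs is the `cpus` directive, which can be configured using a process selector in your `nextflow.config`. For example:
```groovy
process {
    withName: BLAST {
        cpus = 4
    }
}
```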
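Applied to the single-file pipeline from the question, the same fix might look like the sketch below. It is only an illustration under stated assumptions: `blast/blastDB` is treated as the database prefix (directory `blast`, name `blastDB`), the parameter names `params.db` and `params.output` and the default output name `blast.out` are hypothetical, and the output file name is passed as a `val` rather than a `path`, since the file does not exist before the task runs.

```groovy
nextflow.enable.dsl=2

// Assumed defaults (hypothetical names); override with --query, --db, --output
params.query  = 'test2.fa'
params.db     = 'blast/blastDB'   // assumption: BLAST database prefix (dir + name)
params.output = 'blast.out'       // hypothetical output file name

process RUNBLAST {
    input:
    path query        // staged into the task work directory
    path db_dir       // directory holding the database files
    val db_name       // database name inside db_dir
    val out_name      // plain string: the file is created by the task

    output:
    path "${out_name}"

    script:
    """
    blastn -query ${query} -db ${db_dir}/${db_name} -out ${out_name} -num_threads ${task.cpus}
    """
}

workflow {
    // file() turns the parameter strings into path objects
    def query = file(params.query, checkIfExists: true)
    def db = file(params.db)
    RUNBLAST(query, db.parent, db.name, params.output)
}
```

On the question of copying: Nextflow stages each `path` input into the task work directory (as a symlink by default), so the script can refer to the file by name once it is passed as a real file object.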