How to specify optional inputs for Nextflow processes?


Question


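I'm new to Nextflow and have been trying to build a small pipeline for some Python scripts I have. However, I've run into an issue with optional inputs to processes that I can't find a workaround for. I'm also curious what the best practices are for optional inputs and parameters.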

```groovy
#!/usr/bin/env nextflow

params.out = ""
params.kml_1 = null
params.kml_2 = null
params.loc = ""
params.new_data_1 = false
params.new_data_2 = false

process getPolygons {
    input:
    tuple val(db_table), path(path_to_kml), val(new_data)
    val loc
    path path_to_outdir

    def new_data_arg = new_data ? "--new_data" : ""
    def kml_arg = (path_to_kml != null) ? "--kml $path_to_kml" : ""

    script:
    """
    python3 ${baseDir}/bin/polygon_data.py --loc $loc --db_table $db_table $kml_arg $new_data_arg --outdir $path_to_outdir
    """
}

workflow {
    outdir_ch = Channel.fromPath(params.out)
    location_ch = Channel.of(params.loc)

    tables = [
        tuple("Table1", params.kml_1 ? params.new_data_1 : null, params.new_data_1),
        tuple("Table2", params.kml_2 ? params.new_data_2: null, params.new_data_2)
    ]
    tables_ch = Channel.from(tables)

    getPolygons(tables_ch, location_ch, outdir_ch)
}
```

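The code worked before I added the optional inputs. At that point I had not yet turned `tables` into a list of tuples to account for the optional parameters `path_to_kml` and `new_data` in `getPolygons`. Instead it was simply:

`tables = ["Table1", "Table2"]`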

I keep running into either

`ERROR ~ No such variable: new_data`

or

`ERROR ~ No such variable: path_to_kml`

Which one I get depends on the order in which I create the variables `new_data_arg` and `kml_arg` and use them in the script.

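Switching to tuples is my latest attempt to fix the program's problem with the optional parameters `new_data` and `path_to_kml`. Previously I had them as separate inputs to `getPolygons`. Could the issue be that I create the variables `new_data_arg` and `kml_arg` and use them in the script, instead of using `new_data` and `path_to_kml` directly? If so, I'm not sure what the workaround is. For my purposes, I need to apply some logic to `new_data` and `path_to_kml` before passing this information to `polygon_data.py`.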
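On the Python side, the flags only stay optional if `polygon_data.py` declares them that way. The script itself is not shown, so the following is a hypothetical sketch of its argument parsing. It assumes the script uses `argparse`, and the argument names are taken from the command line above. `--kml` defaults to `None` when omitted, and `--new_data` is a boolean switch. This lets the Nextflow process simply leave either flag out of the command.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface for polygon_data.py; the real script is not shown.
    parser = argparse.ArgumentParser(description="Fetch polygon data for one table.")
    parser.add_argument("--loc", required=True, help="location to query")
    parser.add_argument("--db_table", required=True, help="database table name")
    parser.add_argument("--outdir", required=True, help="directory for results")
    # Optional file path: when the flag is omitted, args.kml is None.
    parser.add_argument("--kml", default=None, help="optional KML file")
    # Boolean switch: True only when --new_data appears on the command line.
    parser.add_argument("--new_data", action="store_true", help="fetch fresh data")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv, as in a normal script run.
    args = build_parser().parse_args(argv)
    if args.kml is not None:
        print(f"Using KML file: {args.kml}")
    if args.new_data:
        print(f"Refreshing data for {args.db_table}")
    return args

# In the real script, call main() under an `if __name__ == "__main__":` guard.
```

With this shape, the process never has to pass a placeholder such as `--kml null`. It either emits the flag with a value or omits it entirely.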




# Answer 1
**Score**: 0

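I found a solution that uses tuples. First, the `ERROR ~ No such variable` messages occurred because I defined the variables `new_data_arg` and `kml_arg` in the `input:` section instead of inside the process's `script:` block (a rookie mistake).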

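Next, I realized that the process would not iterate over the tuples. `location_ch` and `outdir_ch` each emit a single item, and a process fed by several queue channels runs only as many times as its shortest input channel emits. So it would run once rather than once per tuple. The `each` qualifier fixes this: I pass each tuple in as the variable `tuple_info` and unpack it inside `script:`. I also used an empty string (`""`) instead of `null` for `path_to_kml`, since it is a path and `null` can cause problems. This is the final working version of my process: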

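```groovy
process getPolygons {
    input:
    each tuple_info          // repeat the process once per (db_table, kml, new_data) tuple
    val loc
    path path_to_outdir

    script:
    // Variables derived from inputs must be defined inside the script: block;
    // defining them in the input: section causes "No such variable".
    def (db_table, path_to_kml, new_data) = tuple_info
    def new_data_arg = new_data ? "--new_data" : ""
    // path_to_kml is a plain value inside the tuple (not a path input), so the file
    // is not staged into the work directory; an empty string means "no KML file".
    def kml_arg = (path_to_kml != "") ? "--kml $path_to_kml" : ""

    """
    python3 ${baseDir}/bin/polygon_data.py --loc $loc --db_table $db_table $kml_arg $new_data_arg --outdir $path_to_outdir
    """
}
```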

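I also realized I could simplify the `tables` list. There is no reason to build extra logic around `params.kml_1` and `params.kml_2` when the parameter initialization already handles the missing case. The original conditional expression also put `params.new_data_1` and `params.new_data_2` in the KML slot by mistake.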

```groovy
tables = [
    tuple("Table1", params.kml_1, params.new_data_1),
    tuple("Table2", params.kml_2, params.new_data_2)
]
```
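The flag logic in the `script:` block can be checked outside Nextflow. The Python sketch below mirrors it as an analogy; it is not Nextflow code, and `build_command`, `kml_flag_strict` and `kml_flag_truthy` are hypothetical helpers. Each tuple from `tables` becomes one command line (the `${baseDir}/bin/` prefix is omitted). An empty KML path or a false `new_data` simply drops the corresponding flag.

The sketch also shows why the default value matters. With the `!= ""` test used above, a `None` value (Groovy `null`) slips through and produces a bogus `--kml None` (`--kml null` in Groovy). A plain truthiness test treats both as missing.

```python
from typing import List, Optional, Tuple

# (db_table, kml_path, new_data), matching the tuples in `tables`.
Table = Tuple[str, Optional[str], bool]


def kml_flag_strict(kml: Optional[str]) -> List[str]:
    # Mirrors `(path_to_kml != "") ? "--kml $path_to_kml" : ""`: None is NOT treated as empty.
    return ["--kml", str(kml)] if kml != "" else []


def kml_flag_truthy(kml: Optional[str]) -> List[str]:
    # Treats both None and "" as "no KML file" (Groovy truth behaves the same way).
    return ["--kml", kml] if kml else []


def build_command(table: Table, loc: str, outdir: str) -> List[str]:
    db_table, kml, new_data = table
    cmd = ["python3", "polygon_data.py", "--loc", loc, "--db_table", db_table]
    cmd += kml_flag_truthy(kml)
    if new_data:
        cmd.append("--new_data")
    cmd += ["--outdir", outdir]
    return cmd


tables = [("Table1", "", False), ("Table2", "area.kml", True)]
for t in tables:
    print(" ".join(build_command(t, loc="X", outdir="out")))
```

Applied to the process, the equivalent Groovy check would be `def kml_arg = path_to_kml ? "--kml $path_to_kml" : ""`. It works whether the KML parameters default to `null` or to `""`.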

huangapple
  • Published on 5 August 2023 08:25:38
  • Source (please keep this link when reposting): https://go.coder-hub.com/76839706.html