March 8, 2023 17:00:24


How to make Snakemake run a rule once for all matching outputs, not once for each wildcard match

Question

In a Snakemake workflow, for efficiency reasons, I want to run a rule once for the _list_ of wildcard matches, rather than once for each match.

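In Snakemake, one way to achieve this is to build the complete input and output file lists with the expand function, so that the rule contains no wildcards. The following example runs rule produce_all_csv once to process all samples:

SAMPLES = ["1", "2", "3"]

rule all:
    input:
        expand("out_{sample}.csv", sample=SAMPLES)

rule produce_all_csv:
    input:
        expand("in_{sample}.csv", sample=SAMPLES)
    output:
        expand("out_{sample}.csv", sample=SAMPLES)
    params:
        # the tool expects comma-separated lists, not space-separated ones
        inputs=lambda wildcards, input: ",".join(input),
        outputs=lambda wildcards, output: ",".join(output),
    shell:
        "tool --inputs {params.inputs} --outputs {params.outputs}"

In this example, expand generates the full lists of input and output files. Because neither list contains a wildcard, Snakemake creates a single produce_all_csv job, and that job produces all three out_{sample}.csv files requested by rule all. No dynamic() output is needed here. That flag was meant for outputs whose number is not known in advance, and it has been superseded by checkpoints. The params functions join the file lists with commas, because {input} and {output} would otherwise be substituted space-separated.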
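The corrected rule depends on turning the file lists into the comma-separated strings that tool expects. Below is a minimal pure-Python sketch of that step. It uses a hypothetical build_tool_command helper, which is not part of Snakemake, and it assumes the tool pairs inputs with outputs by position.

```python
from typing import Sequence


def build_tool_command(inputs: Sequence[str], outputs: Sequence[str]) -> str:
    """Build the command line for the (hypothetical) `tool` CLI.

    Does the same job as the params functions: join each list with commas.
    """
    # Assumption: the tool pairs inputs and outputs by position,
    # so both lists must have the same length.
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    return f"tool --inputs {','.join(inputs)} --outputs {','.join(outputs)}"


samples = ["1", "2", "3"]
inputs = [f"in_{s}.csv" for s in samples]
outputs = [f"out_{s}.csv" for s in samples]

# How Snakemake substitutes a list into a shell string by default: space-separated.
print(" ".join(inputs))
print(build_tool_command(inputs, outputs))
```

File names that themselves contain commas would break this format. The separator is dictated by the tool's API, not by Snakemake.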

Assume the tool's API looks like this:

tool --inputs input_1.csv,input_2.csv --outputs output_1.csv,output_2.csv

This approach processes all samples in a single run instead of running once per sample.

Original (English):

In a Snakemake workflow, for efficiency reasons, I want to run a rule once for the list of wildcard matches - rather than once for each match.

What's the idiomatic way of doing this in Snakemake?

Here is minimal starter code that does not do what I want. It would call rule produce_all_csv once for each of the required outputs (here 3 times) rather than the desired once.

rule all:
    input:
        "out_1.csv",
        "out_2.csv",
        "out_3.csv",


rule produce_all_csv:
    """
    This rule should be called _once_ for _all_ samples,
    not once per sample
    """
    input:
        "in_{sample}.csv",
    output:
        "out_{sample}.csv",
    shell:
        """
        # Placeholder for a real command
        # that takes a list of input files
        # and produces a list of output files
        """
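The starter rule runs three times because Snakemake matches each requested file against the rule's output pattern, and each distinct set of wildcard values becomes its own job. The sketch below is a deliberately simplified toy model of that matching, not Snakemake's actual scheduler. It uses hypothetical pattern_to_regex and plan_jobs helpers, and, like Snakemake's default, each wildcard matches `.+`.

```python
import re
from typing import Dict, List


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn 'out_{sample}.csv' into a regex with one named group per wildcard."""
    parts = []
    last = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[last:m.start()]))  # literal text
        parts.append(f"(?P<{m.group(1)}>.+)")             # wildcard
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))


def plan_jobs(output_patterns: List[str], targets: List[str]) -> List[Dict[str, str]]:
    """Return one entry per distinct wildcard assignment, i.e. per job."""
    jobs: List[Dict[str, str]] = []
    for target in targets:
        for pattern in output_patterns:
            match = pattern_to_regex(pattern).fullmatch(target)
            if match:
                wildcards = match.groupdict()
                if wildcards not in jobs:
                    jobs.append(wildcards)
                break
    return jobs


targets = ["out_1.csv", "out_2.csv", "out_3.csv"]
print(plan_jobs(["out_{sample}.csv"], targets))  # one job per sample
print(plan_jobs(targets, targets))               # a single job with no wildcards
```

The second call models a rule whose outputs are listed explicitly. Every target matches with the same empty wildcard set, so only one job is planned, and that is the behaviour the question asks for.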

For concreteness, assume the tool has this API:

tool --inputs input_1.csv,input_2.csv --outputs output_1.csv,output_2.csv
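To try the workflow locally without the real tool, a hypothetical Python stand-in that accepts the same --inputs/--outputs syntax could be used. The sketch below assumes that inputs and outputs are paired by position, and its "processing" simply copies each input to its paired output. The names parse_pairs and main are illustrative only.

```python
import argparse
import shutil
from typing import List, Tuple


def parse_pairs(argv: List[str]) -> List[Tuple[str, str]]:
    """Parse '--inputs a,b --outputs c,d' into [(a, c), (b, d)]."""
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--inputs", required=True)
    parser.add_argument("--outputs", required=True)
    args = parser.parse_args(argv)
    inputs = args.inputs.split(",")
    outputs = args.outputs.split(",")
    if len(inputs) != len(outputs):
        # parser.error prints a usage message and exits with status 2
        parser.error("--inputs and --outputs must list the same number of files")
    return list(zip(inputs, outputs))


def main(argv: List[str]) -> None:
    # Placeholder processing: copy each input file to its paired output file.
    for src, dst in parse_pairs(argv):
        shutil.copyfile(src, dst)
```

A small script entry point could call main(sys.argv[1:]). The shell directive would then invoke that script in place of tool.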

This question is inspired by https://stackoverflow.com/questions/75603548/how-to-escape-missingoutputexception-while-running-a-for-loop-in-a-rule-in-snake

Answer 1

Score: 1

What about this?



You could consider using the `expand` function instead of the list comprehensions.

Original (English):

What about this?

SAMPLES = ['1', '2', '3']

rule all:
    input:
        "out_1.csv",
        "out_2.csv",
        "out_3.csv",


rule produce_all_csv:
    input:
        csv=[f"in_{sample}.csv" for sample in SAMPLES],
    output:
        csv=[f"out_{sample}.csv" for sample in SAMPLES],
    params:
        in_csv=lambda wc, input: ','.join(input.csv),
        out_csv=lambda wc, output: ','.join(output.csv),
    shell:
        r"""
        tool --inputs {params.in_csv} --outputs {params.out_csv}
        """

You could probably use the expand function instead of the list comprehensions.
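With a single wildcard, `expand("in_{sample}.csv", sample=SAMPLES)` yields the same list as the comprehension, so the rule could read `csv=expand("in_{sample}.csv", sample=SAMPLES)`. The sketch below is a hypothetical expand_like helper, not Snakemake's implementation. It mimics expand's default behaviour of filling the pattern with every combination (Cartesian product) of the given values. The real expand offers more, for example a `zip` combinator.

```python
import itertools
from typing import List


def expand_like(pattern: str, **wildcards: List[str]) -> List[str]:
    """Hypothetical stand-in for expand(): fill `pattern` with every
    combination of the wildcard values (Cartesian product)."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*(wildcards[n] for n in names))
    ]


SAMPLES = ["1", "2", "3"]
# With one wildcard, this matches the list comprehension used in the answer.
assert expand_like("in_{sample}.csv", sample=SAMPLES) == [f"in_{s}.csv" for s in SAMPLES]
print(expand_like("out_{sample}.csv", sample=SAMPLES))
```

With several wildcards the product grows quickly (two lists of two values give four names). That growth is why expand is usually more convenient than nested comprehensions.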
