Question

I'm using Snakemake to build a pipeline. I have a checkpoint that should produce multiple output files, which are later used in my rule all via expand. The problem is that I don't know how many files will be produced, so I can't supply a list of values to expand.

The files are produced by an R script.

Example:

rule all:
    input:
        expand(["results/{output}"],
               output=????)


checkpoint rscript:
    input:
        "foo.input"
    output:
        report("somedir/{output}"),
    script:
        "../scripts/foo.R"

Of course this is only a small part of the workflow, but essentially my R script has a loop that writes multiple files into somedir. Because I don't know how many there will be, and because their names are only determined inside the R script, I can't set output in expand.

Maybe this is a trivial question to some of you, or even a stupid one, and there are better ways to do this. If so, I'd still be thankful for a pointer, because I find it hard to follow the descriptions of most Snakemake functions in English.

If there are more questions I'd gladly answer. (The best case for me would be to let output have names that I could specify at runtime within the R script.)

(I also can't aggregate the created files in another rule because each file will show a different plot)

Edit: The main problem still seems to be that checkpoint rscript is not able to create multiple {output} files in "somedir/". With the touch("rscript_finish.flag") approach, it seems that only the SVG file is written, under the name "rscript_finish.flag", or that "rscript_finish.flag" is overwritten each time the loop in my R script writes to snakemake@output[[1]].

Answer 1

Score: 2

There are no stupid questions :). I hope I understood yours correctly, and it is actually not a trivial question at all!

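def all_input(wildcards):
    # Require checkpoint rscript to finish first; Snakemake re-evaluates this function afterwards.
    checkpoints.rscript.get()
    # Collect the name (without extension) of every PNG that the R script wrote into somedir/.
    filenames, = glob_wildcards("somedir/{filenames}.png")
    # Request one copy per plot; the .png suffix is needed to match the output of rule add_to_report.
    return expand("somedir_cp/{fn}.png", fn=filenames)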


rule all:
    input:
        all_input


rule add_to_report:
    input:
        "somedir/{filename}.png"
    output:
        report("somedir_cp/{filename}.png")
    shell:
        "cp {input} {output}"


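checkpoint rscript:
    input:
        "foo.input"
    output:
        # Only a flag file is declared here. The R script has to write its plots into
        # somedir/ under its own file names, not to snakemake@output[[1]].
        touch("rscript_finish.flag")
    script:
        "../scripts/foo.R"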

I didn't really test the code, so I am not sure if it works immediately, but I think the logic is correct.

The way to solve this is with an extra rule, which I called add_to_report. All this rule does is copy each existing output of rscript and add the copy to the report. rule all first requests the execution of checkpoint rscript. Once the checkpoint has run, all_input finds every file it generated and declares a copy of each one as input to rule all. Those copies are made by rule add_to_report, and thus the files are added to the report.
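To make the "discover first, then build the target list" step more concrete, here is a minimal plain-Python sketch of what all_input does once the checkpoint has finished. It is only a rough stand-in and not Snakemake's implementation: discover_names, expand_targets and demo are hypothetical helper names, the file names are made up, and a temporary directory plays the role of somedir/.

```python
import tempfile
from pathlib import Path


def discover_names(directory, suffix=".png"):
    """Return the names (without suffix) of all files in `directory` ending in `suffix`.

    Rough stand-in for glob_wildcards("somedir/{filenames}.png"); hypothetical helper.
    """
    return sorted(
        p.name[: -len(suffix)]
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


def expand_targets(names, pattern="somedir_cp/{fn}.png"):
    """Build one target path per discovered name, similar to expand(pattern, fn=names)."""
    return [pattern.format(fn=name) for name in names]


def demo():
    """Simulate an R script that wrote two plots plus one unrelated file."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("plot_a.png", "plot_b.png", "notes.txt"):
            (Path(tmp) / name).touch()
        # Only the PNG files become targets; notes.txt is ignored.
        return expand_targets(discover_names(tmp))


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the list of targets is computed from whatever is on disk after the script has run, which is why the number of files does not need to be known in advance.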
