April 17, 2023, 01:47:07

Wildcards not found

Question

So I'm a little confused about wildcards. I'd like to get a list of input files with an expand function. The files exist and are created by other rules using these "wildcards" - {output_name} {split_index}.

I guess only split_index is a "true" wildcard as output_name is defined in another expand in my rule all:

expand(["results/{output_name}_1/{output_name}_1.bed",
        "results/{output_name}_2/{output_name}_2.bed",
        "results/{output_name}.txt"],
       output_name=config["output_name_prefix"])

The first two files are made using another rule which outputs the same string I have in my expand here:

def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name=wildcards.output_name, split_index=wildcards.split_index)

rule merge_file_list:
    input:
        get_values
    output:
        "results/{output_name}.txt"
    shell:
        "for s in {input}; do ${{s}} > {output}; done"

I get an error that reads:

Error:
  AttributeError: 'Wildcards' object has no attribute 'split_index'
Wildcards:
  output_name=Homo_sapiens_ch19

What does this error even mean? I thought output_name was not a wildcard, as the docs read:

What exactly does Snakemake call a wildcard? I am confused.

Answer 1

Score: 1

A common misconception is to treat wildcards and the variables of the expand function as the same thing.

Let's start with the expand function. It is just a Python function with no special meaning to Snakemake. It takes a string (or a list of strings) containing variables, plus parameters that specify which values to substitute for those variables, and it returns a list of strings. Only then does the Snakemake magic start. In fact, this function can be used anywhere a function can be called, not only in the input/output sections of rules.
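
To make this concrete, here is a minimal pure-Python sketch of what expand does conceptually. The name expand_sketch is a hypothetical stand-in written for illustration, not Snakemake's actual implementation, and it assumes every value is either a string or a list of strings:

```python
from itertools import product


def expand_sketch(patterns, **values):
    """Fill every pattern with every combination of the given values.

    Hypothetical stand-in for Snakemake's expand; illustration only.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    # Treat a single string value as a one-element list.
    values = {k: [v] if isinstance(v, str) else list(v) for k, v in values.items()}
    names = list(values)
    result = []
    for pattern in patterns:
        for combo in product(*(values[name] for name in names)):
            # str.format turns doubled braces {{x}} into literal {x},
            # which is how a placeholder survives for later use as a wildcard.
            result.append(pattern.format(**dict(zip(names, combo))))
    return result


paths = expand_sketch(
    "results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed",
    split_index=["1", "2"],
)
```

Here paths is an ordinary list of two strings in which {split_index} has been filled in and {output_name} is still an unresolved placeholder. Nothing in this step knows anything about rules or wildcards.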

Wildcards are a more complex and more often misunderstood concept (and @GiangLe's answer provides a good example of WRONG usage). Wildcards turn a rule into a kind of template that can be applied to different files. The meaning of each wildcard is established in the output section, where you specify what kind of files the rule can produce. If there is a way to replace every wildcard with a value such that the output matches a file Snakemake wants to produce, Snakemake considers the rule a candidate. It then remembers the wildcard values used for this match, and from that point on those values are available in the other sections of the rule, including input.
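
The matching step can be pictured roughly like this. This is only a sketch of the idea, not Snakemake's code: match_output is a hypothetical helper, and it assumes each wildcard matches the regular expression `.+` (the default when no constraint is given):

```python
import re


def match_output(pattern, target):
    """Return the wildcard values that make `pattern` equal `target`, or None.

    Hypothetical helper for illustration; assumes every wildcard matches '.+'.
    """
    regex = ""
    pos = 0
    seen = set()
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        name = m.group(1)
        if name in seen:
            # A repeated wildcard must take the same value everywhere.
            regex += f"(?P={name})"
        else:
            regex += f"(?P<{name}>.+)"
            seen.add(name)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


matched = match_output("results/{output_name}.txt", "results/Homo_sapiens_ch19.txt")
not_matched = match_output("results/{output_name}.txt", "results/Homo_sapiens_ch19.bed")
```

In this sketch matched holds a value for output_name only, because that is the only wildcard in the output pattern, while not_matched is None because the rule cannot produce a .bed file.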

Before explaining the problem with your code, I should point out that you don't need the get_values function: it complicates your code considerably and is not idiomatic. This code:

def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name=wildcards.output_name, split_index=wildcards.split_index)

rule merge_file_list:
    input: 
        get_values
    output: 
        "results/{output_name}.txt"

is functionally equivalent to a simplified one:

rule merge_file_list:
    input: 
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"
    output: 
        "results/{output_name}.txt"

Now consider the simplified version and imagine that Snakemake needs to produce the file "results/my_favorite_filename.txt". It discovers that the rule merge_file_list can produce this file if the wildcard {output_name} is substituted with the value "my_favorite_filename". Knowing this value, it then tries to resolve the rule's input: "results/my_favorite_filename_{split_index}/my_favorite_filename_{split_index}.bed". It has successfully replaced {output_name}, but it finds that no value is specified for the wildcard {split_index}. Remember that wildcard values can only come from the output section, and the output here contains no {split_index}.
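
The same situation can be reproduced outside Snakemake. In the sketch below, a SimpleNamespace stands in for the Wildcards object (an assumption made purely for illustration), carrying only the value that the output pattern could supply. Note also that the original get_values has no return statement, so it would hand back None even if the attribute lookup succeeded; the sketch adds the return to isolate the wildcard problem:

```python
from types import SimpleNamespace

# Stand-in for the object Snakemake passes to an input function: it only
# contains wildcards that appear in the rule's output pattern.
wildcards = SimpleNamespace(output_name="Homo_sapiens_ch19")


def get_values(wildcards):
    # split_index was never matched from the output, so this lookup fails.
    return [
        f"results/{wildcards.output_name}_{wildcards.split_index}/"
        f"{wildcards.output_name}_{wildcards.split_index}.bed"
    ]


try:
    get_values(wildcards)
    error = None
except AttributeError as exc:
    error = str(exc)
```

The AttributeError here has the same cause as the one in the question: the function asks for a wildcard that the output pattern never defined.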

There is not enough info in your question to offer you a solution to your actual problem. Maybe what you need is to use {output_name} as a wildcard and {split_index} as a variable in expand:

rule merge_file_list:
    input: 
        expand("results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed", split_index=["1", "2"])
    output: 
        "results/{output_name}.txt"
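
If you would rather keep an input function, a variant along these lines should behave the same way. It is a sketch under the assumption that the split indices are known up front; SPLIT_INDICES and merge_inputs are hypothetical names, and a SimpleNamespace again stands in for the Wildcards object so the function can be tried outside Snakemake:

```python
from types import SimpleNamespace

SPLIT_INDICES = ["1", "2"]  # assumption: the indices are known in advance


def merge_inputs(wildcards):
    # Only output_name comes from the wildcards; the indices come from the list.
    name = wildcards.output_name
    return [f"results/{name}_{i}/{name}_{i}.bed" for i in SPLIT_INDICES]


# In the Snakefile the rule would then read:
#     rule merge_file_list:
#         input: merge_inputs
#         output: "results/{output_name}.txt"

inputs = merge_inputs(SimpleNamespace(output_name="Homo_sapiens_ch19"))
```

The function returns its list explicitly and only reads output_name from the wildcards, which is the one value the output pattern provides.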

Now regarding the wrong usage of wildcards given in the answer from @GiangLe:

rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"

NEVER EVER DO THAT! The wildcard {output_name} specified in the output is not used in the input at all! Instead of the actual value of {output_name}, Snakemake uses the hardcoded values from the OUTNAME variable, and you would be lucky if those values matched what you expect. That may hold if there is only a single path producing a single final output, but in that case using Snakemake at all is like using a sledgehammer to crack a nut.
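
To see why this matters, suppose OUTNAME held two names. The sketch below uses plain str.format to mimic the substitution; the sample names are made up for illustration:

```python
from itertools import product

OUTNAME = ["sampleA", "sampleB"]  # hypothetical values
INDEXEND = ["1", "2"]

# Hard-coded variant: both placeholders are filled from the global lists,
# so the result is the same no matter which output is being built.
pattern = "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"
hardcoded_inputs = [
    pattern.format(output_name=name, split_index=idx)
    for name, idx in product(OUTNAME, INDEXEND)
]

# Wildcard-aware variant: the doubled braces keep {output_name} open, and it
# is filled later with the value matched from the requested output file.
escaped = "results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed"
templates = [escaped.format(split_index=idx) for idx in INDEXEND]
wildcard_inputs = [t.format(output_name="sampleA") for t in templates]
```

With the hard-coded variant, building results/sampleA.txt would depend on all four files, including the sampleB ones; with the wildcard-aware variant it depends only on the two sampleA files.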

Answer 2

Score: 0

I think the error comes from your get_values function.

OUTNAME=["Homo_sapiens_ch19"]
INDEXEND=["1","2"]

rule all:
    input:
        expand("results/{output_name}.txt", output_name = OUTNAME)


rule create_files:
    output:
        touch("results/{output_name}_{split_index}/{output_name}_{split_index}.bed")

rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"
    shell:
        """
        echo {input} | sed 's/ /\\n/g' > {output}
        """
