Wildcards not found.


Question

I'm a little confused about wildcards. I'd like to get a list of input files with an expand function. The files exist and are created by other rules using these "wildcards": {output_name} and {split_index}.

I guess only split_index is a "true" wildcard as output_name is defined in another expand in my rule all:

expand(["results/{output_name}_1/{output_name}_1.bed",
        "results/{output_name}_2/{output_name}_2.bed",
        "results/{output_name}.txt"],
output_name=config["output_name_prefix"])

The first two files are made using another rule which outputs the same string I have in my expand here:

def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name=wildcards.output_name, split_index=wildcards.split_index)
rule merge_file_list:
    input: 
        get_values
    output: 
        "results/{output_name}.txt"
    shell:
        "for s in {input}; do ${{s}} > {output}; done"

I get an error that reads:

Error:
  AttributeError: 'Wildcards' object has no attribute 'split_index'
Wildcards:
  output_name=Homo_sapiens_ch19

What does this error even mean? I thought output_name was not a wildcard, as the docs read:

Wildcards not found.

What exactly does Snakemake call a wildcard? I am confused.

Answer 1

Score: 1

It is a common misconception that wildcards and the variables in the expand function are the same thing.

Let's start with the expand function. It is just a Python function with no special meaning to Snakemake. It takes a string with variables (or a list of such strings) and parameters that specify which values replace those variables. It returns a list of strings, and only then does the Snakemake magic start. In fact, this function can be used anywhere a function can be used, not only in the input/output sections of rules.
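To make that concrete, here is a minimal sketch of what expand does with its arguments. The function expand_sketch is a hypothetical stand-in written for this answer, not Snakemake's actual implementation; it only assumes that every combination of the given values is filled into the pattern.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Simplified stand-in for expand: fill in every combination of values."""
    keys = list(values)
    # Accept either a single value or a list of values for each variable.
    pools = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*pools)]


paths = expand_sketch(
    "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name="Homo_sapiens_ch19",
    split_index=["1", "2"],
)
print(paths)
```

Nothing here knows about rules or wildcards: the result is an ordinary list of two strings, one per split_index value.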

Wildcards are a more complex and often misunderstood concept (and @GiangLe's answer provides a good example of WRONG usage). Wildcards turn a rule into a kind of template that can be applied to different files. The meaning of each wildcard starts in the output section, where you specify what kind of files the rule can produce. If the wildcards can be replaced with values so that the output matches a file Snakemake wants to produce, Snakemake considers the rule a candidate. It then remembers the wildcard values used for this match, and those values become available in the other sections of the rule, including input.
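The matching step can be sketched roughly as follows. This is a simplified illustration, assuming each {name} in the output pattern behaves like a named regular-expression group that matches one or more characters; match_output is a hypothetical helper, not part of Snakemake.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values that make pattern equal target, or None."""
    regex, pos, seen = "", 0, set()
    for m in re.finditer(r"\{(\w+)\}", pattern):
        name = m.group(1)
        # Literal text between wildcards must match exactly.
        regex += re.escape(pattern[pos:m.start()])
        # A repeated wildcard must take the same value as its first occurrence.
        regex += f"(?P={name})" if name in seen else f"(?P<{name}>.+)"
        seen.add(name)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


wildcards = match_output("results/{output_name}.txt", "results/Homo_sapiens_ch19.txt")
print(wildcards)
```

Only names that appear in the output pattern end up in the result, which is why output_name has a value here and split_index does not exist at all.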

Before explaining the problem with your code, note that you don't need the get_values function: it complicates the code, is not idiomatic, and adds nothing here. This code:

def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name=wildcards.output_name, split_index=wildcards.split_index)

rule merge_file_list:
    input: 
        get_values
    output: 
        "results/{output_name}.txt"

is functionally equivalent to this simpler version:

rule merge_file_list:
    input: 
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"
    output: 
        "results/{output_name}.txt"

Now consider the simplified version and imagine that Snakemake needs to produce the file "results/my_favorite_filename.txt". It finds that the rule merge_file_list can produce this file if the wildcard {output_name} is replaced with the value "my_favorite_filename". Knowing this value, it tries to resolve the input of the rule: "results/my_favorite_filename_{split_index}/my_favorite_filename_{split_index}.bed". The {output_name} wildcard is replaced successfully, but no value is available for the wildcard {split_index}. Wildcard values can only come from matching the output section, and {split_index} does not appear there.
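The AttributeError from the question can be reproduced outside Snakemake with a small sketch. It assumes a plain SimpleNamespace as a stand-in for the Wildcards object, holding only the value that the output pattern could supply; the get_values function below is a simplified version of the one in the question.

```python
from types import SimpleNamespace

# Stand-in for the Wildcards object passed to an input function:
# it only carries names that appeared in the rule's output pattern.
wildcards = SimpleNamespace(output_name="Homo_sapiens_ch19")


def get_values(wildcards):
    # split_index was never matched from the output, so this lookup fails.
    return [
        f"results/{wildcards.output_name}_{wildcards.split_index}/"
        f"{wildcards.output_name}_{wildcards.split_index}.bed"
    ]


try:
    get_values(wildcards)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The lookup of output_name succeeds and the lookup of split_index fails, which mirrors the error message in the question.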

There is not enough information in your question to offer a solution to your actual problem. Maybe what you need is to keep {output_name} as a wildcard and use {split_index} as a variable in expand:

rule merge_file_list:
    input: 
        expand("results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed", split_index=["1", "2"])
    output: 
        "results/{output_name}.txt"
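The doubled braces are what keep {output_name} a wildcard. As a rough illustration, assuming expand fills its variables the way Python's str.format does, the doubled braces survive the first pass as single braces and are left for wildcard substitution; the second format call below is only a stand-in for that later step.

```python
template = "results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed"

# First pass (what expand does): only split_index is filled in.
inputs = [template.format(split_index=index) for index in ["1", "2"]]
print(inputs)

# Second pass (stand-in for wildcard substitution once the output is matched).
resolved = [path.format(output_name="Homo_sapiens_ch19") for path in inputs]
print(resolved)
```

After the first pass each string still contains {output_name}, so the rule stays a template that works for any output name.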

Now regarding the wrong usage of wildcards given in the answer from @GiangLe:

rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"

NEVER EVER DO THIS! The wildcard {output_name} specified in the output is not used in the input at all. Instead of the actual value of {output_name}, the input uses the hardcoded values from the OUTNAME variable, and you would be lucky if those values matched what you expect. That may hold if you have just a single path to create a single final output, but in that case using Snakemake is like using a sledgehammer to crack a nut.
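A small sketch shows what goes wrong as soon as OUTNAME holds more than one value. The second name, "Mus_musculus_ch1", is a hypothetical example that does not appear in the question, and the list comprehensions only imitate what expand would return.

```python
OUTNAME = ["Homo_sapiens_ch19", "Mus_musculus_ch1"]  # second entry is hypothetical
INDEXEND = ["1", "2"]

# Hardcoded variant: the same four inputs for every output file.
hardcoded_inputs = [
    f"results/{name}_{index}/{name}_{index}.bed"
    for name in OUTNAME
    for index in INDEXEND
]


def wildcard_inputs(output_name):
    """Inputs when output_name comes from the matched output instead."""
    return [
        f"results/{output_name}_{index}/{output_name}_{index}.bed"
        for index in INDEXEND
    ]


print(hardcoded_inputs)
print(wildcard_inputs("Mus_musculus_ch1"))
```

With the hardcoded variant, results/Mus_musculus_ch1.txt would also depend on the Homo_sapiens_ch19 files, while the wildcard variant only lists the two files that belong to the requested output.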

Answer 2

Score: 0

I think the error comes from your get_values function.

OUTNAME=["Homo_sapiens_ch19"]
INDEXEND=["1","2"]

rule all:
    input:
        expand("results/{output_name}.txt", output_name = OUTNAME)

rule create_files:
    output:
        touch("results/{output_name}_{split_index}/{output_name}_{split_index}.bed")

rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"
    shell:
        """
        echo {input} | sed 's/ /\\n/g' > {output}
        """

  • Posted on April 17, 2023, 01:47:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76029405.html