Wildcards not found
Question
So I'm a little confused about wildcards. I'd like to get a list of input files with an expand function. The files exist and are created by other rules using these "wildcards": {output_name} and {split_index}.
I guess only split_index is a "true" wildcard, as output_name is defined in another expand in my rule all:
expand(["results/{output_name}_1/{output_name}_1.bed",
        "results/{output_name}_2/{output_name}_2.bed",
        "results/{output_name}.txt"],
       output_name=config["output_name_prefix"])
The first two files are made using another rule which outputs the same string I have in my expand here:
def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
        output_name=wildcards.output_name, split_index=wildcards.split_index)

rule merge_file_list:
    input:
        get_values
    output:
        "results/{output_name}.txt"
    shell:
        "for s in {input}; do ${{s}} > {output}; done"
I get an error that reads:
Error:
  AttributeError: 'Wildcards' object has no attribute 'split_index'
Wildcards:
  output_name=Homo_sapiens_ch19
What does this error even mean? I thought output_name was not a wildcard, based on my reading of the docs.
What exactly does Snakemake call a wildcard? I'm confused.
Answer 1
Score: 1
A common misconception is that wildcards and the variables in the expand function are the same thing. They are not.
Let's start with the expand function. It is just a Python function with no special meaning to Snakemake. It takes a string with variables (or a list of strings) and parameters that specify how to replace these variables with values. It returns a list of strings, and only then does the Snakemake magic start. In fact, this function can be used in any context where a function can be used, not only in the input/output sections of rules.
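To make the "just a Python function" point concrete, here is a minimal sketch of roughly what expand does, written in plain Python so that it runs without Snakemake. The name expand_sketch is hypothetical, and this only approximates the behaviour (every combination of the supplied values is substituted into the pattern); it is not Snakemake's actual implementation.

```python
from itertools import product


def expand_sketch(patterns, **values):
    """Fill each pattern with every combination of the given values."""
    if isinstance(patterns, str):
        patterns = [patterns]
    # Accept either a single string or a list of strings per variable.
    lists = {k: [v] if isinstance(v, str) else list(v) for k, v in values.items()}
    names = list(lists)
    results = []
    for pattern in patterns:
        for combo in product(*(lists[n] for n in names)):
            results.append(pattern.format(**dict(zip(names, combo))))
    return results


files = expand_sketch(
    "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
    output_name="Homo_sapiens_ch19",
    split_index=["1", "2"],
)
print(files)
```

Nothing here knows about rules or wildcards: the call simply returns a list of two concrete file names.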
Wildcards are a more complex and often misunderstood concept (@GiangLe's answer provides a good example of WRONG usage). Wildcards turn a rule into a kind of template that can be applied to different files. The meaning of each wildcard starts in the output section, where you specify what kind of files the rule can produce. If there is a way to replace each wildcard with a value such that the output matches something Snakemake wants to produce, it considers the rule a candidate. In that case it remembers the wildcard values used for the match, and from then on these values are available in the other sections of the rule, including input.
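The matching step described above can be sketched in plain Python. This is an illustration only: the helper match_output is hypothetical, and treating each wildcard as the regular expression .+ is an assumption about the default behaviour rather than a copy of Snakemake's code.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values if target fits pattern, else None."""
    regex, pos, seen = "", 0, set()
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between wildcards must match exactly.
        regex += re.escape(pattern[pos:m.start()])
        name = m.group(1)
        if name in seen:
            # A repeated wildcard must take the same value again.
            regex += f"(?P={name})"
        else:
            regex += f"(?P<{name}>.+)"
            seen.add(name)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


wildcards = match_output("results/{output_name}.txt", "results/Homo_sapiens_ch19.txt")
print(wildcards)
```

The dictionary returned for a successful match plays the role of the wildcards object that the rule's other sections then see; it contains only names that appear in the output pattern.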
Before explaining the problem with your code, I should note that you don't need the get_values function: it complicates your code, is not idiomatic, and is not needed at all. This code:
def get_values(wildcards):
    expand(
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed",
        output_name=wildcards.output_name, split_index=wildcards.split_index)

rule merge_file_list:
    input:
        get_values
    output:
        "results/{output_name}.txt"
is functionally equivalent to a simplified one:
rule merge_file_list:
    input:
        "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"
    output:
        "results/{output_name}.txt"
Now let's consider the simplified version and imagine that Snakemake needs to find a way to produce the file "results/my_favorite_filename.txt". It discovers that the rule merge_file_list can produce this file if the wildcard {output_name} is substituted with the value "my_favorite_filename". Now that it knows this value, it tries to resolve the input of the rule: "results/my_favorite_filename_{split_index}/my_favorite_filename_{split_index}.bed". It successfully replaces {output_name} with its value, and then discovers that no value is specified for the wildcard {split_index}. Note that this value has to come from the output section.
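The same walk-through can be reproduced with a few lines of plain Python. The helper unresolved_wildcards is hypothetical and only mimics the bookkeeping: it lists the names used in an input pattern that received no value from the output match.

```python
from string import Formatter


def unresolved_wildcards(input_pattern, known):
    """List wildcard names in input_pattern that have no value in known."""
    names = {field for _, field, _, _ in Formatter().parse(input_pattern) if field}
    return sorted(names - set(known))


# The only value learned from matching "results/{output_name}.txt".
known = {"output_name": "my_favorite_filename"}
input_pattern = "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"

missing = unresolved_wildcards(input_pattern, known)
print(missing)
```

The leftover name is exactly the one the AttributeError in the question complains about.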
There is not enough information in your question to offer a solution to your actual problem. Maybe what you need is to use {output_name} as a wildcard and {split_index} as a variable in expand:
rule merge_file_list:
    input:
        expand("results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed", split_index=["1", "2"])
    output:
        "results/{output_name}.txt"
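The doubled braces are what keep {output_name} alive as a wildcard. A small sketch using Python's own str.format shows the two stages; it assumes that expand treats doubled braces the way str.format does, and the final substitution merely stands in for what Snakemake does once it has matched the output.

```python
template = "results/{{output_name}}_{split_index}/{{output_name}}_{split_index}.bed"

# Stage 1: only split_index is filled in; {{...}} collapses to {...}.
inputs = [template.format(split_index=i) for i in ["1", "2"]]
for path in inputs:
    print(path)

# Stage 2: the surviving wildcard gets the value matched from the output.
resolved = [path.format(output_name="Homo_sapiens_ch19") for path in inputs]
for path in resolved:
    print(path)
```

After stage 1 each path still contains a single-brace {output_name}, so the value comes from the requested output file rather than from a hardcoded list.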
Now regarding the wrong usage of wildcards in the answer from @GiangLe:
rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"
NEVER EVER DO THAT! The wildcard {output_name} specified in the output is not used in the input at all! Instead of the actual value of {output_name}, Snakemake uses the hardcoded values from the OUTNAME variable, and you would be lucky if those values matched what you expect. That may hold if you have just a single path that creates a single final output, but in that case using Snakemake is like using a sledgehammer to crack a nut.
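To see why this goes wrong as soon as there is more than one name, here is a plain-Python sketch of the input list that such an expand call produces. The second entry of OUTNAME is a hypothetical addition for illustration; the original answer only lists one name.

```python
from itertools import product

OUTNAME = ["Homo_sapiens_ch19", "Mus_musculus_ch1"]  # second entry is hypothetical
INDEXEND = ["1", "2"]
pattern = "results/{output_name}_{split_index}/{output_name}_{split_index}.bed"

# The input list is the same no matter which output file was requested.
hardcoded_inputs = [
    pattern.format(output_name=name, split_index=index)
    for name, index in product(OUTNAME, INDEXEND)
]

# Suppose Snakemake is building results/Homo_sapiens_ch19.txt.
requested = "Homo_sapiens_ch19"
unrelated = [path for path in hardcoded_inputs if requested not in path]
print(unrelated)
```

Every output would depend on, and merge, the files of every name in OUTNAME, including the ones that have nothing to do with it.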
Answer 2
Score: 0
I think the error comes from your get_values function.
OUTNAME=["Homo_sapiens_ch19"]
INDEXEND=["1","2"]

rule all:
    input:
        expand("results/{output_name}.txt", output_name = OUTNAME)

rule create_files:
    output:
        touch("results/{output_name}_{split_index}/{output_name}_{split_index}.bed")

rule merge_file_list:
    input:
        expand("results/{output_name}_{split_index}/{output_name}_{split_index}.bed", output_name = OUTNAME, split_index = INDEXEND)
    output:
        "results/{output_name}.txt"
    shell:
        """
        echo {input} | sed 's/ /\\n/g' > {output}
        """