Snakemake changes wildcard, resulting in InputFunctionException
I keep getting the same errors at the same step in the pipeline. I have 2 rules named `typing` and `apply_qc` which somehow conflict. `typing` uses outputs from another rule, `polish_consensus`, and `apply_qc` uses the outputs of `typing` (so the order: `polish_consensus` > `typing` > `apply_qc`). The outputs of `typing` are a fasta and CSV file. `apply_qc` is a quality control step, which will censor the data of these files when of low quality. Now I keep getting the same errors with the rules.

The code:
```python
rule typing:
    input:
        f"{DATA_FOLDER}/vaccines.fasta",
        rules.polish_consensus.output
    output:
        temp(f"{OUTPUT_FOLDER}/typing/{{samplename}}.csv")
    script:
        "../scripts/typing.py"


rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
        rules.featurecounts.output.summary
    output:
        typing=f"{OUTPUT_FOLDER}/typing/{{samplename}}_qcpass.csv",
        consensus=f"{OUTPUT_FOLDER}/consensus/{{samplename}}_qcpass.fasta"
    script:
        "../scripts/apply_qc.py"
```
The output of the rule `polish_consensus` is `output/consensus/{samplename}.fasta` with `samplename=prrsv12`.

The error:
```
InputFunctionException in rule typing in file /home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk, line 1:
Error:
  KeyError: 'prrsv12_qcpass'
Wildcards:
  samplename=prrsv12_qcpass
Traceback:
  File "/home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk", line 12, in <lambda>
```
- The error shows that the wildcard used is `prrsv12_qcpass`, but the wildcard should be `prrsv12`; `prrsv12_qcpass` is the filename of an output of the `apply_qc` rule.
- The second error is something I hope I already fixed, but it shows more info than the previous error:
```
AmbiguousRuleException:
Rules apply_qc and typing are ambiguous for the file output/typing/prrsv12_qcpass.csv.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    apply_qc: samplename=prrsv12
    typing: samplename=prrsv12_qcpass
Expected input files:
    apply_qc: output/typing/prrsv12.csv output/consensus/prrsv12.fasta output/counts/prrsv12_summary.csv
    typing: data/prrsv/vaccines.fasta output/consensus/prrsv12_qcpass.fasta
Expected output files:
    apply_qc: output/typing/prrsv12_qcpass.csv output/consensus/prrsv12_qcpass.fasta
    typing: output/typing/prrsv12_qcpass.csv
```
As said before, the wildcard for `typing` is wrong, but the wildcard for `apply_qc` is correct (?????). Likewise, the expected input for `typing` should not be `output/consensus/prrsv12_qcpass.fasta` but `output/consensus/prrsv12.fasta`.

I hoped I had fixed the `AmbiguousRuleException` by using the `rules.<rule>.output` syntax and adding a `ruleorder`. As for the first error, I am completely lost and have no idea why this happens. It seems like an error with the wildcard, but I have no idea how the `_qcpass` part is added to the wildcard. It also seems like this error happens at random: some runs work fine and others crash with this error (yes, run with the same data).
EDIT:
I tried running it with `--debug-dag` and the only thing that stood out is the following:
```
selected job readcap
    wildcards: samplename=prrsv20_qcpass
file output/fastq/prrsv20_qcpass_readcap.fastq.gz:
    Producer found, hence exceptions are ignored.
candidate job select_centroid
    wildcards: samplename=prrsv20_qcpass
candidate job featurecounts
    wildcards: samplename=prrsv20_qcpass
candidate job map2ref
    wildcards: samplename=prrsv20_qcpass
candidate job apply_qc
    wildcards: samplename=prrsv20
selected job apply_qc
    wildcards: samplename=prrsv20
```
The `_qcpass` is added to the wildcard for the rest of the pipeline, but seems to work fine for `apply_qc`? `apply_qc` is one of the last rules in the pipeline...
Answer 1
Score: 0
It sounds like you misunderstood what ambiguous rules mean for `snakemake`, why you should avoid them, and why `ruleorder` will not solve your problem.

First of all, here's an MWE, a minimal working example that reproduces your issue. Note that it is easier for everyone if you provide such an example and the call used to run `snakemake`.

In this case, the problem can be reproduced by calling `snakemake -call` (short for `--cores all`):
```python
rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """


rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """


rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """


rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),
```
Your wildcard `{samplename}` will be matched by `snakemake` against all the output files you request as well as the files `snakemake` has to generate to run the workflow.

Now requesting `typing/prrsv12_qcpass.csv` matches the output of rule `apply_qc` with `samplename=prrsv12` as well as rule `typing` with `samplename=prrsv12_qcpass`. To prevent this, you should constrain your wildcard rather than trying a `ruleorder` or using references to a `rules.<name>.output`.
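The double match can be reproduced outside Snakemake with plain regular expressions. The sketch below assumes an unconstrained wildcard behaves like the regex `.+` (Snakemake's documented default); `OUTPUTS`, `to_regex`, and `candidate_rules` are illustrative names, not part of Snakemake's API.

```python
import re

# Output patterns of the two conflicting rules from the example above.
OUTPUTS = {
    "typing": "typing/{samplename}.csv",
    "apply_qc": "typing/{samplename}_qcpass.csv",
}


def to_regex(pattern, wildcard_regex=".+"):
    """Convert an output pattern into a regex with one named group per wildcard."""
    pieces = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, ...
    regex = ""
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex += re.escape(piece)  # fixed part of the filename
        else:
            regex += f"(?P<{piece}>{wildcard_regex})"  # wildcard part
    return re.compile(regex)


def candidate_rules(target):
    """Return {rule: wildcard value} for every rule whose output matches target."""
    hits = {}
    for rule, pattern in OUTPUTS.items():
        match = to_regex(pattern).fullmatch(target)
        if match:
            hits[rule] = match.group("samplename")
    return hits


print(candidate_rules("typing/prrsv12_qcpass.csv"))
# {'typing': 'prrsv12_qcpass', 'apply_qc': 'prrsv12'}
```

Both rules can claim the file, each with a different value for `samplename`, which is the situation the `AmbiguousRuleException` reports.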
By using `wildcard_constraints` you tell `snakemake` which strings a wildcard can match. In your case, your `samplename` is presumably never going to contain an underscore, i.e. you can use:

```python
wildcard_constraints:
    samplename="[a-zA-Z0-9]+",
```
to tell `snakemake` to match against lowercase/uppercase letters and the digits 0-9, but not any whitespace or other symbols like underscore. This will make `snakemake` never consider `prrsv12_qcpass` as the wildcard value for `samplename`, but only `prrsv12` as the wildcard value and `_qcpass` as an additional, fixed part of the filename.
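As a rough check of that claim, the output patterns of `typing` and `apply_qc` can be written by hand as regexes with the constraint `[a-zA-Z0-9]+` substituted for the wildcard. This is only an approximation of how Snakemake matches filenames; the variable names are made up for the illustration.

```python
import re

CONSTRAINT = "[a-zA-Z0-9]+"  # same regex as in wildcard_constraints

# Hand-written regex equivalents of the two output patterns.
typing_re = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})\.csv")
apply_qc_re = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})_qcpass\.csv")

target = "typing/prrsv12_qcpass.csv"
typing_match = typing_re.fullmatch(target)
apply_qc_match = apply_qc_re.fullmatch(target)

print(typing_match)                        # None: "_" is not allowed in samplename
print(apply_qc_match.group("samplename"))  # prrsv12
```

Only `apply_qc` remains a candidate for the file, so there is nothing left to be ambiguous about.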
More on `wildcard_constraints` can be found in the documentation.

Putting everything together into a single `Snakefile`:
```python
wildcard_constraints:
    samplename="[a-zA-Z0-9]+",


rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """


rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """


rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """


rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),
```
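The constraint only helps if every real sample name actually satisfies it; a name containing an underscore would no longer match any rule. A small plain-Python check along these lines could be run on the sample list before starting the workflow. The `samples` list and the `invalid_samples` helper are hypothetical and only serve the illustration.

```python
import re

# Same regex as the samplename entry in wildcard_constraints.
SAMPLENAME_CONSTRAINT = re.compile(r"[a-zA-Z0-9]+")


def invalid_samples(names):
    """Return the sample names that the constraint would reject."""
    return [name for name in names if not SAMPLENAME_CONSTRAINT.fullmatch(name)]


# Hypothetical sample list; the last name contains an underscore.
samples = ["prrsv12", "prrsv20", "prrsv_21"]
print(invalid_samples(samples))  # ['prrsv_21']
```

If the check reports any names, either rename those samples or widen the regex in a way that still excludes the `_qcpass` suffix.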