2023-05-17 21:49:23

Snakemake changes wildcard, resulting in InputFunctionException

Question

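Error 1:

The error message shows that the wildcard used is prrsv12_qcpass, but it should be prrsv12. Moreover, prrsv12_qcpass is the file name (without extension) of an output of the apply_qc rule.

Error 2:

The error message reports the wrong wildcard for typing, while the wildcard for apply_qc is correct. In addition, the expected input for typing should be output/consensus/prrsv12.fasta, not output/consensus/prrsv12_qcpass.fasta.

Using the rules.<rule>.output syntax and adding a ruleorder were tried to resolve the AmbiguousRuleException, without success. The first error looks like a wildcard problem, but it is unclear why _qcpass is appended to the wildcard. The error also appears intermittently.

Running with --debug-dag shows _qcpass being appended to the wildcard for other rules, while apply_qc itself resolves correctly, even though it is one of the last rules in the pipeline.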

I keep getting the same errors at the same step in the pipeline. I have 2 rules named typing and apply_qc which somehow conflict. typing uses outputs from another rule, polish_consensus, and apply_qc uses the outputs of typing (so the order: polish_consensus > typing > apply_qc). The outputs of typing are a fasta and CSV file. apply_qc is a quality control step, which will censor the data of these files when of low quality. Now I keep getting the same errors with the rules:

The code:

rule typing:
    input:
        f"{DATA_FOLDER}/vaccines.fasta",
        rules.polish_consensus.output
    output:
        temp(f"{OUTPUT_FOLDER}/typing/{{samplename}}.csv")
    script:
        "../scripts/typing.py"

rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
        rules.featurecounts.output.summary
    output:
        typing=f"{OUTPUT_FOLDER}/typing/{{samplename}}_qcpass.csv",
        consensus=f"{OUTPUT_FOLDER}/consensus/{{samplename}}_qcpass.fasta"
    script:
        "../scripts/apply_qc.py"

The output of the rule polish_consensus is output/consensus/{samplename}.fasta with samplename=prrsv12.

The error:

InputFunctionException in rule typing in file /home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk, line 1:
Error:
  KeyError: 'prrsv12_qcpass'
Wildcards:
  samplename=prrsv12_qcpass
Traceback:
  File "/home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk", line 12, in <lambda>

The error shows that the wildcard used is prrsv12_qcpass, but the wildcard should be prrsv12. prrsv12_qcpass is the file name (without extension) of an output of the apply_qc rule.

The second error is something I hope I already fixed, but it shows more info than the previous error:

AmbiguousRuleException:
Rules apply_qc and typing are ambiguous for the file output/typing/prrsv12_qcpass.csv.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
        apply_qc: samplename=prrsv12
        typing: samplename=prrsv12_qcpass
Expected input files:
        apply_qc: output/typing/prrsv12.csv output/consensus/prrsv12.fasta output/counts/prrsv12_summary.csv
        typing: data/prrsv/vaccines.fasta output/consensus/prrsv12_qcpass.fasta
Expected output files:
        apply_qc: output/typing/prrsv12_qcpass.csv output/consensus/prrsv12_qcpass.fasta
        typing: output/typing/prrsv12_qcpass.csv

As said before, the wildcard for typing is wrong, but the wildcard for apply_qc is correct (?????). Likewise, the expected input for typing should be output/consensus/prrsv12.fasta, not output/consensus/prrsv12_qcpass.fasta.

I hoped I had fixed the AmbiguousRuleException by using the rules.<rule>.output syntax and adding a ruleorder. As for the first error, I am completely lost and have no idea why this happens. It seems like an error with the wildcard, but I have no idea how the _qcpass part is added to the wildcard. The error also seems to happen at random: some runs work fine and others crash with it (yes, with the same data).

EDIT:

I tried running it with --debug-dag, and the only thing that stood out is the following:

selected job readcap
    wildcards: samplename=prrsv20_qcpass
file output/fastq/prrsv20_qcpass_readcap.fastq.gz:
    Producer found, hence exceptions are ignored.

candidate job select_centroid
    wildcards: samplename=prrsv20_qcpass
candidate job featurecounts
    wildcards: samplename=prrsv20_qcpass
candidate job map2ref
    wildcards: samplename=prrsv20_qcpass
candidate job apply_qc
    wildcards: samplename=prrsv20
selected job apply_qc
    wildcards: samplename=prrsv20

The _qcpass is added to the wildcard for the rest of the pipeline, but seems to work fine for apply_qc? apply_qc is one of the last rules in the pipeline...

Answer 1

Score: 0

It sounds like you misunderstood what ambiguous rules mean for `snakemake`, why you should avoid them and why `ruleorder` will not solve your problem.

First of all, here's an MWE - a minimal working example which reproduces your issue. Note that it is easier for everyone if you provide such an example and the call used to run `snakemake`.

In this case, the problem can be reproduced by calling `snakemake -call`:

```python
# Minimal Snakefile that reproduces the ambiguity (no wildcard constraints yet).
rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """


rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """


# The outputs of apply_qc also match the output patterns of the rules above.
rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """


rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),
```

Your wildcard {samplename} will be matched by snakemake against all the output-files you request as well as files snakemake has to generate to run the workflow.

Now requesting typing/prrsv12_qcpass.csv matches the output of rule apply_qc with samplename=prrsv12 as well as rule typing with samplename=prrsv12_qcpass. To prevent this you should constrain your wildcard rather than trying a ruleorder or using references to a rules.<name>.output.
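To see why both rules are candidates, the matching can be imitated in plain Python. This is only a sketch: `pattern_to_regex` and `candidate_rules` are hypothetical helpers, not Snakemake functions, and it assumes that an unconstrained wildcard behaves like the regular expression `.+`.

```python
import re

# Output patterns of the two conflicting rules from the example above.
OUTPUTS = {
    "typing": "typing/{samplename}.csv",
    "apply_qc": "typing/{samplename}_qcpass.csv",
}


def pattern_to_regex(pattern, constraint=".+"):
    """Turn 'dir/{wildcard}.ext' into a compiled regex (illustrative helper)."""
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal part of the file name
        else:
            regex += f"(?P<{part}>{constraint})"  # wildcard as a named group
    return re.compile(regex)


def candidate_rules(target, constraint=".+"):
    """Return {rule name: samplename} for every rule whose output matches target."""
    matches = {}
    for rule, pattern in OUTPUTS.items():
        match = pattern_to_regex(pattern, constraint).fullmatch(target)
        if match:
            matches[rule] = match.group("samplename")
    return matches


# Both rules match, each with a different wildcard value.
print(candidate_rules("typing/prrsv12_qcpass.csv"))
```

Because both patterns match the same file, neither `ruleorder` nor `rules.<name>.output` changes the fact that `typing` is a valid producer with `samplename=prrsv12_qcpass`.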

By using wildcard_constraints you tell snakemake which strings a wildcard can match. In your case, your samplename is presumably never going to contain an underscore, i.e. you can use:

wildcard_constraints:
    samplename="[a-zA-Z0-9]+",

to tell snakemake to match only lowercase and uppercase letters and the digits 0-9, but no whitespace or other symbols such as underscores. With this constraint, snakemake never considers prrsv12_qcpass as a value for the samplename wildcard; it only considers prrsv12 as the wildcard value and treats _qcpass as an additional, fixed part of the file name.
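The effect of the constraint can be checked with ordinary regular expressions. The following is a rough sketch, assuming the constraint is simply substituted for the wildcard in each output pattern; the names `CONSTRAINT`, `TYPING_RE` and `APPLY_QC_RE` are made up for this illustration.

```python
import re

# The character class used in wildcard_constraints.
CONSTRAINT = "[a-zA-Z0-9]+"

# Output patterns of typing and apply_qc with the constraint substituted in.
TYPING_RE = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})\.csv")
APPLY_QC_RE = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})_qcpass\.csv")

target = "typing/prrsv12_qcpass.csv"

# typing can no longer claim the file, because "_" is not in the character class.
typing_match = TYPING_RE.fullmatch(target)

# apply_qc still matches, with the intended wildcard value.
apply_qc_match = APPLY_QC_RE.fullmatch(target)

print("typing matches:", typing_match is not None)
print("apply_qc samplename:", apply_qc_match.group("samplename"))
```

Only one rule remains as a producer for the file, so the ambiguity disappears without any `ruleorder`.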

More on wildcard_constraints can be found in the documentation.
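The constraint is only safe if every real sample name satisfies it; a name containing an underscore would no longer match the constrained wildcard. A small check can catch such names before the workflow runs. This is a sketch in plain Python with a made-up sample list: `samples` and `invalid_samplenames` are illustrative names, not part of the original workflow.

```python
import re

# Same character class as in wildcard_constraints.
SAMPLENAME_RE = re.compile(r"[a-zA-Z0-9]+")


def invalid_samplenames(names):
    """Return the names that the samplename constraint would reject."""
    return [name for name in names if not SAMPLENAME_RE.fullmatch(name)]


# Hypothetical sample list; the last entry contains an underscore on purpose.
samples = ["prrsv12", "prrsv20", "prrsv_21"]

bad = invalid_samplenames(samples)
if bad:
    print("Sample names not covered by the constraint:", bad)
```

If real sample names do contain underscores, the character class would need to be adapted, for example by excluding only the reserved suffix in some other way.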

Putting everything together into a single Snakefile:

wildcard_constraints:
    samplename="[a-zA-Z0-9]+",

rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """


rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """


rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """


rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),
