snakemake built-in md5sum function

Question

 [1]: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#ensuring-output-file-properties-like-non-emptyness-or-checksum-compliance


Under the header "Ensuring output file properties like non-emptyness or checksum compliance" of the `snakemake` [tutorial rule section][1], the following is stated:

It is possible to annotate certain additional criteria for output files to be ensured after they have been generated successfully. For example, this can be used to check for output files to be non-empty, or to compare them against a given sha256 checksum. If this functionality is used, Snakemake will check such annotated files before considering a job to be successful. Non-emptyness can be checked as follows:

> rule NAME:
>     output:
>         ensure("test.txt", non_empty=True)
>     shell:
>         "somecommand {output}"

Above, the output file test.txt is marked as non-empty. If the command somecommand happens to generate an empty output, the job will fail with an error listing the unexpected empty file.

A sha256 checksum can be compared as follows:

my_checksum = "u98a9cjsd98saud090923ßkpoasköf9ß32"

> rule NAME:
>     output:
>         ensure("test.txt", sha256=my_checksum)
>     shell:
>         "somecommand {output}"

Is it possible to change sha256 to md5sum? The reason is that I frequently need to download data that comes with md5sums instead of sha256.

[Snakemake official documentation][1]

Answer 1

Score: 1

For now, there is no built-in support for alternative checksum functions; see this feature proposal on GitHub.
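Until such support exists, one possible workaround is to do the comparison in plain Python. The following is only a minimal sketch: `file_md5` and `assert_md5` are hypothetical helper names, not part of Snakemake, and the chunk size is an arbitrary choice.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_md5(path, expected):
    """Raise ValueError if the MD5 digest of `path` differs from `expected`."""
    actual = file_md5(path)
    if actual != expected.strip().lower():
        raise ValueError(
            f"MD5 mismatch for {path}: expected {expected}, got {actual}"
        )
```

A helper like this could be defined at the top of a Snakefile and called from a rule's `run:` block, for example as `assert_md5(output[0], params.expected_md5)`; an uncaught exception there should cause the job to fail, which is the behaviour the question asks for.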

Answer 2

Score: 0

Is a dedicated Snakemake function necessary? I may be missing something here but this is what I would do:

rule one:
    output: ...
    params:
        expected_md5='foobarspam'
    shell:
        r"""
        curl ... > {output}
        md5sum {output} | cut -f 1 -d ' ' | grep -w '{params.expected_md5}'
        """

Not tested, but it should work because grep returns a non-zero exit code when there is no match, which makes the shell command (and therefore the job) fail. The same can be achieved with awk if you want more control.
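The check in that pipeline amounts to taking the first whitespace-separated field of the `md5sum` output line and comparing it with the expected digest. As a rough illustration of that logic, here is a sketch in Python; `md5sum_line_matches` is a hypothetical name, and it uses exact, case-insensitive equality rather than grep's word matching.

```python
def md5sum_line_matches(line, expected):
    """Check one line of `md5sum` output against an expected hex digest.

    `md5sum` prints "<digest>  <filename>"; only the first field is compared.
    """
    fields = line.split()
    if not fields:
        # An empty line carries no digest, so it can never match.
        return False
    return fields[0].lower() == expected.strip().lower()
```

Comparing the whole field with `==` avoids any regular-expression interpretation of the expected value, which is one way to get the extra control mentioned above.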

Posted by huangapple on June 6, 2023, 02:58:13
Source: https://go.coder-hub.com/76409272.html