snakemake target rule as a variable for Python code

Question

I have a snakemake workflow that does a lot of work before it reaches the definition of the rules (querying files, accessing databases, filtering a data frame). Not all of this work is necessary, depending on the target rule I want to invoke.

How can I know which target rule is selected on the command line? Then I could have the pure-python work done upfront be limited to only what is necessary for that target rule.

Desired example, for which I need to know if the "snakemake.target_rule" variable exists and which variable it actually is:

    def do_work_for_target_a():
        lots_of_work()
        return table

    def do_work_for_target_b():
        lots_of_work()
        return table

    if snakemake.target_rule == 'a':
        table_a = do_work_for_target_a()
    elif snakemake.target_rule == 'b':
        table_b = do_work_for_target_b()
    else:
        table_a = do_work_for_target_a()
        table_b = do_work_for_target_b()

    rule all:
        input:
            "output_a.txt",
            "output_b.txt",

    rule a:
        input:
            "output_a.txt",

    rule b:
        input:
            "output_b.txt",

Answer 1

Score: 2

This is not a direct answer to your question, but a workaround that achieves the same result. The main idea is to create a temporary file whose only purpose is to trigger the rule-specific preparation:

    def do_work_for_target_a():
        lots_of_work()

    def do_work_for_target_b():
        lots_of_work()

    rule all:
        input:
            "output_a.txt",
            "output_b.txt",

    rule pre_a:
        output: temp("pre_a.txt")
        run:
            do_work_for_target_a()
            # Create the marker file; otherwise Snakemake reports a missing output.
            open(output[0], "w").close()

    rule pre_b:
        output: temp("pre_b.txt")
        run:
            do_work_for_target_b()
            # Create the marker file; otherwise Snakemake reports a missing output.
            open(output[0], "w").close()

    rule a:
        input:
            "pre_a.txt",
            "output_a.txt",

    rule b:
        input:
            "pre_b.txt",
            "output_b.txt",

Answer 2

Score: 1

> How can I know which target rule is selected on the command line

A fairly crude solution would be to parse the sys.argv list and determine the target rule from it. I'm not sure how to implement this robustly, though.
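
As a rough illustration of the sys.argv idea, here is a minimal sketch. It assumes that sys.argv inside the Snakefile reflects the snakemake command line, and that the rule names are listed by hand. Both KNOWN_RULES and guess_target_rules are hypothetical names invented for this example, not part of Snakemake. The sketch only matches arguments that are exactly equal to a rule name, so it will miss target files and can be fooled by an option value that happens to equal a rule name.

```python
import sys

# Hypothetical rule names for this sketch; they must be kept in sync
# with the rules defined in the Snakefile by hand.
KNOWN_RULES = {"all", "a", "b"}


def guess_target_rules(argv, known_rules=KNOWN_RULES):
    """Return the command-line arguments that exactly match a known rule name."""
    # argv[0] is the program name, so only the remaining arguments are checked.
    return [arg for arg in argv[1:] if arg in known_rules]


# Possible use near the top of a Snakefile (an assumption, not a Snakemake API):
#   targets = guess_target_rules(sys.argv)
#   An empty list would then mean "no recognised target, prepare everything".
```

Because the matching is this naive, it seems safest to treat an empty result as "do all of the preparation" rather than as "do none".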

Answer 3

Score: 1

I don't know of any intended way to access the name(s) of the target rule(s) given on the command line from inside the Snakefile either.

As a workaround, I suggest passing an additional --config value with the snakemake call, e.g. snakemake --config prep_for=<a|b> -- <remainder of your snakemake call>, and modifying your Snakefile to read that config value:

    def do_work_for_target_a():
        lots_of_work()
        return table

    def do_work_for_target_b():
        lots_of_work()
        return table

    # config is a dict; .get() returns None when prep_for was not passed.
    prep_for = config.get("prep_for")

    if prep_for == 'a':
        table_a = do_work_for_target_a()
    elif prep_for == 'b':
        table_b = do_work_for_target_b()
    else:
        table_a = do_work_for_target_a()
        table_b = do_work_for_target_b()
    ...
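
The branching on the config value can also be isolated in a small helper that is easy to check outside of Snakemake. The following is only a sketch: preparations_needed is a hypothetical name, and a plain dict stands in for Snakemake's config object, which is populated from --config key=value pairs.

```python
def preparations_needed(config):
    """Map the optional 'prep_for' config value to the set of preparations to run."""
    # None when no prep_for value was passed via --config.
    prep_for = config.get("prep_for")
    if prep_for in ("a", "b"):
        return {prep_for}
    # Missing or unrecognised value: fall back to preparing everything.
    return {"a", "b"}
```

In the Snakefile, each expensive function would then be called only if its key is in the returned set, for example `if "a" in preparations_needed(config): table_a = do_work_for_target_a()`.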

huangapple
  • Published on May 11, 2023, 16:58:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225840.html