snakemake target rule as variable for python code

Question

I have a snakemake workflow that does a lot of work before it reaches the definition of the rules (querying files, accessing databases, filtering a data frame). Not all of this work is necessary, depending on the target rule I want to invoke.

How can I know which target rule was selected on the command line? Then I could limit the pure-Python work done upfront to only what that target rule needs.

Desired example; I need to know whether something like a "snakemake.target_rule" variable exists and, if so, what it is actually called:

def do_work_for_target_a():
    table = lots_of_work()  # expensive: querying files, databases, filtering
    return table

def do_work_for_target_b():
    table = lots_of_work()
    return table

# snakemake.target_rule is the (hypothetical) variable I am looking for
if snakemake.target_rule == 'a':
    table_a = do_work_for_target_a()
elif snakemake.target_rule == 'b':
    table_b = do_work_for_target_b()
else:
    table_a = do_work_for_target_a()
    table_b = do_work_for_target_b()

rule all:
    input:
        "output_a.txt",
        "output_b.txt",

rule a:
    input:
        "output_a.txt",


rule b:
    input:
        "output_b.txt",

Answer 1

Score: 2

This is not a direct answer to your question, but a workaround that achieves the same result. The main idea is to create a temporary file whose only purpose is to run the rule-specific preparations:

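def do_work_for_target_a():
    lots_of_work()  # placeholder for the expensive preparation

def do_work_for_target_b():
    lots_of_work()

rule all:
    input:
        "output_a.txt",
        "output_b.txt",

rule pre_a:
    output: temp("pre_a.txt")
    run:
        do_work_for_target_a()
        # a run: block must create its declared output file,
        # otherwise Snakemake reports it as missing
        shell("touch {output}")

rule pre_b:
    output: temp("pre_b.txt")
    run:
        do_work_for_target_b()
        shell("touch {output}")

rule a:
    input:
        "pre_a.txt",
        "output_a.txt",

rule b:
    input:
        "pre_b.txt",
        "output_b.txt",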

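The marker file does not have to stay empty. As far as I know, a run: block is executed as a function of its own (and, on a cluster, in a separate process), so a table computed inside it is not visible to the other rules. A minimal sketch, assuming the table is JSON-serialisable, writes it into the marker file so that whichever rule consumes that file can load it again; the helpers write_prep_table and read_prep_table and the sample rows are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def lots_of_work():
    # Hypothetical stand-in for the expensive querying/filtering step;
    # returns a small JSON-serialisable table (a list of row dicts).
    return [{"sample": "s1", "keep": True}, {"sample": "s2", "keep": False}]


def write_prep_table(table, path):
    """Write the prepared table into the marker file (this also creates the file)."""
    Path(path).write_text(json.dumps(table))


def read_prep_table(path):
    """Load the table back in whichever rule lists the marker file as input."""
    return json.loads(Path(path).read_text())


# In the Snakefile, rule pre_a's run: block could call
#     write_prep_table(do_work_for_target_a(), output[0])
# provided do_work_for_target_a returns its table, and a consuming rule could call
#     read_prep_table(input[0])
# Standalone demo: round-trip through a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "pre_a.txt"
    write_prep_table(lots_of_work(), marker)
    print(read_prep_table(marker) == lots_of_work())  # True
```

Because pre_a.txt is declared with temp(), Snakemake deletes it once every job that consumes it has finished, so the serialised table does not linger in the working directory.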
Answer 2

Score: 1

> How can I know which target rule is selected on the command line

A rather crude solution would be to parse the sys.argv list to determine the target rule(s), though I'm not sure how to implement this robustly.
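A minimal sketch of that crude approach, assuming targets are given as rule names (file targets such as output_a.txt are ignored). The names requested_targets and KNOWN_RULES are hypothetical, and the rule names have to be listed by hand because the rules are not yet defined when the preparation code at the top of the Snakefile runs. It is not robust: options whose values are themselves rule names (for example --forcerun) would be miscounted as targets, and job processes that re-parse the Snakefile may see a different command line.

```python
import sys


def requested_targets(argv, known_rules):
    """Return the command-line tokens that name a known rule (crude heuristic).

    Tokens starting with '-' are treated as options and skipped. Option values
    such as the '4' in '--cores 4' are dropped only because they do not match
    a rule name; a value that happens to equal a rule name is miscounted.
    """
    return [tok for tok in argv[1:] if not tok.startswith("-") and tok in known_rules]


# Hypothetical: rule names listed by hand, mirroring the rules in the Snakefile.
KNOWN_RULES = {"all", "a", "b"}

targets = requested_targets(sys.argv, KNOWN_RULES)

# Without an explicit target Snakemake builds the first rule (here 'all'),
# so an empty list means "do all the preparation".
run_all = not targets or "all" in targets
need_a = run_all or "a" in targets
need_b = run_all or "b" in targets
```

The if/elif chain from the question would then test need_a and need_b instead of the non-existent snakemake.target_rule.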

Answer 3

Score: 1

I don't know of any intended way to access the target rule name(s) given on the CLI from inside the Snakefile either.

As a workaround, I suggest passing an additional --config value with the snakemake call, e.g. snakemake --config prep_for=<a|b> -- <remainder of your snakemake call>, and modifying your Snakefile to read that config value:

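def do_work_for_target_a():
    table = lots_of_work()  # placeholder for the expensive preparation
    return table

def do_work_for_target_b():
    table = lots_of_work()
    return table

# config is a plain dict; .get() returns None if prep_for was not passed
prep_for = config.get("prep_for")
if prep_for == 'a':
    table_a = do_work_for_target_a()
elif prep_for == 'b':
    table_b = do_work_for_target_b()
else:
    # no (or an unknown) prep_for value: do all the preparation
    table_a = do_work_for_target_a()
    table_b = do_work_for_target_b()

...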
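With more targets, the if/elif chain grows by one branch per target. A minimal sketch of the same idea as a lookup table, assuming each preparation can be called without arguments: PREP_FUNCS, run_preparation and the one-argument lots_of_work are hypothetical names, and the dict literals stand in for the config dict that Snakemake fills from --config. The selected tables come back in one dict instead of separate table_a/table_b variables.

```python
from functools import partial


def lots_of_work(name):
    # Hypothetical stand-in for the expensive preparation of one target.
    return f"table for {name}"


# Hypothetical lookup table: prep_for value -> preparation callable.
PREP_FUNCS = {
    "a": partial(lots_of_work, "a"),
    "b": partial(lots_of_work, "b"),
}


def run_preparation(config):
    """Run only the preparation selected by config["prep_for"].

    A missing or unrecognised value runs every preparation, like the
    else branch of the if/elif chain.
    """
    selected = config.get("prep_for")
    keys = [selected] if selected in PREP_FUNCS else list(PREP_FUNCS)
    return {key: PREP_FUNCS[key]() for key in keys}


# Simulated config dicts; in a Snakefile you would pass Snakemake's config.
print(run_preparation({"prep_for": "a"}))  # {'a': 'table for a'}
print(sorted(run_preparation({})))  # ['a', 'b']
```

Adding a target then means adding one entry to PREP_FUNCS rather than another branch.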

huangapple
  • Posted on May 11, 2023 16:58:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225840.html