这个snakemake脚本指令为什么不起作用?

huangapple go评论86阅读模式
英文:

Why doesn't this snakemake script directive work

问题

I'm creating a snakemake workflow. I've never used the run or script directives until yesterday. I had a run directive that worked just fine, but snakemake --lint complained it was too long and said I should create a separate script and use the script directive:

Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
    * Migrate long run directives into scripts or notebooks:
      Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards. Only use the run directive for a
      handful of lines.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration

So I tried turning this:

rule check_peak_read_len_overlap_params:
    input:
        "results/QC/max_read_length.txt",
    output:
        "results/QC/parameter_validation.txt",
    params:
        frac_read_overlap=FRAC_READ_OVERLAP,
        min_peak_width=MAX_ARTIFACT_WIDTH + 1,
        summit_width=SUMMIT_FLANK_SIZE * 2,
    log:
        tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
    run:
        max_read_len = 0
        # Get the max read length from the input file
        infile = open(input[0])
        for line in infile:
            max_read_len = int(line.strip())
            # The file should only be one line
            break
        infile.close()

        max_peak_frac = params.min_peak_width / max_read_len
        max_summit_frac = params.summit_width / max_read_len

        if max_peak_frac < params.frac_read_overlap:
            raise ValueError(
                f"There exist reads (of length {max_read_len}) that, if "
                "mapped to the smallest allowed peak (of length "
                f"{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
                f"of {MAX_ARTIFACT_WIDTH}), would never be counted using a "
                f"FRAC_READ_OVERLAP of {params.frac_read_overlap}."
            )

        if max_summit_frac < params.frac_read_overlap:
            raise ValueError(
                f"There exist reads (of length {max_read_len}) that, if "
                f"mapped to any summit (of length {params.summit_width}), "
                "would never be counted using a FRAC_READ_OVERLAP of "
                f"{params.frac_read_overlap}."
            )

        with open(output[0], "w") as out:
            out.write("Parameters have validated successfully.")

into this:

rule check_peak_read_len_overlap_params:
    input:
        "results/QC/max_read_length.txt",
    output:
        "results/QC/parameter_validation.txt",
    params:
        frac_read_overlap=FRAC_READ_OVERLAP,
        max_artifact_width=MAX_ARTIFACT_WIDTH,
        summit_flank_size=SUMMIT_FLANK_SIZE,
    log:
        "results/QC/logs/parameter_validation.log",
    conda:
        "../envs/python3.yml"
    script:
        "scripts/check_peak_read_len_overlap_params.py"

(Note, I changed the parameters at the same time so I could manipulate them in the script instead of under the params directive.)

At first, I simply pasted the run code into a function and called that function. Then I tried tweaking the code to get some sort of log output. I tried adding a shebang at the top. I tried importing snakemake (which the docs didn't mention). I tried a bunch of things, but nothing seemed to work. Currently, this is what I have in the script:

import sys

def check_peak_read_len_overlap_params(
    infile,
    outfile,
    log,
    frac_read_overlap,
    max_artifact_width,
    summit_flank_size,
):
    log_handle = open(log, "w")
    sys.stderr = sys.stdout = log_handle

    print(
        "Parameters:\n"
        f"Input: {infile}\n"
        f"Output: {outfile}\n"
        f"Log: {log}\n"
        f"\tFRAC_READ_OVERLAP = {frac_read_overlap}\n"
        f"\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n"
        f"\tSUMMIT_FLANK_SIZE = {summit_flank_size}"
    )

    min_peak_width = max_artifact_width + 1
    summit_width = summit_flank_size * 2
    max_read_len = 0
    inhandle =

<details>
<summary>英文:</summary>

I&#39;m creating a snakemake workflow.  I&#39;ve never used the run or script directives until yesterday.  I had a run directive that worked just fine, but `snakemake --lint` complained it was too long and said I should create a separate script and use the script directive:

Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
* Migrate long run directives into scripts or notebooks:
Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards.Only use the run directive for a
handful of lines.
Also see:
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration

So I tried turning this:

rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
min_peak_width=MAX_ARTIFACT_WIDTH + 1,
summit_width=SUMMIT_FLANK_SIZE * 2,
log:
tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
run:
max_read_len = 0
# Get the max read length from the input file
infile = open(input[0])
for line in infile:
max_read_len = int(line.strip())
# The file should only be one line
break
infile.close()

    max_peak_frac = params.min_peak_width / max_read_len
    max_summit_frac = params.summit_width / max_read_len

    if max_peak_frac &lt; params.frac_read_overlap:
        raise ValueError(
            f&quot;There exist reads (of length {max_read_len}) that, if &quot;
            &quot;mapped to the smallest allowed peak (of length &quot;
            f&quot;{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH &quot;
            f&quot;of {MAX_ARTIFACT_WIDTH}), would never be counted using a &quot;
            f&quot;FRAC_READ_OVERLAP of {params.frac_read_overlap}.&quot;
        )

    if max_summit_frac &lt; params.frac_read_overlap:
        raise ValueError(
            f&quot;There exist reads (of length {max_read_len}) that, if &quot;
            f&quot;mapped to any summit (of length {params.summit_width}), &quot;
            &quot;would never be counted using a FRAC_READ_OVERLAP of &quot;
            f&quot;{params.frac_read_overlap}.&quot;
        )

    with open(output[0], &quot;w&quot;) as out:
        out.write(&quot;Parameters have validated successfully.&quot;)
into this:

rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
max_artifact_width=MAX_ARTIFACT_WIDTH,
summit_flank_size=SUMMIT_FLANK_SIZE,
log:
"results/QC/logs/parameter_validation.log",
conda:
"../envs/python3.yml"
script:
"scripts/check_peak_read_len_overlap_params.py"

*(Note, I changed the parameters at the same time so I could manipulate them in the script instead of under the params directive.)*
At first, I simply pasted the `run` code into a function, and called that function.  Then I tried tweaking the code to get some sort of log output.  I tried adding a shebang at the top.  I tried inporting snakemake (which the docs didn&#39;t mention).  I tried a bunch of things, but nothing seemed to work.  Currently, this is what I have in the script:

import sys

def check_peak_read_len_overlap_params(
infile,
outfile,
log,
frac_read_overlap,
max_artifact_width,
summit_flank_size,
):
log_handle = open(log, "w")
sys.stderr = sys.stdout = log_handle

print(
    &quot;Parameters:\n&quot;
    f&quot;Input: {infile}\n&quot;
    f&quot;Output: {outfile}\n&quot;
    f&quot;Log: {log}\n&quot;
    f&quot;\tFRAC_READ_OVERLAP = {frac_read_overlap}\n&quot;
    f&quot;\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n&quot;
    f&quot;\tSUMMIT_FLANK_SIZE = {summit_flank_size}&quot;
)

min_peak_width = max_artifact_width + 1
summit_width = summit_flank_size * 2
max_read_len = 0
inhandle = open(infile)
for line in inhandle:
    max_read_len = int(line.strip())
    # The file should only be one line
    break
inhandle.close()

max_peak_frac = min_peak_width / max_read_len
max_summit_frac = summit_width / max_read_len

if max_peak_frac &lt; frac_read_overlap:
    raise ValueError(
        f&quot;There exist reads (of length {max_read_len}) that, if &quot;
        &quot;mapped to the smallest allowed peak (of length &quot;
        f&quot;{min_peak_width}, based on a MAX_ARTIFACT_WIDTH &quot;
        f&quot;of {max_artifact_width}), would never be counted using a &quot;
        f&quot;FRAC_READ_OVERLAP of {frac_read_overlap}.&quot;
    )

if max_summit_frac &lt; frac_read_overlap:
    raise ValueError(
        f&quot;There exist reads (of length {max_read_len}) that, if &quot;
        f&quot;mapped to any summit (of length {summit_width}), &quot;
        &quot;would never be counted using a FRAC_READ_OVERLAP of &quot;
        f&quot;{frac_read_overlap}.&quot;
    )

with open(outfile, &quot;w&quot;) as outhandle:
    outhandle.write(&quot;Parameters have validated successfully.&quot;)

check_peak_read_len_overlap_params(
snakemake.input[0],
snakemake.output[0],
snakemake.log[0],
snakemake.params.frac_read_overlap,
snakemake.params.max_artifact_width,
snakemake.params.summit_flank_size,
)

As I was working on this question, I figured out the issue, but I&#39;ll continue to post this and then answer, because nothing in the 
(https://snakemake.readthedocs.io/en/v7.25.0/snakefiles/rules.html#python) or via google searching, that I could find, answers this question... </details> # 答案1 **得分**: 1 当我尝试添加脚本指令时,我直接从[文档](https://snakemake.readthedocs.io/en/v7.25.0/snakefiles/rules.html#python)中复制了以下内容: ```shell script: "scripts/script.py"

...然后进行了编辑。

问题在于文档中的示例不正确。如果你按照他们建议的目录结构进行操作:

├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│   ├── rules
|   │   ├── module1.smk
|   │   └── module2.smk
│   ├── envs
|   │   ├── tool1.yaml
|   │   └── tool2.yaml
│   ├── scripts
|   │   ├── script1.py
|   │   └── script2.R
│   ├── notebooks
|   │   ├── notebook1.py.ipynb
|   │   └── notebook2.r.ipynb
│   ├── report
|   │   ├── plot1.rst
|   │   └── plot2.rst
|   └── Snakefile
├── config
│   ├── config.yaml
│   └── some-sheet.tsv
├── results
└── resources

脚本的路径应该是:

    script:
        "../scripts/script.py"

一旦我添加了这个路径,它就开始工作了(或者更确切地说,我进入了下一个错误)。

^ 我正在运行的 snakemake 版本是 7.25.0,但它与最新版本相同。

英文:

When I tried adding a script directive, I literally copied this from the docs^:

    script:
        &quot;scripts/script.py&quot;

...then edited it.

The problem was that the docs example is bad. If you follow their suggested directory structure:

├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│   ├── rules
|   │   ├── module1.smk
|   │   └── module2.smk
│   ├── envs
|   │   ├── tool1.yaml
|   │   └── tool2.yaml
│   ├── scripts
|   │   ├── script1.py
|   │   └── script2.R
│   ├── notebooks
|   │   ├── notebook1.py.ipynb
|   │   └── notebook2.r.ipynb
│   ├── report
|   │   ├── plot1.rst
|   │   └── plot2.rst
|   └── Snakefile
├── config
│   ├── config.yaml
│   └── some-sheet.tsv
├── results
└── resources

The path to the script has to be:

    script:
        &quot;../scripts/script.py&quot;

Once I added that, it started working (or rather, I got to the next error).

^ snakemake Version 7.25.0 is the version I'm running, but it's the same as the latest.

huangapple
  • 本文由 发表于 2023年6月16日 03:50:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76485096.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定