英文:
Why doesn't this snakemake script directive work
问题
I'm creating a snakemake workflow. I've never used the run or script directives until yesterday. I had a run directive that worked just fine, but snakemake --lint
complained it was too long and said I should create a separate script and use the script directive:
Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
* Migrate long run directives into scripts or notebooks:
Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards. Only use the run directive for a
handful of lines.
Also see:
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration
So I tried turning this:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
min_peak_width=MAX_ARTIFACT_WIDTH + 1,
summit_width=SUMMIT_FLANK_SIZE * 2,
log:
tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
run:
max_read_len = 0
# Get the max read length from the input file
infile = open(input[0])
for line in infile:
max_read_len = int(line.strip())
# The file should only be one line
break
infile.close()
max_peak_frac = params.min_peak_width / max_read_len
max_summit_frac = params.summit_width / max_read_len
if max_peak_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
"mapped to the smallest allowed peak (of length "
f"{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
f"of {MAX_ARTIFACT_WIDTH}), would never be counted using a "
f"FRAC_READ_OVERLAP of {params.frac_read_overlap}."
)
if max_summit_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
f"mapped to any summit (of length {params.summit_width}), "
"would never be counted using a FRAC_READ_OVERLAP of "
f"{params.frac_read_overlap}."
)
with open(output[0], "w") as out:
out.write("Parameters have validated successfully.")
into this:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
max_artifact_width=MAX_ARTIFACT_WIDTH,
summit_flank_size=SUMMIT_FLANK_SIZE,
log:
"results/QC/logs/parameter_validation.log",
conda:
"../envs/python3.yml"
script:
"scripts/check_peak_read_len_overlap_params.py"
(Note, I changed the parameters at the same time so I could manipulate them in the script instead of under the params directive.)
At first, I simply pasted the run
code into a function and called that function. Then I tried tweaking the code to get some sort of log output. I tried adding a shebang at the top. I tried importing snakemake (which the docs didn't mention). I tried a bunch of things, but nothing seemed to work. Currently, this is what I have in the script:
import sys
def check_peak_read_len_overlap_params(
infile,
outfile,
log,
frac_read_overlap,
max_artifact_width,
summit_flank_size,
):
log_handle = open(log, "w")
sys.stderr = sys.stdout = log_handle
print(
"Parameters:\n"
f"Input: {infile}\n"
f"Output: {outfile}\n"
f"Log: {log}\n"
f"\tFRAC_READ_OVERLAP = {frac_read_overlap}\n"
f"\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n"
f"\tSUMMIT_FLANK_SIZE = {summit_flank_size}"
)
min_peak_width = max_artifact_width + 1
summit_width = summit_flank_size * 2
max_read_len = 0
inhandle =
<details>
<summary>英文:</summary>
I'm creating a snakemake workflow. I've never used the run or script directives until yesterday. I had a run directive that worked just fine, but `snakemake --lint` complained it was too long and said I should create a separate script and use the script directive:
Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
* Migrate long run directives into scripts or notebooks:
Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards.Only use the run directive for a
handful of lines.
Also see:
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration
So I tried turning this:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
min_peak_width=MAX_ARTIFACT_WIDTH + 1,
summit_width=SUMMIT_FLANK_SIZE * 2,
log:
tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
run:
max_read_len = 0
# Get the max read length from the input file
infile = open(input[0])
for line in infile:
max_read_len = int(line.strip())
# The file should only be one line
break
infile.close()
max_peak_frac = params.min_peak_width / max_read_len
max_summit_frac = params.summit_width / max_read_len
if max_peak_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
"mapped to the smallest allowed peak (of length "
f"{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
f"of {MAX_ARTIFACT_WIDTH}), would never be counted using a "
f"FRAC_READ_OVERLAP of {params.frac_read_overlap}."
)
if max_summit_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
f"mapped to any summit (of length {params.summit_width}), "
"would never be counted using a FRAC_READ_OVERLAP of "
f"{params.frac_read_overlap}."
)
with open(output[0], "w") as out:
out.write("Parameters have validated successfully.")
into this:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
max_artifact_width=MAX_ARTIFACT_WIDTH,
summit_flank_size=SUMMIT_FLANK_SIZE,
log:
"results/QC/logs/parameter_validation.log",
conda:
"../envs/python3.yml"
script:
"scripts/check_peak_read_len_overlap_params.py"
*(Note, I changed the parameters at the same time so I could manipulate them in the script instead of under the params directive.)*
At first, I simply pasted the `run` code into a function, and called that function. Then I tried tweaking the code to get some sort of log output. I tried adding a shebang at the top. I tried inporting snakemake (which the docs didn't mention). I tried a bunch of things, but nothing seemed to work. Currently, this is what I have in the script:
import sys
def check_peak_read_len_overlap_params(
infile,
outfile,
log,
frac_read_overlap,
max_artifact_width,
summit_flank_size,
):
log_handle = open(log, "w")
sys.stderr = sys.stdout = log_handle
print(
"Parameters:\n"
f"Input: {infile}\n"
f"Output: {outfile}\n"
f"Log: {log}\n"
f"\tFRAC_READ_OVERLAP = {frac_read_overlap}\n"
f"\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n"
f"\tSUMMIT_FLANK_SIZE = {summit_flank_size}"
)
min_peak_width = max_artifact_width + 1
summit_width = summit_flank_size * 2
max_read_len = 0
inhandle = open(infile)
for line in inhandle:
max_read_len = int(line.strip())
# The file should only be one line
break
inhandle.close()
max_peak_frac = min_peak_width / max_read_len
max_summit_frac = summit_width / max_read_len
if max_peak_frac < frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
"mapped to the smallest allowed peak (of length "
f"{min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
f"of {max_artifact_width}), would never be counted using a "
f"FRAC_READ_OVERLAP of {frac_read_overlap}."
)
if max_summit_frac < frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
f"mapped to any summit (of length {summit_width}), "
"would never be counted using a FRAC_READ_OVERLAP of "
f"{frac_read_overlap}."
)
with open(outfile, "w") as outhandle:
outhandle.write("Parameters have validated successfully.")
check_peak_read_len_overlap_params(
snakemake.input[0],
snakemake.output[0],
snakemake.log[0],
snakemake.params.frac_read_overlap,
snakemake.params.max_artifact_width,
snakemake.params.summit_flank_size,
)
As I was working on this question, I figured out the issue, but I'll continue to post this and then answer, because nothing in the (https://snakemake.readthedocs.io/en/v7.25.0/snakefiles/rules.html#python) or via google searching, that I could find, answers this question...
</details>
# 答案1
**得分**: 1
当我尝试添加脚本指令时,我直接从[文档](https://snakemake.readthedocs.io/en/v7.25.0/snakefiles/rules.html#python)中复制了以下内容:
```shell
script:
"scripts/script.py"
...然后进行了编辑。
问题在于文档中的示例不正确。如果你按照他们建议的目录结构进行操作:
├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│ ├── rules
| │ ├── module1.smk
| │ └── module2.smk
│ ├── envs
| │ ├── tool1.yaml
| │ └── tool2.yaml
│ ├── scripts
| │ ├── script1.py
| │ └── script2.R
│ ├── notebooks
| │ ├── notebook1.py.ipynb
| │ └── notebook2.r.ipynb
│ ├── report
| │ ├── plot1.rst
| │ └── plot2.rst
| └── Snakefile
├── config
│ ├── config.yaml
│ └── some-sheet.tsv
├── results
└── resources
脚本的路径应该是:
script:
"../scripts/script.py"
一旦我添加了这个路径,它就开始工作了(或者更确切地说,我进入了下一个错误)。
^ 我正在运行的 snakemake 版本是 7.25.0
,但它与最新版本相同。
英文:
When I tried adding a script directive, I literally copied this from the docs^:
script:
"scripts/script.py"
...then edited it.
The problem was that the docs example is bad. If you follow their suggested directory structure:
├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│ ├── rules
| │ ├── module1.smk
| │ └── module2.smk
│ ├── envs
| │ ├── tool1.yaml
| │ └── tool2.yaml
│ ├── scripts
| │ ├── script1.py
| │ └── script2.R
│ ├── notebooks
| │ ├── notebook1.py.ipynb
| │ └── notebook2.r.ipynb
│ ├── report
| │ ├── plot1.rst
| │ └── plot2.rst
| └── Snakefile
├── config
│ ├── config.yaml
│ └── some-sheet.tsv
├── results
└── resources
The path to the script has to be:
script:
"../scripts/script.py"
Once I added that, it started working (or rather, I got to the next error).
^ snakemake Version 7.25.0
is the version I'm running, but it's the same as the latest.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论