
Snakemake wrappers not working on SLURM cluster compute nodes without internet

Question

I am trying to use wrappers in my pipeline on a SLURM cluster, where the compute nodes do not have internet access.

I have run the pipeline with `--conda-create-envs-only` first, and then changed the `wrapper:` directives to point to local folders containing `environment.yaml` files.
Jobs fail without a specific error. A test rule with the same configuration but without a wrapper works. Rules with wrappers work fine on the login node if I switch the `wrapper:` directive back to the online form.
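
For reference, the environment pre-build step on the login node was along these lines (a sketch; the exact flags of my invocation may have differed):

```shell
# On the login node (internet available): download the wrappers and build all
# conda environments without running any jobs.
snakemake --profile myprofile --use-conda --conda-create-envs-only
```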

I am running:
```shell
snakemake --profile myprofile --cores 40 --use-conda
```

Example rule:

```python
# Run FastQC on the fastq.gz files
rule fastqc_fastq_gz:
    input:
        input_dir + "{sample}_{read}_001.fastq.gz",
    output:
        html = output_dir + "fastqc/{sample}_{read}_fastqc.html",
        zip = output_dir + "fastqc/{sample}_{read}_fastqc.zip",
    params:
        extra = "--quiet",
    log:
        output_dir + "logs/fastqc/{sample}_{read}.log",
    threads: 1,
    resources:
        mem_mb = 1024,
    wrapper:
        # "file:///path/envs/v1.31.0/bio/fastqc/"   # <- Fails on both login and compute nodes
        "v1.31.0/bio/fastqc"                        # <- Works on login node to download env
```

I have also tried using more resources, same behavior.

My profile:

```yaml
cluster:
  mkdir -p logs/{rule} &&
  sbatch
    --partition={resources.partition}
    --qos={resources.qos}
    --cpus-per-task={threads}
    --mem={resources.mem_mb}
    --job-name=smk-{rule}-{wildcards}
    --output=logs/{rule}/{rule}-{wildcards}-%j.out
    --error=logs/{rule}/{rule}-{wildcards}-.%j.err
    --account=<account>
    --time={resources.time}
    --parsable
default-resources:
  - partition=<partition>
  - qos=sbatch
  - mem_mb="490G"
  - tmpdir="/path/to/temp/"
  - time="0-10:00:00"
max-jobs-per-second: 10
max-status-checks-per-second: 1
latency-wait: 60
jobs: 16
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True
cluster-status: status-sacct.sh
```

The submission log reads:

```
Error in rule fastqc_fastq_gz:
    jobid: 38
    input: /path/raw/S9_S9_R2_001.fastq.gz
    output: /path/pipeline_out/fastqc/S9_S9_R2_fastqc.html, /path/pipeline_out/fastqc/S9_S9_R2_fastqc.zip
    log: /path/pipeline_out/logs/fastqc/S9_S9_R2.log (check log file(s) for error details)
    conda-env: /path/.snakemake/conda/a116f377bbaddedd93b228a3d4f74b1d_
    cluster_jobid: 1491196

Error executing rule fastqc_fastq_gz on cluster (jobid: 38, external: 1491196, jobscript: /path/.snakemake/tmp.l0raomfw/snakejob.fastqc_fastq_gz.38.sh). For error details see the cluster log and the log files of the involved rule(s).
```

The tmp files do not exist.
Job logs simply read:

```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 80
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1000, mem_mib=954, disk_mb=7876, disk_mib=7512
Select jobs to execute...

[Date Time]
rule fastqc_fastq_gz:
    input: /path/raw/S9_S9_R2_001.fastq.gz
    output: /path/pipeline_out/fastqc/S9_S9_R2_fastqc.html, /path/pipeline_out/fastqc/S9_S9_R2_fastqc.zip
    log: /path/pipeline_out/logs/fastqc/S9_S9_R2.log
    jobid: 0
    reason: Missing output files: /path/pipeline_out/fastqc/S9_S9_R2_fastqc.zip, /path/pipeline_out/fastqc/S9_S9_R2_fastqc.html
    wildcards: sample=S9_S9, read=R2
    threads: 2
    resources: mem_mb=1000, mem_mib=954, disk_mb=7876, disk_mib=7512, tmpdir=/path/tmp/snakemake, partition=el7taskp, qos=sbatch, time=0-40:00:00

[Date Time]
Error in rule fastqc_fastq_gz:
    jobid: 0
    input: /path/raw/230319/S9_S9_R2_001.fastq.gz
    output: /path/pipeline_out/fastqc/S9_S9_R2_fastqc.html, /path/pipeline_out/fastqc/S9_S9_R2_fastqc.zip
    log: /path/pipeline_out/logs/fastqc/S9_S9_R2.log (check log file(s) for error details)
    conda-env: /path/.snakemake/conda/a116f377bbaddedd93b228a3d4f74b1d_

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
```

FastQC logs are empty.

If I run on the compute nodes with the `wrapper:` directives pointing to GitHub, it fails with this error:

```
Building DAG of jobs...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd067df0810>: Failed to establish a new connection: [Errno 113] No route to host')), attempt 1/3 failed - retrying in 3 seconds...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd067df2150>: Failed to establish a new connection: [Errno 113] No route to host')), attempt 2/3 failed - retrying in 6 seconds...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd067dfc250>: Failed to establish a new connection: [Errno 113] No route to host')), attempt 3/3 failed - giving up!
WorkflowError:
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.31.0/bio/fastqc/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd067dfc250>: Failed to establish a new connection: [Errno 113] No route to host'))
  File "/path/mambaforge/envs/snakemake/lib/python3.11/site-packages/reretry/api.py", line 218, in retry_call
  File "/path/mambaforge/envs/snakemake/lib/python3.11/site-packages/reretry/api.py", line 31, in __retry_internal
```

Answer 1

Score: 1

OK, so I am marking this as solved.
Snakemake requires an internet connection for wrappers, even if the environments have been built beforehand using `--conda-create-envs-only`. My approach does not work because Snakemake needs the `wrapper:` directive to point to the whole wrapper, not just the `environment.yaml` file. The solution Troy pointed to works, but a somewhat more elegant solution, suggested by euronion, is to place all wrappers under a path ending in `v1.31.0/bio/<tool>` and then pass the `--wrapper-prefix="file:///path/envs/"` flag when calling Snakemake on the compute nodes. In any case, I am opening [a feature request on GitHub](https://github.com/snakemake/snakemake/issues/2262) for a more streamlined way to do this, because even the prefix solution feels clunky and requires the added step of cloning the snakemake-wrappers repo.
