Conflicts between Snakemake and GROMACS?

Question
I tried to simplify my issue as much as possible and I am still getting the error.

The whole idea is that I want to execute (inside a much more complex workflow) the command `gmx mdrun -nt 12 -deffnm emin -cpi` on a cluster. For that I have a conda environment with GROMACS and Snakemake.

The traditional way is a jobscript (`traditional_job.sh`) with:
```bash
#!/bin/bash
#SBATCH --partition=uds-hub
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=5000
#SBATCH --time=15:00
#SBATCH --job-name=reproduce_error
#SBATCH --output=reproduce_error.o
#SBATCH --error=reproduce_error.e
gmx mdrun -nt 12 -deffnm emin -cpi
```

And everything works as expected after `sbatch traditional_job.sh`. However, if I try to use Snakemake instead, the problems start.
My `Snakefile` is:

```
rule gmx:
    input:
        tpr = "emin.tpr"
    output:
        out = 'emin.gro'
    shell:
        '''
        gmx mdrun -nt 12 -deffnm emin -cpi
        '''
```
And my `job.sh`:

```bash
#!/bin/bash
snakemake \
    --jobs 10000 \
    --verbose \
    --debug-dag \
    --latency-wait 50 \
    --cluster-cancel scancel \
    --rerun-incomplete \
    --keep-going \
    --cluster '
        sbatch \
            --partition=uds-hub \
            --nodes=1 \
            --cpus-per-task=12 \
            --mem=5000 \
            --time=15:00 \
            --job-name=reproduce_error \
            --output=reproduce_error.o \
            --error=reproduce_error.e '
```
After `./job.sh`, the GROMACS error (written to `reproduce_error.e`) is:

```
Program: gmx mdrun, version 2022.2-conda_forge
Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 220)

Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
```
I observed that in the output of `snakemake` a different shell is written (`#!/bin/sh`). But honestly, I do not know whether that could be the problem, nor how to solve it if that is the case:

Jobscript:

```sh
#!/bin/sh
# properties = {"type": "single", "rule": "gmx", "local": false, "input": ["emin.tpr"], "output": ["emin.gro"], "wildcards": {}, "params": {}, "log": [], "threads": 1, "resources": {"mem_mb": 1000, "mem_mib": 954, "disk_mb": 1000, "disk_mib": 954, "tmpdir": "<TBD>"}, "jobid": 0, "cluster": {}}
cd '/home/uds_alma015/GIT/BindFlow/ideas/reproduce error' && /home/uds_alma015/.conda/envs/abfe/bin/python3.9 -m snakemake --snakefile '/home/uds_alma015/GIT/BindFlow/ideas/reproduce error/Snakefile' --target-jobs 'gmx:' --allowed-rules 'gmx' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=1000' 'mem_mib=954' 'disk_mb=1000' 'disk_mib=954' --wait-for-files '/home/uds_alma015/GIT/BindFlow/ideas/reproduce error/.snakemake/tmp.lfq5jq4u' 'emin.tpr' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers 'software-env' 'params' 'input' 'mtime' 'code' --skip-script-cleanup --conda-frontend 'mamba' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 50 --scheduler 'ilp' --scheduler-solver-path '/home/uds_alma015/.conda/envs/abfe/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2 && touch '/home/uds_alma015/GIT/BindFlow/ideas/reproduce error/.snakemake/tmp.lfq5jq4u/0.jobfinished' || (touch '/home/uds_alma015/GIT/BindFlow/ideas/reproduce error/.snakemake/tmp.lfq5jq4u/0.jobfailed'; exit 1)
```
P.S. ChatGPT goes in circles with this question.
Update
I isolated the error even further. The following `other_job.sh` script (submitted to the cluster as `sbatch other_job.sh`) also gave the same error:
```bash
#!/bin/bash
#SBATCH --partition=uds-hub
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=5000
#SBATCH --time=15:00
#SBATCH --job-name=reproduce_error
#SBATCH --output=reproduce_error.o
#SBATCH --error=reproduce_error.e

/home/uds_alma015/.conda/envs/abfe/bin/python3.9 -m snakemake --snakefile '/home/uds_alma015/GIT/BindFlow/ideas/reproduce_error/Snakefile' --target-jobs 'gmx:' --allowed-rules 'gmx' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=1000' 'mem_mib=954' 'disk_mb=1000' 'disk_mib=954' --wait-for-files '/home/uds_alma015/GIT/BindFlow/ideas/reproduce_error/.snakemake/tmp.mqan6qbp' 'emin.tpr' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers 'mtime' 'software-env' 'params' 'code' 'input' --skip-script-cleanup --conda-frontend 'mamba' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 50 --scheduler 'ilp' --scheduler-solver-path '/home/uds_alma015/.conda/envs/abfe/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2
```
And this is the command built by `snakemake`. It looks like that command somehow does not interact with the `SBATCH` definitions, but I am still not sure.
Answer 1

Score: 1
I had a similar issue with snakemake, but with a different program. In the end I had to explicitly unset some environment variables, because snakemake sets them for each instantiated shell. In particular, `OMP_NUM_THREADS` was the root cause of my problem.
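As a minimal sketch (assuming `OMP_NUM_THREADS` is also the conflicting variable in your setup), the unset could go directly into the rule's shell block from the question:

```
rule gmx:
    input:
        tpr = "emin.tpr"
    output:
        out = 'emin.gro'
    shell:
        '''
        # assumption: the shell that snakemake spawns exports OMP_NUM_THREADS,
        # which clashes with the explicit -nt request; drop it before starting mdrun
        unset OMP_NUM_THREADS
        gmx mdrun -nt 12 -deffnm emin -cpi
        '''
```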
Make sure to compare which environment variables are set within snakemake and in your regular script; this might give you a hint to find the culprit variable.
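One way to do that comparison, as a sketch (the dump file names are arbitrary), is to write out the environment right before the `gmx` call in both the traditional jobscript and the rule's shell block, then diff the two files:

```bash
# in traditional_job.sh, just before the gmx call
env | sort > env_traditional.txt

# in the Snakefile rule's shell block, just before the gmx call
env | sort > env_snakemake.txt

# afterwards, compare the two dumps
diff env_traditional.txt env_snakemake.txt
```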