June 16, 2023 00:14:34


Nextflow on GCP - waiting on container error

Question


I'm running a pipeline using Nextflow on Google Batch. However, I'm getting the following error:

ERROR ~ Error executing process > 'PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)'
Caused by:
  Process `PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)` terminated with an error exit status (null)
Command executed:
  mkdir output
  nlrexpress.py \
        --input All_Candidate_Soybean_Prots_Simplified_Sorted.fasta \
        --outdir ./output \
        --module all
  mv output/*.short.output.txt ./
Command exit status:
  null
Command output:
  15/06/2023 15:36:31:  ############ NLRexpress started ############
  15/06/2023 15:36:31:  Input FASTA: All_Candidate_Soybean_Prots_Simplified_Sorted.fasta
  15/06/2023 15:36:31:  Checking FASTA file - started
  15/06/2023 15:36:31:  Checking FASTA file - done
  15/06/2023 15:36:31:  Running JackHMMER - started
Command error:
  time="2023-06-15T15:39:22Z" level=error msg="error waiting for container: "
Work dir:
  gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
 -- Check '.nextflow.log' file for details

Here is the module .nf file:

process NLREXPRESS {
  tag "$sample_id"
  maxForks 1
  container = 'dthorbur1990/nlrexpress:latest'
  cpus { 4 * task.attempt }
  memory { 12.GB * task.attempt }
  disk "15.GB"
  publishDir(
    path: "${params.PlantDir}",
    mode: 'copy',
  )
  
  input:
      tuple val(sample_id), path(peptides)
  output:
      path "*.short.output.txt", emit: nlre_out
  script:
  """
  mkdir output
  nlrexpress.py \\
        --input ${peptides} \\
        --outdir ./output \\
        --module ${params.NE_Modules}
  mv output/*.short.output.txt ./
  """
}

The process ran without errors when I ran it locally, and I have rebuilt the container and confirmed it works as intended.

What confuses me is that the workDir contains neither of the .command.{out,err} files, which suggests (to me at least) that the command isn't running. But the Command output section of the error message shows the correct first few lines of the tool's output.

Here is the listing of the workDir:

gsutil ls gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.begin
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.run
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.sh

And here is the end of the log file regarding the NLREXPRESS module:

All_Candidate_Soybean_Prots_Simplified_Sorted)","q3Label":"PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)"},"writes":null},{"cpuUsage":null,"process":"ORIENTATION","mem":null,"memUsage":null,"timeUsage":null,"vmem":null,"reads":null,"cpu":null,"time":null,"writes":null}]

I'm at a loss. I've tried increasing the memory, but that doesn't seem to have helped. Any ideas? I'm happy to add the nextflow.log file if that would be useful.

Answer 1

Score: 1


I'm not sure if I have an answer for you, but I think this behavior might have something to do with how Nextflow runs the job. If you look at the end of the nxf_main function in the .command.run script, you'll see something like:

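nxf_main() {
    ...
    set +e    # disable errexit: a failing command no longer aborts the script
    # temp dir for the pipes: /dev/shm first, $TMPDIR as a fallback
    ctmp=$(set +u; nxf_mktemp /dev/shm 2>/dev/null || nxf_mktemp $TMPDIR)
    # named pipes (FIFOs) that carry the task's stdout and stderr
    local cout=$ctmp/.command.out; mkfifo $cout
    local cerr=$ctmp/.command.err; mkfifo $cerr
    # background tee readers copy each pipe into .command.out / .command.err
    tee .command.out < $cout &
    tee1=$!
    tee .command.err < $cerr >&2 &
    tee2=$!
    # run the task with its output redirected into the pipes
    ( nxf_launch ) >$cout 2>$cerr &
    pid=$!
    wait $pid || nxf_main_ret=$?   # record the task's exit status
    wait $tee1 $tee2
    nxf_unstage
}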

When errexit is enabled (set -e), any command that returns a non-zero exit status immediately terminates the script. By calling set +e, nxf_main explicitly disables this behavior, so a failing step does not stop the wrapper. This means that .command.out and .command.err may not necessarily be created, even though the Docker container is run (via nxf_launch).
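To make the mechanism above concrete, here is a minimal, self-contained bash sketch of the same named-pipe plus tee pattern, assuming a Linux-like system with mkfifo and mktemp. The function demo_task is a hypothetical stand-in for nxf_launch and the work directory is a throwaway temp dir; this illustrates the pattern only and is not Nextflow's actual wrapper.

```bash
#!/usr/bin/env bash
# Sketch of the FIFO + tee pattern used in nxf_main.
# demo_task is a hypothetical stand-in for nxf_launch.

demo_task() {
    echo "hello from stdout"
    echo "hello from stderr" >&2
    return 3   # simulate a failing task
}

# Mirrors nxf_main (errexit is already off by default in a plain script).
set +e

# Hypothetical work dir: .command.out/.command.err end up here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pipe directory: try /dev/shm first, fall back to the default temp dir.
ctmp=$(mktemp -d /dev/shm/nxf-demo.XXXXXX 2>/dev/null || mktemp -d)
cout=$ctmp/.command.out; mkfifo "$cout"
cerr=$ctmp/.command.err; mkfifo "$cerr"

# Background readers copy each pipe into a file in the work dir
# (and on to this script's own stdout/stderr).
tee .command.out < "$cout" &
tee1=$!
tee .command.err < "$cerr" >&2 &
tee2=$!

# Run the task with its stdout/stderr redirected into the pipes.
( demo_task ) >"$cout" 2>"$cerr" &
pid=$!

task_ret=0
wait "$pid" || task_ret=$?
wait "$tee1" "$tee2"
rm -rf "$ctmp"

echo "task exit status: $task_ret"
```

Running it prints both messages and "task exit status: 3", and leaves .command.out and .command.err in the temporary work dir. Note that those files are written by the background tee processes, not by the task itself.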

So I wonder whether there is a problem with the size of /dev/shm. Google Cloud Batch supports the containerOptions directive¹, so as a first step, you might try increasing the shm-size² by adding something like this to your process definition:

process NLREXPRESS {
    container 'dthorbur1990/nlrexpress:latest'
    containerOptions '--shm-size 2g'
    ...
}
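One way to test the /dev/shm hypothesis is to log its size from inside the task. The bash snippet below is a sketch that assumes df supports the POSIX -P and -k options (true for GNU coreutils and BSD df). Docker gives containers a 64 MB /dev/shm unless --shm-size is set, so a value around 64 MiB would suggest the option is not being applied.

```bash
#!/usr/bin/env bash
# Report the total and free size of /dev/shm in MiB.
if [ -d /dev/shm ]; then
    # -P: POSIX single-line format; -k: 1024-byte blocks.
    # Row 2 is the mount; column 2 = total, column 4 = available.
    read -r shm_total_kb shm_free_kb < <(df -Pk /dev/shm | awk 'NR == 2 {print $2, $4}')
    echo "/dev/shm total: $((shm_total_kb / 1024)) MiB, free: $((shm_free_kb / 1024)) MiB"
else
    echo "/dev/shm is not present on this system"
fi
```

If you place this directly in a Nextflow script block, every bash $ has to be escaped as \$ because Nextflow treats it as Groovy interpolation; the one-liner df -h /dev/shm avoids that.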

Also note the typo in your container directive. I'm not sure it will cause problems, but the = character should be avoided here. Assignment syntax with the = character is required in your nextflow.config, for example, but not in a process definition.
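For comparison, the same settings can live in a config file, where the = assignment is required. This bash sketch writes a small extra config file; the file name shm.config and the main.nf entry point are hypothetical, and it assumes the withName selector matches the simple process name NLREXPRESS.

```bash
#!/usr/bin/env bash
# Write a small extra config file (hypothetical name: shm.config).
# Inside config files, directives are assigned with '='.
cat > shm.config <<'EOF'
process {
    // applies to the NLREXPRESS process wherever it is included
    withName: 'NLREXPRESS' {
        container        = 'dthorbur1990/nlrexpress:latest'
        containerOptions = '--shm-size 2g'
    }
}
EOF

# Then pass it alongside the pipeline's own config, e.g.:
#   nextflow run main.nf -c shm.config
```

As far as I know, Nextflow's selector priority rules give withName settings precedence over directives written in the process definition, so this can be tried without editing the module.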
