Nextflow on GCP - waiting on container error

Question

I'm running a pipeline using Nextflow on Google Batch. However, I'm getting the following error:

ERROR ~ Error executing process > 'PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)'

Caused by:
  Process `PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)` terminated with an error exit status (null)

Command executed:

  mkdir output
  nlrexpress.py \
        --input All_Candidate_Soybean_Prots_Simplified_Sorted.fasta \
        --outdir ./output \
        --module all

  mv output/*.short.output.txt ./

Command exit status:
  null

Command output:
  15/06/2023 15:36:31:  ############ NLRexpress started ############
  15/06/2023 15:36:31:  Input FASTA: All_Candidate_Soybean_Prots_Simplified_Sorted.fasta
  15/06/2023 15:36:31:  Checking FASTA file - started
  15/06/2023 15:36:31:  Checking FASTA file - done
  15/06/2023 15:36:31:  Running JackHMMER - started

Command error:
  time="2023-06-15T15:39:22Z" level=error msg="error waiting for container: "

Work dir:
  gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

The module nf file is here:

process NLREXPRESS {
  tag "$sample_id"
  maxForks 1
  container = 'dthorbur1990/nlrexpress:latest'

  cpus { 4 * task.attempt }
  memory { 12.GB * task.attempt }
  disk "15.GB"

  publishDir(
    path: "${params.PlantDir}",
    mode: 'copy',
  )
  
  input:
      tuple val(sample_id), path(peptides)

  output:
      path "*.short.output.txt", emit: nlre_out

  script:
  """
  mkdir output
  nlrexpress.py \\
        --input ${peptides} \\
        --outdir ./output \\
        --module ${params.NE_Modules}

  mv output/*.short.output.txt ./
  """
}

The process ran without error when I ran it locally, and I have rebuilt the container and confirmed that it works as intended.

What confuses me is that the workDir contains neither .command.out nor .command.err, which suggests (to me at least) that the command isn't running. Yet the Command output section of the error message shows the correct first few lines of the tool's output.

Here is the workDir:

gsutil ls gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.begin
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.run
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.sh

And here is the end of the log file regarding the NLREXPRESS module:

All_Candidate_Soybean_Prots_Simplified_Sorted)","q3Label":"PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)"},"writes":null},{"cpuUsage":null,"process":"ORIENTATION","mem":null,"memUsage":null,"timeUsage":null,"vmem":null,"reads":null,"cpu":null,"time":null,"writes":null}]

I'm at a loss. I've tried increasing the memory, but that doesn't seem to have helped. Any ideas? Happy to add the nextflow.log file if that would be helpful.

Answer 1

Score: 1

I'm not sure if I have an answer for you, but I think this behavior might have something to do with how Nextflow runs the job. If you look at the end of the nxf_main function in the .command.run script, you'll see something like:

nxf_main() {

    ...

    set +e
    ctmp=$(set +u; nxf_mktemp /dev/shm 2>/dev/null || nxf_mktemp $TMPDIR)
    local cout=$ctmp/.command.out; mkfifo $cout
    local cerr=$ctmp/.command.err; mkfifo $cerr
    tee .command.out < $cout &
    tee1=$!
    tee .command.err < $cerr >&2 &
    tee2=$!
    ( nxf_launch ) >$cout 2>$cerr &
    pid=$!
    wait $pid || nxf_main_ret=$?
    wait $tee1 $tee2
    nxf_unstage
}

When errexit is enabled (set -e), any command that returns a non-zero exit status immediately terminates the script. By using set +e, the script explicitly disables this behavior, so it keeps going even if one of the commands that follow fails. This means that .command.out and .command.err are not necessarily created, even though the Docker container is run (via nxf_launch).
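The difference between the two modes can be seen in a small stand-alone sketch. It is not taken from .command.run: the variable names strict_out and lenient_out are made up for illustration, and `false` simply stands in for any setup step that fails.

```shell
#!/usr/bin/env bash
# Illustration only; these variable names do not appear in Nextflow's scripts.

# With errexit on, the child shell exits at the failing command,
# so the echo is never reached and nothing is captured.
strict_out=$(bash -c 'set -e; false; echo "reached"') || true

# With errexit off, the failure is ignored and execution continues.
lenient_out=$(bash -c 'set +e; false; echo "reached"')

echo "set -e: '${strict_out}'"
echo "set +e: '${lenient_out}'"
```

Each case runs in its own bash process on purpose: bash ignores errexit inside a function or subshell that is called on the left-hand side of `||`, which would hide the effect being shown.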

So I wonder if there is a problem with the size of /dev/shm? Google Cloud Batch supports the containerOptions directive, so as a first step you could try increasing the shm-size by adding something like this to your process definition:

process NLREXPRESS {

    container 'dthorbur1990/nlrexpress:latest'
    containerOptions '--shm-size 2g'

    ...
}
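To check whether the option takes effect, one approach is to look at how much space /dev/shm actually has inside the container. The following is a minimal sketch: the function name shm_avail_kb is hypothetical, and it assumes that df and awk are available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the available space, in KiB, on the filesystem
# that holds the given directory (defaults to /dev/shm).
shm_avail_kb() {
    local dir="${1:-/dev/shm}"
    # -P forces the POSIX one-line-per-filesystem format and -k reports
    # 1024-byte blocks; column 4 of the second line is the available space.
    df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}
```

Running the same df command locally, for example with docker run --rm --shm-size 2g dthorbur1990/nlrexpress:latest df -Pk /dev/shm (assuming the image's entrypoint allows an arbitrary command), and again without --shm-size, should show the difference. Docker's default /dev/shm size is 64 MB.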

Note also the typo in your container directive. I'm not sure whether it causes problems here, but inside a process definition the = character should be avoided: the directive is written as container 'dthorbur1990/nlrexpress:latest'. The assignment syntax with the = character is what you need in your nextflow.config, for example.

huangapple
  • Posted on 16 June 2023, 00:14:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76483616.html