GNU Parallel not improving performance

Question

I have the following script, which zips each file/directory in the current directory in parallel and stores the resulting archives in a temp directory.


# cd into a directory

tmp_directory="/tmp/test-1"

mkdir "${tmp_directory}" && \
find . -mindepth 1 -maxdepth 1 | \
parallel -j "$(parallel --number-of-cores)" zip -r "${tmp_directory}"/{/.}.zip {} && \
zipmerge -k ../test-1.zip "${tmp_directory}"

# a zip archive called test-1.zip should be generated at ../ (relative to the current directory)

Note that to run the above code, GNU parallel and zipmerge must be installed.

The problem is that the sequential version (zipping the directory without parallel) is faster than the parallel one. The process is definitely not I/O bound: checking (h)top, the CPU is at 100% and the read/write speeds are very low, i.e. 10 MB/s. I'm running this on an 8-core machine.

One thing I noticed is that when I first start the script, the CPU on 4 cores jumps to 100% and then drops rapidly until only one core is fully utilized.

I have no idea why it's not speeding up.

EDIT: iostat output (M2 MacBook Air)

disk0 
    KB/t  tps  MB/s 
   25.62    4  0.10 
    8.00    8  0.06 
    8.00    8  0.06 
    0.00    0  0.00 
    0.00    0  0.00 
    0.00    0  0.00 
   84.55 1323 109.22 
  727.63  117 83.47 
  228.24  249 55.44 
  955.80   60 55.74 
  927.55   53 47.76 
  870.12   66 55.97 
  925.10   62 55.85 
  398.56  171 66.60 
  241.23  362 85.31 
  428.89 1925 806.33 
  180.50  765 134.81 
    4.46   95  0.41 
    6.13   15  0.09 

Any help would be appreciated.

Answer 1

Score: 1

This sounds I/O bound.

iostat -dkx 1

is excellent for seeing how busy a disk is. However much I love top, it is useless for this.
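One way to line the disk statistics up with the job is to record the monitor's output while the job runs. The sketch below is only an illustration: run_with_monitor is a hypothetical helper name, not part of GNU parallel or iostat, and the iostat flags are the ones quoted above. iostat options differ between platforms, so check the local man page before relying on them.

```shell
# run_with_monitor LOGFILE MONITOR_CMD JOB_CMD   (hypothetical helper)
# Starts MONITOR_CMD in the background with its output written to LOGFILE,
# runs JOB_CMD in the foreground, stops the monitor, and returns the
# exit status of JOB_CMD.
run_with_monitor() (
  log="$1"; monitor="$2"; job="$3"

  # exec replaces the wrapper shell, so the PID we kill is the monitor itself
  sh -c "exec $monitor" >"$log" 2>&1 &
  mon_pid=$!

  sh -c "$job"
  status=$?

  kill "$mon_pid" 2>/dev/null
  wait "$mon_pid" 2>/dev/null
  exit "$status"
)

# Example (assumes iostat, GNU parallel and zip are installed):
# run_with_monitor /tmp/iostat.log 'iostat -dkx 1' \
#   'find . -mindepth 1 -maxdepth 1 | parallel zip -qr /tmp/test-1/{/.}.zip {}'
```

Reading the log afterwards shows whether the disk was saturated during the phase where only one core stayed busy.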

Your problem is likely not bandwidth, but seeking.

On magnetic drives it will often be faster to read sequentially than in parallel. NVMe drives typically have a sweet spot of parallelization (often 4 or 8). RAID and network drives will often also have a sweet spot, but it depends on the drive.
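A simple way to look for that sweet spot is to time the same job with several job counts. The following is a minimal sketch under stated assumptions: sweep_jobs and the JOBS variable are names made up for this illustration, timing uses whole seconds from date, and later runs may benefit from the file cache, so the numbers are only a rough guide.

```shell
# sweep_jobs CMD J1 [J2 ...]   (hypothetical helper)
# Runs CMD once per job count with the count exported as JOBS,
# and prints one "jobs<TAB>elapsed-seconds" line per run.
sweep_jobs() (
  cmd="$1"; shift
  for j in "$@"; do
    start=$(date +%s)
    JOBS="$j" sh -c "$cmd" >/dev/null 2>&1
    end=$(date +%s)
    printf '%s\t%s\n' "$j" "$((end - start))"
  done
)

# Example (assumes GNU parallel and zip are installed); each run writes
# its archives to a fresh temporary directory:
# sweep_jobs 'out=$(mktemp -d) && find . -mindepth 1 -maxdepth 1 |
#   parallel -j "$JOBS" zip -qr "$out"/{/.}.zip {}' 1 2 4 8
```

If the elapsed time stops improving, or gets worse, beyond some job count, that count is a reasonable value to pass to -j.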

PS: parallel -j $(parallel --number-of-cores) is the same as parallel --use-cores-instead-of-threads

Posted by huangapple on July 6, 2023, 20:39:37
When reposting, please keep the link to this article: https://go.coder-hub.com/76628934.html