GNU Parallel not improving performance
Question
I have the following script, which zips a series of files/directories in the current directory in parallel and stores the archives in a temp directory.
# cd into a directory
tmp_directory="/tmp/test-1"
mkdir "${tmp_directory}" && \
find . -mindepth 1 -maxdepth 1 | \
parallel -j "$(parallel --number-of-cores)" zip -r "${tmp_directory}"/{/.}.zip {} && \
zipmerge -k ../test-1.zip "${tmp_directory}"/*.zip
# a zip archive called test-1.zip should be generated at ../ (relative to the current directory)
Note that to run the above code, GNU parallel and zipmerge must be installed.
The problem is that zipping the directory sequentially, before I added parallel, is faster than the parallel version. The process is definitely not I/O bound: checking (h)top, the CPU is at 100% and the read/write speeds are very low, i.e. 10 MB/s. I'm running this on an 8-core machine.
One thing I noticed is that when I first start the script, the CPU on 4 cores jumps to 100% and then drops rapidly until only one core is fully utilized.
I have no idea why it's not speeding up.
EDIT: iostat output (M2 MacBook Air)
disk0
KB/t tps MB/s
25.62 4 0.10
8.00 8 0.06
8.00 8 0.06
0.00 0 0.00
0.00 0 0.00
0.00 0 0.00
84.55 1323 109.22
727.63 117 83.47
228.24 249 55.44
955.80 60 55.74
927.55 53 47.76
870.12 66 55.97
925.10 62 55.85
398.56 171 66.60
241.23 362 85.31
428.89 1925 806.33
180.50 765 134.81
4.46 95 0.41
6.13 15 0.09
Any help would be appreciated.
Answer 1
Score: 1
This sounds I/O bound.
iostat -dkx 1 is excellent for seeing how busy a disk is. However much I love top, it is useless for this.
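As a rough illustration of what the disk numbers can show, the sketch below summarises the MB/s column copied from the question's iostat output. The 1 MB/s cut-off for counting a sample as "busy" is an arbitrary assumption made for this example, not something iostat defines.

```shell
# MB/s column copied from the iostat output in the question (one value per sample).
samples="0.10 0.06 0.06 0.00 0.00 0.00 109.22 83.47 55.44 55.74 47.76 55.97 55.85 66.60 85.31 806.33 134.81 0.41 0.09"

# Count the samples, count those at or above the assumed 1 MB/s "busy" cut-off,
# and report the peak and mean throughput.
summary=$(printf '%s\n' $samples | awk '
  { sum += $1; if ($1 > peak) peak = $1; if ($1 >= 1) busy++ }
  END { printf "samples=%d busy=%d peak=%.2f mean=%.2f", NR, busy, peak, sum / NR }')

echo "$summary"
```

Numbers like these say how much data moved, not how long the disk spent seeking, so treat them as a starting point only.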
Your problem is likely not bandwidth, but seeking.
On magnetic drives it will often be faster to read sequentially than in parallel. NVMe drives typically have a sweet spot of parallelization (often 4 or 8). RAID and network drives will often also have a sweet spot, but it depends on the drive.
PS: parallel -j $(parallel --number-of-cores) is the same as parallel --use-cores-instead-of-threads
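One way to look for the sweet spot mentioned above is to time the same workload at several job counts. The following is a minimal sketch, not a definitive benchmark: the function name bench_zip_jobs is hypothetical, it assumes GNU parallel and zip are installed, and it writes the archives to a throwaway directory created with mktemp.

```shell
# Time the question's zip workload at several -j values.
# bench_zip_jobs is a hypothetical helper; GNU parallel and zip are assumed to be installed.
bench_zip_jobs() {
  src_dir=$1
  shift
  for jobs in "$@"; do
    out_dir=$(mktemp -d) || return 1
    start=$(date +%s)
    # Same pipeline as in the question, with NUL-separated names and an explicit job count.
    ( cd "$src_dir" &&
      find . -mindepth 1 -maxdepth 1 -print0 |
        parallel -0 -j "$jobs" zip -q -r "$out_dir"/{/.}.zip {} )
    end=$(date +%s)
    printf 'jobs=%s seconds=%s\n' "$jobs" "$((end - start))"
    rm -rf "$out_dir"
  done
}

# Example call (the path is a placeholder):
# bench_zip_jobs /path/to/directory 1 2 4 8
```

Later runs may read from the operating system's file cache rather than the disk, so it is worth repeating the measurement or varying the order of the job counts before drawing conclusions.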