Runtime discrepancy when using CompletableFuture on an IO bound task

Question

My understanding of the JVM multi-threading model is that when a thread executes an IO call, the thread is BLOCKED and put into a waiting queue by the JVM/OS until data is available.

I am trying to emulate this behavior in my code and running a benchmark with various thread sizes, using JMH and CompletableFuture.

However, the results are not what I expected. I was expecting a constant execution time (with thread/context switching overhead) irrespective of the number of threads (with memory limitations), since the tasks are IO bound and not CPU bound.

My CPU is a 4-core/8-thread laptop processor, and even with 1 or 2 threads the results differ from the behavior I expected.

I'm trying to read a 5MB file (separate file for each thread) in the async task. At the start of each iteration, I create a FixedThreadPool with the required number of threads.

@Benchmark
public void readAsyncIO(Blackhole blackhole) throws ExecutionException, InterruptedException {
    List<CompletableFuture<Void>> readers = new ArrayList<>();

    // Submit one read task per thread; each task reads its own file.
    for (int i = 0; i < threadSize; i++) {
        int finalI = i;
        readers.add(CompletableFuture.runAsync(() -> readFile(finalI), threadPool));
    }

    // Block until every read task has completed.
    Object result = CompletableFuture
            .allOf(readers.toArray(new CompletableFuture[0]))
            .get();
    blackhole.consume(result);
}

@Setup(Level.Iteration)
public void setup() throws IOException {
    threadPool = Executors.newFixedThreadPool(threadSize);
}

@TearDown(Level.Iteration)
public void tearDown() {
    threadPool.shutdownNow();
}

public byte[] readFile(int i) {
    File file = new File(filePath + "/" + fileName + i);
    byte[] bytesRead = new byte[(int) file.length()];
    // try-with-resources closes the stream once the read is done.
    try (InputStream inputStream = new FileInputStream(file)) {
        inputStream.read(bytesRead);
        return bytesRead;
    } catch (Exception e) {
        throw new CompletionException(e);
    }
}
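
A side note on readFile: the contract of InputStream.read(byte[]) only promises to read at least one byte (unless the stream is at its end), not to fill the whole buffer. The sketch below shows one way to read a file completely by looping until the buffer is full. The class name ReadFully and the helper readFully are hypothetical and not part of the original post; the temporary file in main only stands in for the 5 MB input files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFully {
    // Hypothetical helper (not from the original post): keeps calling read()
    // until the buffer is full, and fails if the stream ends early.
    public static byte[] readFully(Path path) throws IOException {
        byte[] buffer = new byte[(int) Files.size(path)];
        try (InputStream in = Files.newInputStream(path)) {
            int offset = 0;
            while (offset < buffer.length) {
                // read() may return fewer bytes than requested.
                int n = in.read(buffer, offset, buffer.length - offset);
                if (n < 0) {
                    throw new IOException("Unexpected end of file at byte " + offset);
                }
                offset += n;
            }
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // Temporary 5 MB file, standing in for the benchmark's input files.
        Path tmp = Files.createTempFile("readfully", ".bin");
        try {
            Files.write(tmp, new byte[5 * 1024 * 1024]);
            System.out.println(readFully(tmp).length + " bytes read");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On Java 9 and later, InputStream.readAllBytes() or Files.readAllBytes(path) would do the same job with less code.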

And the JMH config:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3)
@Fork(value=1)
@Measurement(iterations = 3)
public class SimpleTest {

    @Param({ "1", "2", "4", "8", "16", "32", "50", "100" })
    public int threadSize;
    .....
}

Any idea on what I'm doing wrong? Or are my assumptions incorrect?

Answer 1

Score: 1

That seems reasonable. With a single thread, you can see that one file takes about 2 ms to process. Adding more threads leads to a longer average time per thread, because each read(bytesRead) on a very large buffer is likely to require multiple disk reads, so there are more opportunities for IO blocking and thread context switching, plus, depending on the disks, more seek time.
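
One way to look at this outside JMH is the rough sketch below. It assumes, as in the question, that every thread reads its own file, so the total number of bytes read grows with the thread count; constant elapsed time would then require the storage and memory bandwidth to scale along with it. The class ScalingCheck and its helpers are hypothetical names, and the freshly written temporary files will probably be served from the OS page cache, so the printed timings are only indicative and no substitute for the JMH run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ScalingCheck {
    // Reads each file on its own pool thread and returns the total bytes read.
    // Assumes the list is non-empty (a fixed pool needs at least one thread).
    public static long readConcurrently(List<Path> files) {
        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        try {
            List<CompletableFuture<Integer>> readers = new ArrayList<>();
            for (Path file : files) {
                readers.add(CompletableFuture.supplyAsync(() -> {
                    try {
                        return Files.readAllBytes(file).length;
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, pool));
            }
            long total = 0;
            for (CompletableFuture<Integer> reader : readers) {
                total += reader.join(); // wait for each read to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Creates `count` temporary files of `sizeBytes` each (illustrative data only).
    public static List<Path> createFiles(int count, int sizeBytes) throws IOException {
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Path p = Files.createTempFile("scaling" + i, ".bin");
            Files.write(p, new byte[sizeBytes]);
            p.toFile().deleteOnExit();
            files.add(p);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        int size = 5 * 1024 * 1024; // 5 MB per file, as in the question
        for (int threads : new int[] {1, 2, 4, 8}) {
            List<Path> files = createFiles(threads, size);
            long start = System.nanoTime();
            long bytes = readConcurrently(files);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("threads=%d bytes=%d elapsed=%.2f ms (%.2f ms per file)%n",
                    threads, bytes, ms, ms / threads);
        }
    }
}
```

If the elapsed time grows with the thread count while the time per file stays in the same range, that would be consistent with the reads competing for shared bandwidth rather than all waiting in parallel for free.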

huangapple
  • Posted on August 11, 2020 at 23:07:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63361065.html