Runtime discrepancy when using CompletableFuture on an IO bound task
Question
My understanding of the JVM multi-threading model is that when a thread executes an IO call, the thread is BLOCKED and put into a waiting queue by the JVM/OS until data is available. I am trying to emulate this behavior in my code by running a benchmark with various thread counts, using JMH and CompletableFuture.
However, the results are not what I expected. I was expecting a roughly constant execution time (plus thread/context-switching overhead) irrespective of the number of threads (within memory limits), since the tasks are IO bound and not CPU bound.
My CPU is a 4-core/8-thread laptop processor, and even with 1 or 2 threads the results deviate from the expected behavior.
I'm trying to read a 5 MB file (a separate file for each thread) in the async task. At the start of each iteration, I create a FixedThreadPool with the required number of threads.
@Benchmark
public void readAsyncIO(Blackhole blackhole) throws ExecutionException, InterruptedException {
    List<CompletableFuture<Void>> readers = new ArrayList<>();
    for (int i = 0; i < threadSize; i++) {
        int finalI = i;
        readers.add(CompletableFuture.runAsync(() -> readFile(finalI), threadPool));
    }
    Object result = CompletableFuture
            .allOf(readers.toArray(new CompletableFuture[0]))
            .get();
    blackhole.consume(result);
}

@Setup(Level.Iteration)
public void setup() throws IOException {
    threadPool = Executors.newFixedThreadPool(threadSize);
}

@TearDown(Level.Iteration)
public void tearDown() {
    threadPool.shutdownNow();
}

public byte[] readFile(int i) {
    try {
        File file = new File(filePath + "/" + fileName + i);
        byte[] bytesRead = new byte[(int) file.length()];
        InputStream inputStream = new FileInputStream(file);
        inputStream.read(bytesRead);
        return bytesRead;
    } catch (Exception e) {
        throw new CompletionException(e);
    }
}
And the JMH configuration:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3)
@Fork(value = 1)
@Measurement(iterations = 3)
public class SimpleTest {

    @Param({ "1", "2", "4", "8", "16", "32", "50", "100" })
    public int threadSize;

    .....
}
Any idea on what I'm doing wrong? Or are my assumptions incorrect?
Answer 1
Score: 1
It seems reasonable. With a single thread you see that one file takes ~2 ms to deal with; adding more threads leads to a longer average time per thread, because each read(bytesRead) call on a very large file is likely to do multiple disk reads, so there is opportunity for IO blocking and thread context switching, plus, depending on the disks, more seek time.
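To make that last point concrete: a single InputStream.read(bytesRead) call is not guaranteed to fill the buffer, so the amount of data each benchmark task actually reads can vary. Below is a minimal sketch (not the answer's code) of a read loop that forces each task to read the whole file; the helper name readFileFully and the use of UncheckedIOException are my own choices, everything else is plain java.io.

public byte[] readFileFully(File file) {
    byte[] buffer = new byte[(int) file.length()];
    // try-with-resources closes the stream even if a read fails
    try (InputStream in = new FileInputStream(file)) {
        int offset = 0;
        // A single read(buffer) may return fewer bytes than requested,
        // so keep reading until the buffer is full or EOF is reached.
        while (offset < buffer.length) {
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n < 0) {
                break; // end of stream before the expected length
            }
            offset += n;
        }
        return buffer;
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

Alternatively, Files.readAllBytes(file.toPath()) reads the whole file in one call and sidesteps the partial-read question entirely.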