How many Futures is too many in Java?
Question
While working out how to break down tasks in a data processing server in Java, I need to know how many Futures is too many for an ExecutorService.
To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures as if they were green threads, meaning the cost of a context switch between Futures is very small. Is this true?
Should I submit millions of Futures to an ExecutorService (using a fixed number of threads in the pool)?
Can I expect to submit many very short-lived Futures (10 ms) to an ExecutorService without seeing severe performance degradation?
Answer 1
Score: 2
You're conflating a `Future`, which represents the possible result of an asynchronous operation, with a `Thread`, which represents the ability to perform processing on a `Callable` (in the case of an `Executor` at least).
There's nothing to stop you calling `submit` on a thread pool millions of times and getting a huge list of `Future` objects to wait on. You don't even need to wait for them to finish if the application will continue running and you have no need to process the results.
But.
If you create all these jobs, they are going to require memory to hold their state. If that memory holds the input to a job, or the result of executing it, then you are committing heap space to every one of these tasks, and you can't do that forever. Essentially, if you're going to pull a huge amount of work into a process to run in the background, you need to think about some sort of throttling.
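One possible way to throttle is sketched below, as an illustration rather than the only approach: a `Semaphore` caps how many tasks may be queued or running at once, so the submitting thread blocks instead of piling up unbounded state on the heap. The class name `ThrottledSubmitter`, its methods, and the limits used in `runDemo` are hypothetical names and values chosen for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: bounds the number of tasks that are queued or running.
class ThrottledSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;

    ThrottledSubmitter(int threads, int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller while maxInFlight tasks are already queued or running.
    void submit(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot even if the task throws
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // the pool refused the task, so give the permit back
            throw e;
        }
    }

    // Stops accepting work and waits for the tasks already accepted.
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }

    // Demo: runs taskCount trivial tasks with at most 100 in flight on 4 threads.
    static int runDemo(int taskCount) throws InterruptedException {
        ThrottledSubmitter submitter = new ThrottledSubmitter(4, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            submitter.submit(done::incrementAndGet);
        }
        submitter.shutdownAndWait(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(10_000));
    }
}
```

With this shape, memory use is bounded by `maxInFlight` rather than by the total number of tasks. A `ThreadPoolExecutor` built on a bounded queue with a rejection policy such as `CallerRunsPolicy` is another common way to get a similar effect.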
Answer 2
Score: 0
> To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures as if they were green threads
That's not correct. If we ignore the bells and whistles, an ExecutorService consists of a collection of worker threads and a blocking queue of tasks. Each task in the queue is a wrapper containing one of your tasks and a `Future`.
Each worker thread loops forever:
- Picks a task from the queue,
- Calls your `Runnable` or `Callable` object's `run()` or `call()` method,
- Completes the `Future` with the value returned by your method, or with the exception thrown by your method,
- Goes back to wait for another task.
The only threads are the "heavyweight" worker threads. Once a worker thread starts to work on a task, it won't do anything else until that task is complete. Tasks that haven't been started yet are just objects in a queue, and the Executor forgets about each task and its Future object as soon as the Future is completed. Those objects become eligible for garbage collection once your own code has discarded its references to them.
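The worker loop described above can be sketched in a few lines. This is a deliberately stripped-down illustration, not how `ThreadPoolExecutor` is actually implemented: the class name `TinyExecutor` is hypothetical, there is no shutdown or rejection handling, and the workers are daemon threads only so that the demo can exit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, minimal stand-in for an executor: worker threads plus a blocking queue.
class TinyExecutor {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    TinyExecutor(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(this::workerLoop, "tiny-worker-" + i);
            worker.setDaemon(true); // lets the JVM exit; a real pool offers shutdown()
            worker.start();
        }
    }

    // Wraps the task in a FutureTask (task + Future in one object) and queues it.
    <T> Future<T> submit(Callable<T> task) {
        FutureTask<T> wrapper = new FutureTask<>(task);
        queue.add(wrapper);
        return wrapper;
    }

    private void workerLoop() {
        try {
            while (true) {
                Runnable next = queue.take(); // blocks until a task is available
                // FutureTask.run() invokes call() and completes the Future with
                // either the returned value or the thrown exception.
                next.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop this worker when interrupted
        }
    }

    public static void main(String[] args) throws Exception {
        TinyExecutor executor = new TinyExecutor(2);
        Future<Integer> answer = executor.submit(() -> 6 * 7);
        System.out.println(answer.get());
    }
}
```

Nothing here switches between tasks: a worker runs one task to completion before taking the next, and a pending task is only an object sitting in the queue.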
Answer 3
Score: 0
> Should I submit millions of Futures to ExecutorService?
You can, but you should estimate the possible time overhead. The overhead of handling each separate Future object is small, but greater than zero, so the fewer tasks, the better. On the other hand, when the number of tasks drops below the number of processors (that is, the number of processor cores, taking hyperthreading into account), the level of parallelism is reduced and the overall execution time increases.
Suppose you have 1 million 10 ms tasks and your computer has 8 cores. The overall execution time of 1250 s is then increased by (10*8/2) = 40 ms because parallelism drops off at the end, plus 125 ms for task switching (I estimate each task switch at as little as 1 µs). If you have 100,000 100 ms tasks, the execution time is still expected to be 1250 s, plus 400 ms for the tail and 12.5 ms for switching. Either way the time overhead is negligible, but it can grow if your tasks are significantly shorter or longer than the 10 to 100 ms range.
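The arithmetic above can be reproduced with a small calculator. It simply encodes this answer's own back-of-the-envelope formulas (tail = task length × cores / 2, and an assumed per-task switch cost spread across the cores); it is not a measurement, and the class and method names are hypothetical.

```java
// Hypothetical calculator for the rough estimates used in this answer.
class OverheadEstimate {
    // Ideal wall-clock time in ms if the work divides perfectly across all cores.
    static double idealMillis(long tasks, double taskMillis, int cores) {
        return tasks * taskMillis / cores;
    }

    // Tail overhead in ms, using the answer's formula: taskMillis * cores / 2.
    static double tailMillis(double taskMillis, int cores) {
        return taskMillis * cores / 2.0;
    }

    // Task-switch overhead in ms, assuming switchMicros per task, shared across cores.
    static double switchMillis(long tasks, double switchMicros, int cores) {
        return tasks * switchMicros / cores / 1000.0;
    }

    public static void main(String[] args) {
        int cores = 8;
        double switchMicros = 1.0; // assumed cost of one task switch
        long[] taskCounts = {1_000_000L, 100_000L};
        double[] taskLengths = {10.0, 100.0};
        for (int i = 0; i < taskCounts.length; i++) {
            System.out.printf("%d tasks of %.0f ms: ideal %.0f s, tail %.1f ms, switch %.1f ms%n",
                    taskCounts[i], taskLengths[i],
                    idealMillis(taskCounts[i], taskLengths[i], cores) / 1000.0,
                    tailMillis(taskLengths[i], cores),
                    switchMillis(taskCounts[i], switchMicros, cores));
        }
    }
}
```

Changing the task length or the assumed switch cost shows how quickly either overhead term starts to matter relative to the ideal time.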