I need simple FIFO scheduling in dask-distributed
Question

I have multiple clients acting as servers, one scheduler, and one worker with 3 threads.
My clients are async; when a request arrives, they submit the work through the distributed client.
The call looks like this:

processing_futures = await client.gather(
    client.compute(response, priority=100, fifo_timeout="0ms",
                   resources={"GPU_RAM": 5})  # the worker has 16 GPU_RAM in total
)

The problem is that when I get a lot of tasks (more than 1k, say), the worker tries to process all the initial tasks first (as described here), and this crashes the worker. It tries to hold all the intermediate results in memory, which is far too much. Had it processed the tasks in simple FIFO order, everything would have worked fine. This is essentially an embarrassingly parallel problem, and Dask seems to struggle with it.

I have tried many things. How can I make it process the tasks one by one, or at least 10 at a time, instead of trying to get them all done at once?

Can this be solved through the resource system? I assumed that setting fifo_timeout to 0 would make it clear that the queue should not be reordered across separate requests. Please help!

Answer 1

Score: 1

I only had to update the Dask version on the server and leave the worker-saturation parameter at 1.1, which is the default. That's it; it works now. Thanks!

Took inspiration from here.
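To make the effect of that setting concrete, here is a small sketch under stated assumptions. As far as I understand the Dask documentation, the full config key is distributed.scheduler.worker-saturation (environment variable DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION), and the scheduler sends roughly worker-saturation * nthreads root tasks to a worker at a time, rounded up. The helper names below are hypothetical, and the formula is my reading of that rule rather than something stated in the answer.

```python
import math


def max_tasks_per_worker(nthreads, worker_saturation=1.1):
    """Approximate cap on root tasks the scheduler hands to one worker at once.

    Assumption: the target is worker_saturation * nthreads, rounded up,
    and never less than one.
    """
    return max(math.ceil(worker_saturation * nthreads), 1)


def set_worker_saturation(value=1.1):
    # Requires dask to be installed. The value has to be in place in the
    # scheduler's process before the scheduler starts to have any effect.
    import dask

    dask.config.set({"distributed.scheduler.worker-saturation": value})


# The worker in the question has 3 threads.
slots = max_tasks_per_worker(nthreads=3)
```

With 3 threads and the default of 1.1, the worker would be sent about 4 root tasks at a time instead of the whole backlog, which is consistent with the memory problem going away after the upgrade.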

huangapple
  • Posted on 2023-08-09 01:18:24
  • When reposting, please keep the link to this post: https://go.coder-hub.com/76861860.html