fastAPI background task takes up to 100 times longer to execute than calling function directly

Question

I have a simple FastAPI endpoint deployed on Google Cloud Run. I wrote the Workflow class myself. When a Workflow instance is executed, several steps happen, e.g., the files are processed and the results are put into a vectorstore database.

Usually, this takes a few seconds per file like this:

from fastapi import Request
from fastapi.responses import JSONResponse
from .workflow import Workflow
...

@app.post('/execute_workflow_directly')
async def execute_workflow_directly(request: Request):
    ...  # get files from request object
    workflow = Workflow.get_simple_workflow(files=files)
    workflow.execute()
    return JSONResponse(status_code=200, content={'message': 'Successfully processed files'})

Now, if many files are involved, this might take a while, and I don't want to let the caller of the endpoint wait, so I want to run the workflow execution in the background like this:

from fastapi import BackgroundTasks, Request
from fastapi.responses import JSONResponse
from .workflow import Workflow
...

def run_workflow_in_background(workflow: Workflow):
    workflow.execute()

@app.post('/execute_workflow_in_background')
async def execute_workflow_in_background(request: Request, background_tasks: BackgroundTasks):
    ...  # get files from request object
    workflow = Workflow.get_simple_workflow(files=files)
    background_tasks.add_task(run_workflow_in_background, workflow)
    return JSONResponse(status_code=202, content={'message': 'File processing started'})

Testing this with still only one file, I already run into a problem: locally it works fine, but when I deploy it to my Google Cloud Run service, execution time goes through the roof. In one example, background execution took ~500s until I saw the result in the database, compared to ~5s when executing the workflow directly.

I already tried increasing the number of CPU cores to 4 and, subsequently, the number of gunicorn workers to 4 as well. Not sure that makes much sense, but it did not decrease the execution times.

Can I solve this problem by allocating more resources to Cloud Run somehow, or is my approach flawed and I'm doing something wrong, or should I already switch to something more sophisticated like Celery?


Edit (not really relevant to the problem I had, see accepted answer):

I read the accepted answer to this question and it helped clarify some things, but it doesn't really answer my question of why there is such a big difference in execution time between running directly vs. as a background task. Both versions call the CPU-intensive workflow.execute() asynchronously if I'm not mistaken.

I can't really change the endpoint's definition to def, because I am awaiting other code inside.

I tried changing the background function to

async def run_workflow_in_background(workflow: Workflow):
    await run_in_threadpool(workflow.execute)

and

async def run_workflow_in_background(workflow: Workflow):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        res = await loop.run_in_executor(pool, workflow.execute)

and

async def run_workflow_in_background(workflow: Workflow):
    res = await asyncio.to_thread(workflow.execute)

and

async def run_workflow_in_background(workflow: Workflow):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        res = await loop.run_in_executor(pool, workflow.execute)

as suggested and it didn't help.

I tried increasing the number of workers as suggested and it didn't help.

I guess I will look into switching to Celery, but I'm still eager to understand why it runs so slowly with FastAPI background tasks.
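For reference, the Celery-style split mentioned above can be sketched with only the standard library: the endpoint merely enqueues a job and returns 202 immediately, while a separate worker drains the queue. All names below (job_queue, enqueue_workflow, worker) are illustrative, not FastAPI or Celery API; in production the queue would be an external broker (e.g. Redis or RabbitMQ) and the worker a separate process or service, so it is never CPU-throttled alongside the web request.

```python
import queue
import threading

job_queue: "queue.Queue" = queue.Queue()
results = []

def worker():
    # Runs in its own thread here; stand-in for a separate worker process.
    while True:
        job = job_queue.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.append(job())    # "execute the workflow"
        job_queue.task_done()

def enqueue_workflow(workflow_fn):
    # What the endpoint would do instead of executing inline:
    # enqueue the job and answer 202 immediately.
    job_queue.put(workflow_fn)
    return {"status": 202, "message": "File processing started"}

t = threading.Thread(target=worker, daemon=True)
t.start()

response = enqueue_workflow(lambda: "processed")
job_queue.join()                 # only for the demo; a real caller would not wait
job_queue.put(None)              # stop the worker
t.join()
```

The key property is that the "endpoint" returns before the work is done, and the work happens on a consumer that lives outside the request lifecycle.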

Answer 1

Score: 2

With Cloud Functions, as with Cloud Run, the CPU is allocated (and billed) only while a request is being processed.

A request counts as being processed between the reception of the request and the sending of the response.

The rest of the time, the CPU is throttled (to below 5%).


That being said, let's look back at your functions.

  • The fastest one gets the data, processes it, and sends the response. The CPU is allocated full-time during the processing.
  • The slowest one gets the data, starts a task in the background (multi-threading, forking, or whatever), and sends the response immediately. After the response is sent, the CPU is throttled, and only then does the processing begin. Of course it is very slow: you are outside the CPU allocation boundaries.

To solve that, you can use Cloud Run with the CPU always allocated option (--no-cpu-throttling with the gcloud command line). There is no such option for Cloud Functions.
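The setting can be applied from the gcloud command line; the service name below is a placeholder for your own Cloud Run service:

```shell
# Keep the CPU allocated even after the response is sent,
# so background work is not throttled.
# "my-service" is a placeholder for your Cloud Run service name.
gcloud run services update my-service --no-cpu-throttling
```

Note that with CPU always allocated, Cloud Run switches to instance-based billing: you pay for the whole lifetime of each instance rather than only for request processing.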

huangapple
  • Published on 2023-04-19 22:47:08
  • Please keep this link when reposting: https://go.coder-hub.com/76055891.html