Slow responses using Google Cloud Run, FastAPI and the Meta WhatsApp API
Question
This is quite a specific problem, but I'm wondering if anyone else has encountered it. I'm using the WhatsApp Cloud API (https://developers.facebook.com/docs/whatsapp/cloud-api/) for a question-answer chatbot. The incoming messages are passed to an LLM, which takes some time to respond.

Unfortunately, in the meantime the Meta API sends me the same message a few more times. It seems that unless you respond with a 200 status code almost immediately, the Meta API will keep re-sending the same message (see the "Retry" heading here, and this previous Stack Overflow answer: https://stackoverflow.com/questions/72894209/whatsapp-cloud-api-sending-old-message-inbound-notification-multiple-time-on-my).
What I've tried
My first approach was to use FastAPI's background task functionality. This allows me to return a 200 response immediately and then do the LLM work as a background process. This works well insofar as it stops the repeated WhatsApp API calls. However, the LLM is very slow to respond, presumably because Cloud Run does not see the background task and therefore shuts the instance down.
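For context, here's a minimal sketch of that approach; `handle_message()` is a hypothetical stand-in for the slow LLM call and the reply sent back through the WhatsApp Cloud API, not my actual code:

```python
from fastapi import BackgroundTasks, FastAPI, Request

app = FastAPI()


def handle_message(payload: dict) -> None:
    # Slow work: call the LLM, then post the answer via the WhatsApp Cloud API.
    # On Cloud Run with request-only CPU allocation, this can stall once the
    # 200 response below has already been returned.
    ...


@app.post("/webhook")
async def receive_whatsapp_message(request: Request, background_tasks: BackgroundTasks):
    payload = await request.json()
    # Queue the slow work and acknowledge immediately so Meta stops retrying.
    background_tasks.add_task(handle_message, payload)
    return {"status": "received"}
```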
What I would prefer not to try
I know you can set Cloud Run to be "always on" by setting the minimum number of instances to 1. That would presumably solve the background task problem, but I don't want to pay for a server that's constantly on when I'm not sure how much use it will get. It also somewhat defeats the object of Cloud Run.
I could also split this into two microservices: one to receive the WhatsApp messages and immediately acknowledge receipt, and another to pick up each message and do the LLM work. I'd like to avoid this, as it's a relatively simple codebase and I'd prefer not to split it into two services.
So.....
Is there any way to have this running as a single service on Cloud Run, while solving the problems I mentioned?
Answer 1
Score: 1
To answer my own question: there is a setting to have the CPU always allocated while the container instance is active (for a maximum of 15 minutes). See here: https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation

I hadn't realised that this is a different setting from minimum CPU instances, but it still means the container will be terminated when inactive.
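In case it helps anyone, this is roughly how that setting can be applied from the command line (the service name and region are placeholders for your own):

```bash
# Keep CPU allocated for the whole instance lifetime, not only during requests.
gcloud run services update my-whatsapp-bot \
  --region=us-central1 \
  --no-cpu-throttling
```

With CPU always allocated, background tasks queued after the 200 response keep getting CPU time until the instance is eventually shut down.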