Google App Engine application deployment fails despite readiness_check returning a 200 status response
Question
I'm trying to set up a readiness_check for my application. Here's the relevant section of my app.yaml:

    readiness_check:
      path: '/readiness_check'
      check_interval_sec: 30
      timeout_sec: 4
      failure_threshold: 10
      success_threshold: 1
      app_start_timeout_sec: 300

The project I'm developing is a Node.js application built on Express. Here's how I handle the /readiness_check endpoint:

    app
      .get(['/readiness_check'], (req, res) => res.sendStatus(200))

Without readiness_check configured, my deployment succeeds and I can access my application without any problems. However, when I include the readiness_check, the deployment fails with the following error:

> OperationError: Error Response: 4 Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

I checked the logs and saw that /readiness_check returned 502 at first (while the application was still starting) and then started to return 200 status codes. Accessing the endpoint manually with curl showed the same results. Still, for some reason GCP didn't see my deployment as healthy.
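For reference, the observation above can be compared against the configured thresholds with a small model. This is only a sketch of the documented meaning of success_threshold (consecutive successful checks required), not App Engine's actual checker; the function name firstReadyCheck and the sample status list are hypothetical.

```javascript
// Simplified model, NOT App Engine's actual checker: returns the 1-based
// index of the check at which `successThreshold` consecutive 200 responses
// have been seen, or -1 if that never happens in the given sequence.
function firstReadyCheck(statusCodes, successThreshold) {
  let consecutive = 0;
  for (let i = 0; i < statusCodes.length; i++) {
    // Any non-200 response resets the streak of successful checks.
    consecutive = statusCodes[i] === 200 ? consecutive + 1 : 0;
    if (consecutive >= successThreshold) return i + 1;
  }
  return -1;
}

// Hypothetical statuses shaped like the logs: 502 while starting, then 200.
const observed = [502, 502, 200, 200];
console.log(firstReadyCheck(observed, 1)); // 3
```

Under this model, with success_threshold: 1 a single 200 response should be enough, which is why the rollback looks surprising.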
Running gcloud app describe confirms that I have the splitHealthChecks feature enabled.

I walked through the troubleshooting sections in the documentation and discovered that I didn't have the servicecontrol.googleapis.com and endpoints.googleapis.com services enabled, so I enabled them, but that didn't help either.

I also saw the following note in the documentation:

> If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them.

It might be an unrelated question, but I couldn't find nginx.health_check in my application logs. I searched for the text "nginx" but didn't see anything related to health checking. Searching for "readiness_check", however, did show the responses I mentioned above.
Answer 1
Score: 3
There are several possible ways to fix this:

- You can increase the values in the resources section of your app.yaml file. The documentation for that section has more details.

- You can increase the value of app_start_timeout_sec to the maximum, which is 1800 seconds. That gives your app more time to become healthy.

- Even though running gcloud app describe confirms that the splitHealthChecks feature is enabled, did you follow all of the steps for migrating from the legacy health checks? Is the migration applied to every version of your app, including the old ones? Check carefully all the steps listed in the migration documentation. Running gcloud app update --split-health-checks --project [YOUR_PROJECT_ID] alone may not be enough.

  EDIT: In theory, this should not be a problem if you are not splitting traffic across different versions (I cannot think of a reason why it would be). Still, step 2 of the migration documentation says:

  > Convert legacy health check options for each version in your application.

  To do that, update the app.yaml for each version accordingly and then deploy the service under that version ID, for example: gcloud app deploy --project PROJECT_ID --version VERSION_ID --no-promote

- As a workaround, you could "fake" the readiness_check response so that it returns a 200 status after a certain time. You would have to add a custom handler for it. This way the deployment won't time out, and the app will keep starting up in the background. However, this defeats the purpose of readiness checks, because your instance might receive traffic before it is ready. If you keep that in mind and can handle it in your application, it is an option to consider.
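The workaround in the last item could be sketched roughly as follows. This is a minimal illustration, not a recommended implementation: the helper name createDelayedReadinessHandler, the 503 status returned before the delay has passed, and the 60-second delay in the usage comment are all assumptions made for the example. The handler only relies on res.sendStatus, which the question's own Express code already uses.

```javascript
// Hypothetical helper (not part of Express or App Engine): returns an
// Express-style handler that answers 503 until `delayMs` milliseconds have
// elapsed since the handler was created, and 200 afterwards. It does NOT
// check whether the application is actually ready.
function createDelayedReadinessHandler(delayMs, now = Date.now) {
  const startedAt = now();
  return function readinessHandler(req, res) {
    const delayElapsed = now() - startedAt >= delayMs;
    res.sendStatus(delayElapsed ? 200 : 503);
  };
}

// Usage with Express, assuming `app` is an existing Express application:
// app.get('/readiness_check', createDelayedReadinessHandler(60 * 1000));
```

The clock is passed in as a parameter only so the behavior can be exercised without waiting. Because the 200 response is driven by elapsed time alone, the instance may still receive traffic before it can serve it, as noted above.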
Finally, I assume you are using App Engine Flex, because these health checks are not available in the Standard environment and configuring them there results in errors. This has also been covered in a separate discussion.