Google App Engine application deployment fails despite readiness_check returning a 200 status response

Question

I'm trying to set up a readiness_check for my application. Here's the relevant section of my app.yaml:

readiness_check:
  path: '/readiness_check'
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 10
  success_threshold: 1
  app_start_timeout_sec: 300
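
To put the numbers above in perspective, here is a rough arithmetic sketch of how many probes fit into the startup window. It assumes a single checker probing exactly once per check_interval_sec, which is a simplification (the real platform may probe more often), and the function name probesWithinStartWindow is made up for illustration.

```javascript
// Simplified model: one checker, one probe every check_interval_sec.
// This is an assumption for illustration, not App Engine's actual scheduling.
function probesWithinStartWindow(config) {
  const { check_interval_sec, app_start_timeout_sec } = config;
  if (check_interval_sec <= 0) {
    throw new RangeError('check_interval_sec must be positive');
  }
  // Number of whole probe intervals that fit before the deployment times out.
  return Math.floor(app_start_timeout_sec / check_interval_sec);
}

// Values copied from the app.yaml section above.
const readinessCheck = {
  check_interval_sec: 30,
  timeout_sec: 4,
  failure_threshold: 10,
  success_threshold: 1,
  app_start_timeout_sec: 300,
};

console.log(probesWithinStartWindow(readinessCheck)); // 10
```

Under this model the app gets roughly ten chances to answer 200 before the deployment is rolled back, so a slow start combined with a 30-second interval leaves little margin.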

The project is a Node.js application built on Express. Here's how I handle the /readiness_check endpoint:

app
  .get(['/readiness_check'], (req, res) => res.sendStatus(200))

Without readiness_check configured, my deployment process succeeds and I can access my application without any problems. However, when I include the readiness_check, the process fails with the following error:

OperationError: Error Response: 4 Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

I checked the logs and saw that /readiness_check returned 502 at first (while the application was still starting) and then started returning 200. Accessing the endpoint manually with curl showed the same results. Still, for some reason GCP did not consider my deployment healthy.
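
To show why this result is surprising, here is a minimal sketch of how consecutive success and failure thresholds are commonly evaluated. It is only a model under stated assumptions: that only a 200 response counts as success and that counters reset when the outcome flips. The function evaluateProbes is hypothetical and is not App Engine's actual implementation.

```javascript
// Simplified threshold model (an assumption, not the platform's real logic):
// a probe succeeds only on HTTP 200, and each counter resets when the outcome flips.
function evaluateProbes(statusCodes, { success_threshold, failure_threshold }) {
  let state = 'unknown';
  let successes = 0;
  let failures = 0;
  const history = [];
  for (const code of statusCodes) {
    if (code === 200) {
      successes += 1;
      failures = 0;
      if (successes >= success_threshold) state = 'ready';
    } else {
      failures += 1;
      successes = 0;
      if (failures >= failure_threshold) state = 'unready';
    }
    // Record the state after each probe so the transition point is visible.
    history.push(state);
  }
  return history;
}

// Status codes in the order described above: 502 while starting, then 200.
const observed = [502, 502, 502, 200, 200];
console.log(evaluateProbes(observed, { success_threshold: 1, failure_threshold: 10 }));
```

With success_threshold: 1, this model marks the instance ready on the first 200, which is why a rollback after repeated 200 responses points to something other than the endpoint itself.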

Running gcloud app describe confirms that I have the splitHealthChecks feature enabled.

I went through the troubleshooting sections in the documentation and discovered that the servicecontrol.googleapis.com and endpoints.googleapis.com services were not enabled, so I enabled them, but that didn't help either.

I also saw the following note in the documentation:

If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them.

It might be an unrelated question, but I couldn't find nginx.health_check in my application logs. I searched for the text "nginx" but saw nothing related to health checking. Searching for "readiness_check", however, did show the responses I mentioned above.

Answer 1

Score: 3

There are several possible ways to fix this:

  1. You can increase the values in the resources section of your app.yaml file. See the documentation on resource settings for details.

  2. You can increase app_start_timeout_sec up to its maximum of 1800 seconds, which gives your app more time to become healthy.

  3. Even though gcloud app describe confirms that the splitHealthChecks feature is enabled, did you follow all the steps for migrating from legacy health checks? Is the migration applied to all versions of your app, including the old ones?
    Check carefully all the steps required to convert the health checks, as listed in the migration documentation. Running gcloud app update --split-health-checks --project [YOUR_PROJECT_ID] may not be enough.

EDIT:
In theory, if you do not split traffic across different versions, this should not be a problem (I cannot think of a reason why it would be). Still, step 2 of the migration documentation says:

> Convert legacy health check options for each version in your application.

To do that, update the app.yaml for each version accordingly and then deploy the service under the corresponding version ID, e.g.: gcloud app deploy --project PROJECT_ID --version VERSION_ID --no-promote

  4. As a workaround, you could "fake" the readiness_check response so that it returns a 200 status after a certain time. You would have to add a custom handler for this. This way the deployment won't time out and will keep working in the background. However, this defeats the purpose of readiness checks, since your instance might receive traffic before it is ready. If you keep this in mind and can handle it in your application, it is an option to consider.
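
One possible reading of the "fake" readiness workaround, sketched below: a handler that answers 503 until a fixed warm-up period has passed and 200 afterwards, regardless of whether the app is really ready. The name createDelayedReadinessHandler, the warmupMs parameter, and the injectable clock are all made up for this sketch, and the 60-second value in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper: builds an Express-compatible (req, res) handler that
// returns 503 until warmupMs has elapsed since creation, then 200.
// `now` is injectable so the timing can be tested without waiting.
function createDelayedReadinessHandler(warmupMs, now = Date.now) {
  const startedAt = now();
  return function readinessHandler(req, res) {
    const ready = now() - startedAt >= warmupMs;
    // statusCode and end() exist on Node's http.ServerResponse, which
    // Express response objects extend.
    res.statusCode = ready ? 200 : 503;
    res.end(ready ? 'OK' : 'Service Unavailable');
  };
}

// Usage with the Express app from the question (assumes `app` already exists):
// app.get('/readiness_check', createDelayedReadinessHandler(60 * 1000));
```

As noted in the answer, this only hides the symptom: once the warm-up time passes, the instance reports ready even if initialization has not finished.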

Finally, I assume you are using App Engine Flex, since these health checks are not available in the Standard version and configuring them there causes errors. There is a separate discussion about this.

huangapple
  • Posted on January 3, 2020, 23:04:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59580867.html