January 3, 2020, 23:04:47

Google App Engine application deployment fails despite readiness_check returning a 200 status response

Question

I'm trying to set up a readiness_check for my application. Here's the relevant section of my app.yaml:

readiness_check:
  path: '/readiness_check'
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 10
  success_threshold: 1
  app_start_timeout_sec: 300

The project I'm developing is a Node.js application built on Express. Here's how I handle the /readiness_check endpoint:

app
  .get(['/readiness_check'], (req, res) => res.sendStatus(200))

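Without readiness_check configured, my deployment succeeds and I can access my application without any problems. However, when I include the readiness_check, the deployment fails with the following error:
> OperationError: Error Response: 4 Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.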

I checked the logs and saw that /readiness_check returned 502 at first (while the application was still starting) and then began returning 200. Accessing the endpoint manually with curl showed the same results. Still, for some reason, GCP did not consider my deployment healthy.

Running gcloud app describe confirms that I have the splitHealthChecks feature enabled.

I went through the troubleshooting sections in the documentation and discovered that the servicecontrol.googleapis.com and endpoints.googleapis.com services were not enabled, so I enabled them, but that didn't help either.

I also saw the following note in the documentation:

> If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them.

This might be an unrelated question, but I couldn't find nginx.health_check in my application logs. I searched for the text "nginx" and saw nothing related to health checking. However, searching for "readiness_check" did show the responses I mentioned above.

Answer 1

Score: 3

There are several possible ways to fix this:

You can increase the values in the resources section of your app.yaml file; the app.yaml reference documentation describes the available settings.
You can increase app_start_timeout_sec to its maximum value of 1800 seconds, which gives your app more time to become healthy.
Even though gcloud app describe confirms that the splitHealthChecks feature is enabled, did you follow all the steps for migrating from legacy health checks? Is the migration applied to every version of your app, including the old ones? Check each step of the health check migration documentation carefully; running gcloud app update --split-health-checks --project [YOUR_PROJECT_ID] alone may not be enough.
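
As a rough sketch of the first two suggestions, the app.yaml might be adjusted along these lines. The resources values shown (cpu, memory_gb, disk_size_gb) are illustrative assumptions only, not figures from the question or this answer; the 1800-second ceiling for app_start_timeout_sec is the one mentioned above.

```yaml
readiness_check:
  path: '/readiness_check'
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 10
  success_threshold: 1
  app_start_timeout_sec: 1800  # raised from 300 to the maximum allowed

# Assumed example values; size these to what the application actually needs.
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 10
```

If the deployment becomes healthy with the longer timeout, the startup time is the likely culprit and the timeout can then be tuned back down.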

EDIT:
In theory, this should not be a problem if you are not splitting traffic across different versions (I cannot think of a reason why it would be). However, step 2 of the migration documentation says:

> Convert legacy health check options for each version in your application.

To do that, update the app.yaml for each version accordingly and then deploy the service with the corresponding version ID, e.g.: gcloud app deploy --project PROJECT_ID --version VERSION_ID --no-promote

As a workaround, you could "fake" the readiness_check response so that it returns a 200 status after a certain time. You would have to add a custom handler for the readiness_check path. This way the deployment won't time out, and the app will keep starting up in the background. However, this defeats the purpose of readiness checks, since your instance might receive traffic before it is ready. If you keep this in mind and can handle it in your application, it is an option to consider.
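
A minimal sketch of such a time-based handler is shown below. The helper name createTimedReadinessHandler, the warmupMs option, and the choice of 503 during the warm-up window are all assumptions made for illustration; none of them come from Express or App Engine. Only res.sendStatus is an actual Express method, as already used in the question.

```javascript
// Hypothetical helper (not part of Express or App Engine): builds a handler
// that answers 503 during a fixed warm-up window and 200 afterwards.
// The `now` option exists only so the clock can be replaced in tests.
function createTimedReadinessHandler({ warmupMs, now = Date.now }) {
  const startedAt = now();
  return function readinessHandler(req, res) {
    const elapsedMs = now() - startedAt;
    // This reports "ready" purely based on elapsed time, not on whether
    // the application has actually finished starting.
    res.sendStatus(elapsedMs >= warmupMs ? 200 : 503);
  };
}

// Possible wiring into the Express app from the question
// (the 60-second window is an assumed value):
// app.get('/readiness_check', createTimedReadinessHandler({ warmupMs: 60000 }));
```

Because the handler ignores the real state of the application, it carries exactly the risk described above: the instance may be marked ready and receive traffic while it is still starting.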

Finally, I assume you are using App Engine Flex, since health checks are not available in the Standard environment and configuring them there produces errors.
