Docker Compose healthcheck: service never becomes unhealthy

Question

I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.

Database (postgres) checks for its own health using `pg_isready` and backend (FastAPI) checks for its health via an endpoint `http://localhost:8080/healthcheck`.

Compose file:

    version: '3'
    services:

      database:
        image: postgres:14-alpine
        healthcheck:
          test: pg_isready -U postgres
          interval: 1s
          timeout: 5s
          retries: 5
          start_period: 10s

      backend:
        depends_on:
          database:
            condition: service_healthy
        image: backend-api-image
        build:
          context: backend
          dockerfile: Dockerfile
        ports:
          - "8080:8080"
        volumes:
          - './backend:/backend'
        healthcheck:
          test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
          interval: 1s
          timeout: 5s

      frontend:
        image: my-frontend
        depends_on:
          backend:
            condition: service_healthy
        build:
          context: ./frontend
          dockerfile: Dockerfile

FastAPI app:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
```

So far this all works as expected. If, for example, I had a typo in my healthcheck endpoint route (in my app), startup would fail, like so:

    database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
    database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv6 address "::", port 5432
    database | 2023-06-01 23:01:44.411 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    database | 2023-06-01 23:01:44.414 UTC [22] LOG: database system was shut down at 2023-06-01 22:51:10 UTC
    database | 2023-06-01 23:01:44.417 UTC [1] LOG: database system is ready to accept connections
    backend | INFO: Will watch for changes in these directories: ['/backend']
    backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
    backend | INFO: Started reloader process [1] using StatReload
    backend | INFO: Started server process [8]
    backend | INFO: Waiting for application startup.
    backend | INFO: Application startup complete.
    backend | INFO: 127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
    dependency failed to start: container backend is unhealthy

Where I'm getting confused is that, after a successful startup, if I change the app in a way that makes backend unhealthy, the container detects the change and the check returns a 404 (as expected), but the container never becomes unhealthy.

    database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
    database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv6 address "::", port 5432
    database | 2023-06-01 23:06:37.397 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    database | 2023-06-01 23:06:37.400 UTC [22] LOG: database system was shut down at 2023-06-01 23:06:34 UTC
    database | 2023-06-01 23:06:37.403 UTC [1] LOG: database system is ready to accept connections
    backend | INFO: Will watch for changes in these directories: ['/backend']
    backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
    backend | INFO: Started reloader process [1] using StatReload
    backend | INFO: Started server process [9]
    backend | INFO: Waiting for application startup.
    backend | INFO: Application startup complete.
    backend | INFO: 127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
    frontend |
    frontend | > frontend@0.0.0 dev
    frontend | > vite --host
    frontend |
    frontend | Forced re-optimization of dependencies
    frontend |
    frontend | VITE v4.3.1 ready in 285 ms
    frontend |
    frontend | Local: http://localhost:5173/
    frontend | Network: http://172.26.0.4:5173/
    backend | INFO: 127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | INFO: 127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
    backend | WARNING: StatReload detected changes in 'src/main.py'. Reloading...
    backend | INFO: Shutting down
    backend | INFO: Waiting for application shutdown.
    backend | INFO: Application shutdown complete.
    backend | INFO: Finished server process [9]
    backend | INFO: Started server process [76]
    backend | INFO: Waiting for application startup.
    backend | INFO: Application startup complete.
    backend | INFO: 127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
    backend | INFO: 127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found

What I expected:

After a successful startup, when I change the backend code so that its healthcheck fails, I expected frontend to exit or become degraded somehow, since its health dependency has failed.

What happened:

Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.

My questions:

  • Is the healthcheck only valid during startup to wait for a container to be "ready"? Documentation seems to suggest so.

  • If so, then why keep checking for health after successful startup?

  • If not, why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?

  • Is there a way to degrade a container to unhealthy while running after a successful startup?

  • I'm aware that I can use `kill 1` instead of `exit 1`, which would cause the backend container to stop, but that doesn't seem very clean.

Answer 1

Score: 0

In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of wget will make HEAD requests when using the --spider option, so that your healthcheck results in:

    "HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed

This is using wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):

    version: '3'
    services:
      backend:
        image: backend-api-image
        build:
          context: backend
          dockerfile: Dockerfile
        ports:
          - "8080:8080"
        volumes:
          - './backend:/backend'
        healthcheck:
          test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
          interval: 1s
          timeout: 5s

I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like:

    FROM python:3.11
    WORKDIR /app
    RUN python3 -m venv .venv
    ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
    COPY requirements.txt ./
    RUN . .venv/bin/activate && pip install -r requirements.txt
    COPY . ./
    CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]

When I run docker-compose up, I see:

    backend_1 | INFO: 127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
    backend_1 | INFO: 127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK

...and the container enters the "healthy" state:

    NAME                  IMAGE               COMMAND                  SERVICE   CREATED          STATUS                    PORTS
    webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend   24 seconds ago   Up 23 seconds (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
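
The `(healthy)` marker in the STATUS column can also be read directly from the container state. The following is a minimal sketch, assuming the `docker` CLI is on the PATH and the container defines a healthcheck; `run_docker` and `health_status` are names made up for this example, and the `run` parameter exists only so the function can be exercised without Docker.

```python
import subprocess
from typing import Callable, Sequence


def run_docker(args: Sequence[str]) -> str:
    """Run the docker CLI with the given arguments and return its stdout."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def health_status(
    container: str, run: Callable[[Sequence[str]], str] = run_docker
) -> str:
    """Return the health status string Docker reports for a container.

    For a container with a healthcheck this is one of
    'starting', 'healthy' or 'unhealthy'.
    """
    out = run(["inspect", "--format", "{{.State.Health.Status}}", container])
    return out.strip()
```

Calling `health_status("webserver_backend_1")` while the checks are passing should report the same state that `docker ps` shows in parentheses.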

If I docker exec into the container and modify the FastAPI application so that the healthcheck endpoint returns an error, the server reloads and the checks start failing:

    backend_1 | WARNING: StatReload detected changes in 'backend.py'. Reloading...
    backend_1 | INFO: Shutting down
    backend_1 | INFO: Waiting for application shutdown.
    backend_1 | INFO: Application shutdown complete.
    backend_1 | INFO: Finished server process [8]
    backend_1 | INFO: Started server process [1050]
    backend_1 | INFO: Waiting for application startup.
    backend_1 | INFO: Application startup complete.
    backend_1 | INFO: 127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
    backend_1 | INFO: 127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request

And the container enters the "unhealthy" state:

    NAME                  IMAGE               COMMAND                  SERVICE   CREATED         STATUS                     PORTS
    webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend   2 minutes ago   Up 2 minutes (unhealthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp

That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
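
To make the expected transitions concrete, here is a simplified model of the rules Docker documents for healthchecks: a passing check marks the container healthy, `retries` consecutive failures mark it unhealthy, and failures inside the start period are ignored while the container has never passed a check. `HealthTracker` is a name invented for this sketch, and it is an illustration of those rules rather than Docker's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of a container's health status over successive checks."""

    retries: int = 3          # Docker's default when `retries` is not set
    status: str = "starting"  # initial status before any check has passed
    failing_streak: int = 0

    def record(self, exit_code: int, in_start_period: bool = False) -> str:
        """Apply one healthcheck result and return the new status."""
        if exit_code == 0:
            # Any passing check resets the streak and marks the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
        elif in_start_period and self.status == "starting":
            # Failures during the start period do not count until a check has passed.
            pass
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status
```

Under this model, a backend service configured without `retries` (so the default of 3 applies) would be expected to flip to `unhealthy` after three consecutive failing checks, and back to `healthy` on the next passing one.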

Here are some questions to help further diagnose things on your end:

  • What does the Dockerfile for your FastAPI service look like? In particular, what's the base image?

  • Have you verified that the wget command in that image returns an error code as expected for a non-200 response from the server?
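
One way to take the `wget` variant out of the picture is to run the check with Python, which the backend image already needs for FastAPI. The sketch below uses only the standard library; the function name `check` and the idea of saving it as a script inside the image are assumptions made for illustration, not something from the original setup.

```python
import urllib.error
import urllib.request


def check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if a GET request to url gets a 2xx response, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses;
        # refused connections and timeouts also end up here.
        return 1
```

A healthcheck script built on this would finish with `sys.exit(check("http://localhost:8080/healthcheck"))`, so that the exit code Docker sees is 0 only when the endpoint really answers with a success status.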

  • Posted by huangapple on June 2, 2023 at 07:23:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76386290.html