Posted 2023-06-02 07:23:11

Docker Compose healthcheck: service never becomes unhealthy

Question

I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.
Database (postgres) checks its own health using `pg_isready`, and backend (FastAPI) checks its health via the endpoint `http://localhost:8080/healthcheck`.
Compose file:

version: '3'
services:

  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s

  backend:
    depends_on:
      database:
        condition: service_healthy

    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s

  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile

FastAPI app:
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
```

So far this all works as expected. If, for example, I had a typo in the healthcheck endpoint route in my app, startup would fail, like so:

database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:01:44.411 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:01:44.414 UTC [22] LOG:  database system was shut down at 2023-06-01 22:51:10 UTC
database  | 2023-06-01 23:01:44.417 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [8]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy

Where I'm getting confused is that, after a successful startup, if I change the app in a way that makes backend unhealthy, the container detects the change and the check returns a 404 (as expected), but the container never becomes unhealthy.

database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:06:37.397 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:06:37.400 UTC [22] LOG:  database system was shut down at 2023-06-01 23:06:34 UTC
database  | 2023-06-01 23:06:37.403 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [9]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend  | 
frontend  | > frontend@0.0.0 dev
frontend  | > vite --host
frontend  | 
frontend  | Forced re-optimization of dependencies
frontend  | 
frontend  |   VITE v4.3.1  ready in 285 ms
frontend  | 
frontend  |   ➜  Local:   http://localhost:5173/
frontend  |   ➜  Network: http://172.26.0.4:5173/
backend   | INFO:     127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | WARNING:  StatReload detected changes in 'src/main.py'. Reloading...
backend   | INFO:     Shutting down
backend   | INFO:     Waiting for application shutdown.
backend   | INFO:     Application shutdown complete.
backend   | INFO:     Finished server process [9]
backend   | INFO:     Started server process [76]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found

What I expected:

While running after a successful startup, upon changing the backend code in such a way that its healthcheck would fail, I expected frontend to exit or become degraded somehow, as its health dependency has failed.

What happened:

Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.

My questions:

Is the healthcheck only valid during startup to wait for a container to be "ready"? Documentation seems to suggest so.
If so, then why keep checking for health after successful startup?
If not, why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?
Is there a way to degrade a container to unhealthy while running after a successful startup?
I'm aware that I can use kill 1 instead of exit 1, which would cause the backend container to stop, but that doesn't seem very clean.

Answer 1

Score: 0

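In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of wget makes HEAD requests when the --spider option is used, so your healthcheck results in: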

HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed

This is with wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):

version: '3'
services:
  backend:
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
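
Since the image already ships Python, the probe could also be a small script that issues a real GET request, which sidesteps the HEAD behavior of --spider entirely. The following is only a minimal sketch using the standard library; the function name `is_healthy` and the idea of passing the URL as a command-line argument are assumptions for illustration, not part of the original setup.

```python
import sys
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` gets a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses; refused connections and timeouts also end up here.
        return False


# Only acts as a probe when a URL is passed explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit status 0 means healthy, 1 means unhealthy, as Docker expects.
    sys.exit(0 if is_healthy(sys.argv[1]) else 1)
```

Saved under a hypothetical name such as healthcheck.py inside the image, it would be invoked from the healthcheck test as something like `python healthcheck.py http://localhost:8080/healthcheck`.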

I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like this:

FROM python:3.11
WORKDIR /app
RUN python3 -m venv .venv
ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
COPY requirements.txt ./
RUN . .venv/bin/activate && pip install -r requirements.txt
COPY . ./
CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]

When I run docker-compose up, I see:

backend_1  | INFO:     127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
backend_1  | INFO:     127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK

...and the container enters the "healthy" state:

NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                    PORTS
webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             24 seconds ago      Up 23 seconds (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp

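If I docker exec into the container and modify the FastAPI application so that the healthcheck endpoint returns an error, the server reloads and the probes start failing: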

backend_1  | WARNING:  StatReload detected changes in 'backend.py'. Reloading...
backend_1  | INFO:     Shutting down
backend_1  | INFO:     Waiting for application shutdown.
backend_1  | INFO:     Application shutdown complete.
backend_1  | INFO:     Finished server process [8]
backend_1  | INFO:     Started server process [1050]
backend_1  | INFO:     Waiting for application startup.
backend_1  | INFO:     Application startup complete.
backend_1  | INFO:     127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
backend_1  | INFO:     127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request

The container then enters the "unhealthy" state:

NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                     PORTS
webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             2 minutes ago       Up 2 minutes (unhealthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp

That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
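
To make the transitions concrete, here is a rough model of how the status follows the probe results. It is a simplified sketch, not Docker's implementation: it ignores `start_period` and timeouts, and the class name `HealthState` is made up for illustration. It relies only on the documented rules that any passing probe marks the container healthy and that `retries` consecutive failures (3 when unset, as in your backend service) mark it unhealthy.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Simplified model of a container's health status (no start period)."""

    retries: int = 3           # Docker's default when `retries` is not set
    status: str = "starting"   # initial status before any probe completes
    failing_streak: int = 0    # consecutive failed probes so far

    def record(self, probe_passed: bool) -> str:
        if probe_passed:
            # A single passing probe resets the streak and marks it healthy.
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            # Only `retries` consecutive failures flip the status.
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


state = HealthState()
history = [state.record(ok) for ok in [True, True, False, False, False, True]]
print(history)
# ['healthy', 'healthy', 'healthy', 'healthy', 'unhealthy', 'healthy']
```

With `interval: 1s` and the default retries, this model would predict a switch to unhealthy roughly three probes after the endpoint starts failing.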

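Here are some questions to help further diagnose things on your end:

What does the Dockerfile for your FastAPI service look like? In particular, what's the base image?
Have you verified that the wget command in that image returns an error code as expected for a non-200 response from the server?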
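
One way to check the second point is to run the probe command by hand inside the container and look at its exit status. The sketch below does that from Python; the helper name `exit_code` is an assumption for illustration, and the two commands at the bottom are stand-ins that only demonstrate a zero and a non-zero status.

```python
import subprocess
import sys


def exit_code(command: list[str]) -> int:
    """Run `command`, discard its output, and return its exit status.

    Raises FileNotFoundError if the executable does not exist.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# Stand-in commands so the sketch runs anywhere.
print(exit_code([sys.executable, "-c", "raise SystemExit(0)"]))  # 0
print(exit_code([sys.executable, "-c", "raise SystemExit(1)"]))  # 1
```

Inside the backend container you would pass the real probe instead, for example `exit_code(["wget", "--no-verbose", "-O", "/dev/null", "--tries=1", "http://localhost:8080/healthcheck"])`. If that returns 0 while the server is answering 404, the healthcheck can never fail.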

