Docker Compose healthcheck: service never becomes unhealthy

Question


I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.

Database (postgres) checks its own health using `pg_isready`, and backend (FastAPI) checks its health via an endpoint, `http://localhost:8080/healthcheck`.

Compose file:

version: '3'
services:

  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s

  backend:
    depends_on:
      database:
        condition: service_healthy

    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s

  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile

FastAPI app:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Endpoint polled by the Compose healthcheck
@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
```

So far this all works as expected. If, for example, I had a typo in my healthcheck endpoint route (in my app), startup would fail, like so:

database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:01:44.411 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:01:44.414 UTC [22] LOG:  database system was shut down at 2023-06-01 22:51:10 UTC
database  | 2023-06-01 23:01:44.417 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [8]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy

Where I'm getting confused is this: after a successful startup, if I change the app in a way that should make backend unhealthy, the container detects the change and the check returns a 404 (as expected), but the container never becomes unhealthy.

database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:06:37.397 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:06:37.400 UTC [22] LOG:  database system was shut down at 2023-06-01 23:06:34 UTC
database  | 2023-06-01 23:06:37.403 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [9]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend  | 
frontend  | > frontend@0.0.0 dev
frontend  | > vite --host
frontend  | 
frontend  | Forced re-optimization of dependencies
frontend  | 
frontend  |   VITE v4.3.1  ready in 285 ms
frontend  | 
frontend  |   ➜  Local:   http://localhost:5173/
frontend  |   ➜  Network: http://172.26.0.4:5173/
backend   | INFO:     127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | WARNING:  StatReload detected changes in 'src/main.py'. Reloading...
backend   | INFO:     Shutting down
backend   | INFO:     Waiting for application shutdown.
backend   | INFO:     Application shutdown complete.
backend   | INFO:     Finished server process [9]
backend   | INFO:     Started server process [76]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found

What I expected:

While running after a successful startup, upon changing the backend code in such a way that its healthcheck would fail, I expected frontend to exit or become degraded somehow, as its health dependency has failed.

What happened:

Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.

My questions:

  • Is the healthcheck only valid during startup to wait for a container to be "ready"? Documentation seems to suggest so.

  • If so, then why keep checking for health after successful startup?

  • If not, why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?

  • Is there a way to degrade a container to unhealthy while running after a successful startup?

  • I'm aware that I can use `kill 1` instead of `exit 1`, which would cause the backend container to stop, but that doesn't seem very clean.

Answer 1

Score: 0


In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of wget will make HEAD requests when using the --spider option, so that your healthcheck results in:

"HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed

This is using wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):

version: '3'
services:

  backend:
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile

    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'

    healthcheck:
      test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
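
Since the backend image already ships Python, the same GET-based check could also be written without wget. The following is only a sketch under that assumption, using the standard library; the `probe` function name is hypothetical and not part of the original setup.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if a GET to `url` answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx replies;
        # refused connections and timeouts end up here as well.
        return 1
```

A small script that calls `sys.exit(probe("http://localhost:8080/healthcheck"))` could then serve as the `test:` command, which avoids depending on how a particular wget build handles `--spider`.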

I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like:

FROM python:3.11

WORKDIR /app
RUN python3 -m venv .venv
ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
COPY requirements.txt ./
RUN . .venv/bin/activate && pip install -r requirements.txt
COPY . ./

CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]

When I run docker-compose up, I see:

backend_1  | INFO:     127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
backend_1  | INFO:     127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK

...and the container enters the "healthy" state:

NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                    PORTS
webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             24 seconds ago      Up 23 seconds (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
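
The status in the STATUS column comes from the health record Docker keeps for the container, which `docker inspect` exposes as JSON under `State.Health`. As a rough sketch (the `health_from_inspect` helper is a hypothetical name), that output could be parsed like this to see the status and the current failing streak:

```python
import json


def health_from_inspect(inspect_json: str) -> tuple[str, int]:
    """Extract (status, failing_streak) from `docker inspect <container>` output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    health = containers[0]["State"].get("Health")
    if health is None:
        # The container has no healthcheck configured.
        return ("none", 0)
    return (health["Status"], health.get("FailingStreak", 0))
```

Feeding it the output of `docker inspect webserver_backend_1` would show whether failed probes are being counted at all, which is more direct than inferring it from the application logs.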

If I docker exec into the container and modify the FastAPI application to return an error, the logs show the reload followed by failing checks:

backend_1  | WARNING:  StatReload detected changes in 'backend.py'. Reloading...
backend_1  | INFO:     Shutting down
backend_1  | INFO:     Waiting for application shutdown.
backend_1  | INFO:     Application shutdown complete.
backend_1  | INFO:     Finished server process [8]
backend_1  | INFO:     Started server process [1050]
backend_1  | INFO:     Waiting for application startup.
backend_1  | INFO:     Application startup complete.
backend_1  | INFO:     127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
backend_1  | INFO:     127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request

And the container enters the "unhealthy" state:

NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                     PORTS
webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             2 minutes ago       Up 2 minutes (unhealthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp

That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
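
This matches a simple model of how Docker tracks health, as far as its documentation describes it: probes keep running for the life of the container, a success resets the failure counter, and the status becomes unhealthy once the counter reaches `retries` (3 by default). The sketch below is a simplified illustration, not Docker's implementation; `HealthState` and `replay` are hypothetical names, and `start_period` and timeout handling are ignored.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Simplified model of a container's health status."""
    retries: int = 3           # Docker's documented default
    status: str = "starting"
    failing_streak: int = 0

    def record(self, exit_code: int) -> str:
        """Feed one probe result (its exit code) and return the new status."""
        if exit_code == 0:
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


def replay(exit_codes, retries=3):
    """Return the status after each probe in a sequence of exit codes."""
    state = HealthState(retries=retries)
    return [state.record(code) for code in exit_codes]
```

For a run like the one above, `replay([0, 0, 1, 1, 1])` stays healthy through the first two failures and ends in unhealthy on the third. As far as the Compose documentation describes it, `depends_on` with `condition: service_healthy` only gates startup, so a dependent service is not stopped when its dependency turns unhealthy later.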

Here are some questions to help further diagnose things on your end:

  • What does the Dockerfile for your FastAPI service look like? In particular, what's the base image?

  • Have you verified that the wget command in that image returns an error code as expected for a non-200 response from the server?
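
One way to check the second point is to run the healthcheck command by hand and look at its exit code, since a zero exit code is what Docker treats as a passing probe. A minimal sketch, with `healthcheck_exit_code` as a hypothetical helper name:

```python
import subprocess


def healthcheck_exit_code(command: list[str], timeout: float = 10.0) -> int:
    """Run a healthcheck command and return its exit code."""
    completed = subprocess.run(command, capture_output=True, timeout=timeout)
    return completed.returncode
```

Assuming the service is named backend as in the question, the command could be `["docker", "compose", "exec", "-T", "backend", "sh", "-c", "wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1"]`. If this returns 0 while the server is answering 404, the wget build in that image is not reporting the failure.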

huangapple
  • Published on June 2, 2023, 07:23:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76386290.html