Docker Compose healthcheck: service never becomes unhealthy
Question
<details>
<summary>English:</summary>
I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.
The database (Postgres) checks its own health using `pg_isready`, and the backend (FastAPI) checks its health via the endpoint `http://localhost:8080/healthcheck`.
Compose file:
version: '3'
services:
  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s
  backend:
    depends_on:
      database:
        condition: service_healthy
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile
FastAPI app
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Endpoint polled by the compose healthcheck
@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
```
So far this all works as expected. If, for example, I had a typo in the healthcheck endpoint route in my app, startup would fail, like so:
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:01:44.411 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:01:44.414 UTC [22] LOG: database system was shut down at 2023-06-01 22:51:10 UTC
database | 2023-06-01 23:01:44.417 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [8]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy
Where I'm getting confused is what happens after a successful startup. If I change the app in a way that should make backend unhealthy, the container detects the change and the check returns a 404 (as expected), but the container never becomes unhealthy.
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:06:37.397 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:06:37.400 UTC [22] LOG: database system was shut down at 2023-06-01 23:06:34 UTC
database | 2023-06-01 23:06:37.403 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [9]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend |
frontend | > frontend@0.0.0 dev
frontend | > vite --host
frontend |
frontend | Forced re-optimization of dependencies
frontend |
frontend | VITE v4.3.1 ready in 285 ms
frontend |
frontend | ➜ Local: http://localhost:5173/
frontend | ➜ Network: http://172.26.0.4:5173/
backend | INFO: 127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend | WARNING: StatReload detected changes in 'src/main.py'. Reloading...
backend | INFO: Shutting down
backend | INFO: Waiting for application shutdown.
backend | INFO: Application shutdown complete.
backend | INFO: Finished server process [9]
backend | INFO: Started server process [76]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found
What I expected:
While running after a successful startup, upon changing the backend code in such a way that its healthcheck would fail, I expected frontend to exit or become degraded somehow, as its health dependency has failed.
What happened:
Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.
My questions:
- Is the healthcheck only valid during startup, to wait for a container to be "ready"? The documentation seems to suggest so.
- If so, then why keep checking for health after a successful startup?
- If not, why is the `backend` container not being marked as unhealthy when changes cause its healthcheck to fail while running?
- Is there a way to degrade a container to unhealthy while it is running, after a successful startup?
- I'm aware that I can use `kill 1` instead of `exit 1`, and that would cause the `backend` container to stop, but it doesn't seem very clean.
Answer 1
Score: 0
In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of `wget` makes `HEAD` requests when using the `--spider` option, so your healthcheck results in:
HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed
This is using `wget` version 1.21 as installed in the `python:3.11` image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your `docker-compose.yaml`):
version: '3'
services:
  backend:
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
I have your example FastAPI code in `backend/backend.py`, and my `backend/Dockerfile` looks like:
FROM python:3.11
WORKDIR /app
RUN python3 -m venv .venv
ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
COPY requirements.txt ./
RUN . .venv/bin/activate && pip install -r requirements.txt
COPY . ./
CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]
When I run `docker-compose up`, I see:
backend_1 | INFO: 127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
backend_1 | INFO: 127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK
...and the container enters the "healthy" state:
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
webserver_backend_1 backend-api-image "uvicorn --reload --…" backend 24 seconds ago Up 23 seconds (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
If I `docker exec` into the container and modify the FastAPI application so that the healthcheck endpoint returns an error, the server reloads and the log shows:
backend_1 | WARNING: StatReload detected changes in 'backend.py'. Reloading...
backend_1 | INFO: Shutting down
backend_1 | INFO: Waiting for application shutdown.
backend_1 | INFO: Application shutdown complete.
backend_1 | INFO: Finished server process [8]
backend_1 | INFO: Started server process [1050]
backend_1 | INFO: Waiting for application startup.
backend_1 | INFO: Application startup complete.
backend_1 | INFO: 127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
backend_1 | INFO: 127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request
And the container enters the "unhealthy" state:
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
webserver_backend_1 backend-api-image "uvicorn --reload --…" backend 2 minutes ago Up 2 minutes (unhealthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
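These status changes are consistent with the rules Docker documents for health checks: a passing probe marks the container healthy, and `retries` consecutive failures (3 by default) mark it unhealthy. A toy model of those rules is sketched here. It ignores `start_period` and timeouts, and `HealthTracker` and `replay` are hypothetical names used for illustration, not Docker's implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of Docker's documented health states (not Docker's code)."""

    retries: int = 3            # documented default when `retries` is unset
    status: str = "starting"
    failing_streak: int = 0

    def record(self, exit_code: int) -> str:
        """Feed one probe exit code (0 = success) and return the new status."""
        if exit_code == 0:
            # Any passing probe resets the streak and marks the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            # Only `retries` consecutive failures flip the status.
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


def replay(exit_codes, retries=3):
    """Return the status after each probe result in `exit_codes`."""
    tracker = HealthTracker(retries=retries)
    return [tracker.record(code) for code in exit_codes]


print(replay([0, 0, 1, 1, 1, 0]))
# ['healthy', 'healthy', 'healthy', 'healthy', 'unhealthy', 'healthy']
```

Under this model, a container that stays healthy while the log shows a long run of 404 responses suggests that the probe command itself is exiting with 0 rather than that Docker is ignoring failures.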
Here are some questions to help further diagnose things on your end:
- What does the `Dockerfile` for your FastAPI service look like? In particular, what's the base image?
- Have you verified that the `wget` command in that image returns an error code as expected for a non-200 response from the server?
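One way to take `wget` differences between images out of the picture is to probe from Python, which the backend image presumably already contains. The sketch below uses only the standard library. The function name `probe` and the file name `healthcheck.py` are hypothetical, and the choice to treat every non-2xx response, connection error, or timeout as a failure is an assumption about what the check should mean.

```python
import urllib.error
import urllib.request

# Ignore proxy environment variables: the probe only ever talks to localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if a GET to `url` answers with a 2xx status, otherwise 1."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError):
        # urllib raises HTTPError (a URLError subclass) for 4xx/5xx responses;
        # refused connections and timeouts also end up here.
        return 1
```

Saved as, say, `healthcheck.py` with `import sys` at the top and a final line `sys.exit(probe("http://localhost:8080/healthcheck"))`, it could replace `wget` as the healthcheck `test` command, so the exit code no longer depends on which `wget` build the image ships.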