How to debug an EC2 Docker Compose crash
Question
I have a Docker Compose stack running on an EC2 t2.small instance. Every other week the whole thing crashes. I cannot connect over SSH, and the website is down, even though AWS reports the server as running.
I checked the Docker Compose file: the containers should restart automatically.
I checked the disk usage; there is no problem.
I checked several logs; all look fine.
Since I cannot connect to the server while this is happening, I have no idea how to debug it and find the problem.
This is my Docker Compose file:
version: '3'
services:
  nginx:
    image: ..../nginx-docker:2.3.0
    restart: always
    container_name: nginx
    links:
      - ghost
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./logs/nginx:/var/log/nginx/
    depends_on:
      - ghost
    networks:
      - app-network
  ghost:
    image: .../ghost-s3-adapter:5.33.6
    restart: always
    container_name: ghost
    volumes:
      - ${MOUNT_POINT}
      - ./logs/ghost:/logs/
    networks:
      - app-network
    environment:
      url: ${URL}
  pim:
    image: .../product-api:4.9.29
    restart: always
    container_name: pim
    ports:
      - "9090:9090"
    volumes:
      - ./logs/spring:/logs
      - ./uploads:/data/uploads
    networks:
      - app-network
networks:
  app-network:
    driver: bridge
Answer 1
Score: 2
- Check the monitoring dashboards. The cause might be CPU exhaustion or an out-of-memory (OOM) condition on the instance itself, since it is a t2.small. Note that basic EC2 monitoring does not include RAM by default. I would run the stack on a t2.medium or t2.large and check whether anything changes.
- Configure the serial connection available from the UI so that you can use it when SSH fails.
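
Because basic monitoring does not report RAM, one way to test the OOM theory is to record memory usage on the instance itself and read the log after the next crash and reboot. The following is a minimal sketch, not a definitive tool: it assumes a Linux host that exposes /proc/meminfo, and the function names and the log path are hypothetical choices made for illustration.

```python
#!/usr/bin/env python3
"""Append one memory snapshot per run to a log file that survives a reboot."""
import sys
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into {field name: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def format_snapshot(fields, now):
    """Build one log line with a UTC timestamp and the share of memory in use."""
    total = fields.get("MemTotal", 0)
    available = fields.get("MemAvailable", 0)
    used_pct = 100.0 * (total - available) / total if total else 0.0
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return f"{stamp} total_kb={total} available_kb={available} used_pct={used_pct:.1f}"


def append_snapshot(log_path):
    """Read /proc/meminfo once and append a snapshot line to log_path."""
    fields = parse_meminfo(Path("/proc/meminfo").read_text())
    with open(log_path, "a") as log:
        log.write(format_snapshot(fields, time.time()) + "\n")


# Only runs when a log path is passed on the command line (hypothetical usage):
#   python3 mem_snapshot.py /var/log/mem-snapshot.log
if __name__ == "__main__" and len(sys.argv) > 1:
    append_snapshot(sys.argv[1])
```

Run from a scheduler such as cron once a minute, this leaves a trail on disk. If used_pct climbs steadily towards 100 in the hours before the instance stops responding, memory exhaustion on the t2.small is a likely cause; if it stays flat, CPU is the next thing to check.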