How to debug an EC2 Docker Compose crash

Question

I have a Docker Compose stack running on an EC2 t2.small instance. Every other week the whole thing crashes. I cannot connect over SSH, even though AWS says the server is running, and the website is down.

I checked the Docker Compose file: the containers should restart automatically.
I checked the disk usage; there is no problem.
I checked several logs; all are fine.

Since I cannot connect to the server while this is happening, I have no idea how to debug it and find the problem.

This is my Docker Compose file:

version: '3'

services:
  nginx:
    image: ..../nginx-docker:2.3.0
    restart: always
    container_name: nginx
    links:
      - ghost
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./logs/nginx:/var/log/nginx/
    depends_on:
      - ghost
    networks:
      - app-network
  ghost:
    image: .../ghost-s3-adapter:5.33.6
    restart: always
    container_name: ghost
    volumes:
      - ${MOUNT_POINT}
      - ./logs/ghost:/logs/
    networks:
      - app-network
    environment:
      url: ${URL}
  pim:
    image: .../product-api:4.9.29
    restart: always
    container_name: pim
    ports:
      - "9090:9090"
    volumes:
      - ./logs/spring:/logs
      - ./uploads:/data/uploads
    networks:
      - app-network
networks:
  app-network:
    driver: bridge

Answer 1

Score: 2

  • Check the monitoring dashboards. Since this is a t2.small, the cause may be CPU exhaustion or an out-of-memory (OOM) condition on the instance itself. Note that basic EC2 monitoring does not include RAM metrics by default. I would run the stack on a t2.medium or t2.large and check whether anything changes.
  • Configure the serial connection available in the UI so that you can use it when SSH fails.
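
If the OOM suspicion above turns out to be right, one possible mitigation (not part of the original answer) is to cap each container's memory so that a runaway service is killed and restarted by Docker instead of starving the host and its SSH daemon. The fragment below is only a sketch: the file name docker-compose.override.yml is an assumption (Compose merges a file with that name into docker-compose.yml by default), and the memory values are illustrative guesses for a 2 GiB t2.small, not measured figures.

```yaml
# docker-compose.override.yml (hypothetical file, merged with docker-compose.yml by default)
# All memory values below are assumptions; tune them from real usage (e.g. `docker stats`).
version: '3'

services:
  nginx:
    deploy:
      resources:
        limits:
          memory: 128M   # assumed: a reverse proxy usually needs little RAM
  ghost:
    deploy:
      resources:
        limits:
          memory: 512M   # assumed budget for the Node.js-based Ghost container
  pim:
    deploy:
      resources:
        limits:
          memory: 768M   # assumed budget for the product-api container
```

With the `docker compose` plugin these limits should apply on a plain `docker compose up -d`. The legacy `docker-compose` v1 binary ignores the `deploy` section outside Swarm unless it is run with `--compatibility`. The limits are kept below the instance's total RAM on purpose, to leave headroom for the operating system.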

huangapple
  • Posted on June 29, 2023, 18:33:22