How to debug an EC2 Docker Compose crash
Question
I have a Docker Compose stack running on an EC2 t2.small instance. Every other week the whole thing crashes. I cannot connect over SSH, even though AWS reports the instance as running, and the website is down.
I checked the Docker Compose file: the containers should restart automatically.
I checked disk usage; there is no problem there.
I checked several logs; they all look fine.
Since I cannot connect to the server while this is happening, I have no idea how to debug it and find the problem.
This is my Docker Compose file:
version: '3'
services:
  nginx:
    image: ..../nginx-docker:2.3.0
    restart: always
    container_name: nginx
    links:
      - ghost
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./logs/nginx:/var/log/nginx/
    depends_on:
      - ghost
    networks:
      - app-network
  ghost:
    image: .../ghost-s3-adapter:5.33.6
    restart: always
    container_name: ghost
    volumes:
      - ${MOUNT_POINT}
      - ./logs/ghost:/logs/
    networks:
      - app-network
    environment:
      url: ${URL}
  pim:
    image: .../product-api:4.9.29
    restart: always
    container_name: pim
    ports:
      - "9090:9090"
    volumes:
      - ./logs/spring:/logs
      - ./uploads:/data/uploads
    networks:
      - app-network
networks:
  app-network:
    driver: bridge
Answer 1
Score: 2
- Check the monitoring dashboards. Since this is a t2.small, the cause might be CPU or an out-of-memory (OOM) condition on the instance itself; note that basic EC2 monitoring does not include RAM by default. I would run the stack on a t2.medium or t2.large and check whether anything changes.
- Configure the serial console connection from the AWS UI so that you can use it when SSH fails.
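
Because basic EC2 monitoring does not show RAM, one way to test the OOM theory is to look at memory on the instance itself. Below is a minimal sketch, not a definitive tool. It assumes a Linux host where `/proc/meminfo` is readable, and the function names (`parse_meminfo`, `available_percent`, `find_oom_kills`) are made up for illustration. The exact wording of the kernel's OOM message varies between kernel versions, so treat the pattern as an assumption to adjust.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        # Lines look like "MemTotal:        2000000 kB".
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def available_percent(meminfo):
    """Percentage of RAM the kernel estimates is still available."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def find_oom_kills(log_text):
    """Return log lines that look like kernel OOM-killer reports.

    Assumption: the kernel logs "Out of memory: Killed process ..." or
    "Out of memory: Kill process ..."; adjust for your kernel version.
    """
    pattern = re.compile(r"Out of memory: Kill(ed)? process \d+")
    return [line for line in log_text.splitlines() if pattern.search(line)]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory statistics of this host."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Running `available_percent(read_meminfo())` from a cron job and appending the result to a file on disk would leave a record to inspect after the next crash. After a reboot, `find_oom_kills` can be fed the text of the previous boot's kernel log, if the system keeps one.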
 