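Question

I have a Docker Compose stack running on an EC2 t2.small instance. Every other week the whole thing crashes: AWS reports the instance as running, but I cannot connect over SSH and the website is down.

I checked the Docker Compose file: the containers should restart automatically.
I checked disk usage; there is no problem.
I checked several logs; they all look fine.

Since I cannot connect to the server while this is happening, I have no idea how to debug it and find the problem.

This is my Docker Compose file: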

version: '3'
services:
  nginx:
    image: ..../nginx-docker:2.3.0
    restart: always
    container_name: nginx
    links:
      - ghost
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./logs/nginx:/var/log/nginx/
    depends_on:
      - ghost
    networks:
      - app-network
  ghost:
    image: .../ghost-s3-adapter:5.33.6
    restart: always
    container_name: ghost
    volumes:
      - ${MOUNT_POINT}
      - ./logs/ghost:/logs/
    networks:
      - app-network
    environment:
      url: ${URL}
  pim:
    image: .../product-api:4.9.29
    restart: always
    container_name: pim
    ports:
      - "9090:9090"
    volumes:
      - ./logs/spring:/logs
      - ./uploads:/data/uploads
    networks:
      - app-network
networks:
  app-network:
    driver: bridge

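Answer 1

Score: 2

Check the instance's monitoring dashboards: the cause may be CPU or an OOM (out-of-memory) condition on the box itself, since it is a t2.small. Note that basic EC2 monitoring does not include RAM by default. I would run the stack on a t2.medium or t2.large and check whether anything changes.
Configure the serial connection available from the UI so that you can use it when SSH fails.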
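
Because basic EC2 monitoring has no RAM metric, one way to test the OOM theory is to record memory on the instance itself, so that the data is still on disk after the next crash and reboot. The following is only a minimal sketch under stated assumptions: the script name `mem_watch.py`, the log path, and the function names are hypothetical, and it assumes a Linux host that exposes `/proc/meminfo` and a kernel log that uses the usual "Out of memory: Kill(ed) process" wording.

```python
import re
import sys
import time
from pathlib import Path

# Matches both common kernel wordings:
#   "Out of memory: Kill process 123 (java) score ..."
#   "Out of memory: Killed process 123 (java) total-vm:..."
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def parse_meminfo(text):
    """Return {field: kB} for the 'Name:  123 kB' lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip unitless counters such as HugePages_Total.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def find_oom_kills(log_text):
    """Return (pid, process_name) for each OOM-killer line in a kernel log."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(log_text)]


def log_memory_once(log_path, meminfo_path="/proc/meminfo"):
    """Append one timestamped MemAvailable/SwapFree sample to log_path."""
    info = parse_meminfo(Path(meminfo_path).read_text())
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    line = (
        f"{stamp} MemAvailable={info.get('MemAvailable')}kB "
        f"SwapFree={info.get('SwapFree')}kB\n"
    )
    with open(log_path, "a") as fh:
        fh.write(line)
    return line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage from cron, once a minute:
    #   python3 mem_watch.py /var/log/mem-watch.log
    log_memory_once(sys.argv[1])
```

If the last samples before an outage show `MemAvailable` falling toward zero, memory pressure is a likely cause. After the reboot, `find_oom_kills` can be run over a saved kernel log (for example the output of `journalctl -k -b -1`, if the journal is persistent) to see which process, if any, the OOM killer terminated.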

