July 10, 2023, 13:36:53

envoy: grpc-web takes a few requests until it no longer times out

Question

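Problem: Whenever I restart the envoy container, it takes a few requests before grpc-web calls stop timing out. This happens after the envoy container has fully started, and waiting longer (even hours) does not prevent it. After roughly 3 failed requests, everything works until the container is restarted again.

Possible causes: The behavior may be related to Envoy's startup process and to the initialization of the gRPC services. Right after the Envoy container starts, initialization delays or race conditions could cause the first requests to fail. Some candidates:

Envoy startup time: Even once the Envoy container is up and running, it may need additional time to load its configuration, establish connections, and become ready. Initial requests can fail during that window.
gRPC service initialization: If the gRPC service needs time to start and get ready to accept requests, requests sent immediately after Envoy starts may time out.
Connection reuse: Envoy may not establish its upstream connection pool right at startup, which can cause the first requests to fail. Once connections are pooled and reused, requests go through normally.

Solutions: To address the problem, consider the following options:

Delay starting the gRPC service: If possible, delay the startup of the gRPC service so that Envoy is fully started and ready to accept requests.
Allow more time for Envoy to start: If Envoy takes a long time to start, consider giving the container more startup time so that Envoy is fully running.
Connection pool warm-up: Consider configuring Envoy to warm up its connection pool at startup so that connections are available.
Use health checks: Configure health checks in Envoy so that the gRPC service is confirmed ready to receive traffic before traffic is routed to it.

One or more of these may help mitigate the problem. The best choice depends on your specific deployment and requirements.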
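
The health-check suggestion could look roughly like the fragment below, added to the cluster from the question. This is only a sketch: it assumes the gRPC server implements the standard gRPC health checking service (grpc.health.v1.Health), which the question does not state, and the interval and threshold values are illustrative.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    # ... existing cluster settings from the question stay unchanged ...
    health_checks:
      # All values below are assumptions for illustration, not tuned settings.
      - timeout: 1s
        interval: 5s
        unhealthy_threshold: 3
        healthy_threshold: 1
        # Assumes the upstream exposes grpc.health.v1.Health
        grpc_health_check: { }
```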

Context:
I run envoy with grpc-web. I have a bunch of gRPC servers to route to. Each server has a dedicated route and cluster (see config below). Envoy runs inside a docker-container with no special changes (only config and SSL). Envoy and the gRPC servers are connected via a docker network.

Problem:
Whenever I restart the envoy-container, it takes a few requests until the grpc-web calls go through and don't time out. This is after the envoy-container is 100% started. Leaving it running for longer does not prevent this (left it for hours). After these first ~3 failing requests everything works fine until the container is restarted.

Relevant configs:
I removed any obviously unnecessary config in the docker compose and otherwise condensed the config as much as possible (removed all the repeated parts for each server).

envoy:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      listener_filters:
        - name: "envoy.filters.listener.tls_inspector"
          typed_config: { }
      filter_chains:
        # Use HTTPS (TLS) encryption for ingress data
        # Disable this to allow tools like bloomRPC which don't work via https
        transport_socket:
          name: envoy.transport_socket.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
            common_tls_context:
              tls_certificates:
              - certificate_chain:
                  filename: "/etc/envoy/envoy.pem"
                private_key:
                  filename: "/etc/envoy/envoy.key"
        filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              codec_type: auto
              stat_prefix: ingress_http
              access_log:
                - name: envoy.access_loggers.file
                  # Logger for gRPC requests (can be identified by the presence of the "x-grpc-web"-header)
                  filter:
                    header_filter:
                      header:
                        name: "x-grpc-web"
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: /dev/stdout
                    format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [gRPC-status: %GRPC_STATUS%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
                - name: envoy.access_loggers.file
                  # Logger for HTTP(s) requests (everything that is not a gRPC request)
                  filter:
                    header_filter:
                      header:
                        name: "x-grpc-web"
                        invert_match: true
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: /dev/stdout
                    format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [http(s)-status: %RESPONSE_CODE%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
              stream_idle_timeout: 43200s # 12h
              route_config:
                name: local_route
                virtual_hosts:
                  - name: gRPC-Web-Proxy
                    domains: [ "*" ]
                    request_headers_to_add:
                      - header:
                          key: "source"
                          value: "envoy"
                        append: false
                      - header:
                          key: "downstream-address"
                          value: "%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%"
                        append: false
                    cors:
                      allow_origin_string_match:
                        - prefix: "*"
                      allow_methods: GET, PUT, DELETE, POST, OPTIONS
                      allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout,x-envoy-retry-grpc-on,x-envoy-max-retries,auth-token,x-real-ip,client-ip,x-forwarded-for,x-forwarded,x-cluster-client-ip,forwarded-for,forwarded
                      max_age: "1728000"
                      expose_headers: grpc-status,grpc-message
                    routes: # https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto
                      - name: grpcserver_gRPCRoute
                        match:
                          prefix: "/api/services.grpcserver"
                        route:
                          cluster: grpcserver_gRPCCluster
                          prefix_rewrite: "/services.grpcserver"
                          timeout: 0s                     # No timeout. Otherwise, streams will be aborted regularly
              http_filters:
                - name: envoy.filters.http.grpc_web
                - name: envoy.filters.http.cors
                - name: envoy.filters.http.router
  clusters:
    - name: grpcserver_gRPCCluster
      connect_timeout: 0.25s
      type: static
      http2_protocol_options: { }
      lb_policy: round_robin
      load_assignment:
        cluster_name: grpcserver_gRPCCluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 20001
      transport_socket:
        # Connect to microservice via TLS
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_certificates:
              - certificate_chain: { "filename": "/etc/envoy/envoy.pem" }
                private_key: { "filename": "/etc/envoy/envoy.key" }
            # Validate CA of microservice
            validation_context:
              match_subject_alt_names:
              trusted_ca:
                filename: /etc/ssl/certs/ca-certificates.crt

docker-compose.yml:

version: '2.4'
networks:
  core:
    name: Service_Core
    driver: bridge
    ipam:
      config:
      - subnet: 198.51.100.0/24
        gateway: 198.51.100.1
services:
  envoy:
    container_name: "envoy"
    image: "envoyproxy/envoy:v1.17.1"
    ports:
      - 8080:8080
    networks:
      - core
    restart: always
    security_opt:
      - apparmor:unconfined
    environment:
      - ENVOY_UID=17200
      - ENVOY_GID=17200
    volumes:
      - "/somepath/envoy.pem:/etc/envoy/envoy.pem:ro"
      - "/somepath/envoy.key:/etc/envoy/envoy.key:ro"
      - "/somepath/ca.pem:/etc/ssl/certs/ca-certificates.crt:ro"
      - "/somepath/envoy.yml:/etc/envoy/envoy.yaml:ro"
  grpcserver:
    image: "<grpcserver>"
    container_name: "grpcserver"
    restart: always
    networks:
      - core
    security_opt:
      - apparmor:unconfined
  frontend:
    image: "<frontend>" # an nginx with the files for the UI
    container_name: "frontend"
    restart: always
    networks:
      - core
    ports:
     - 80:80
     - 443:443
    volumes:
      - "/somepath/ssl/:/opt/ssl/"
    security_opt:
      - apparmor:unconfined

What could be causing this behavior?
I am only interested in a fix regarding the docker or envoy config. I already considered using a workaround, but I would rather fix it instead.

Answer 1

Score: 1

In your cluster configuration, you specify a connect_timeout of 0.25s (250 ms).
If a new connection to the upstream host cannot be established within this time, the call fails.

It seems that the first calls to the service cannot complete the connection within this short timeframe. Setting connect_timeout to a higher value (a few seconds) should do the trick.
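
A minimal sketch of that change, applied to the cluster from the question. The value 5s is an illustrative assumption rather than a tested recommendation; everything else is unchanged.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    # Raised from 0.25s; 5s is an assumed value, tune it for your environment.
    connect_timeout: 5s
    type: static
    http2_protocol_options: { }
    lb_policy: round_robin
    # load_assignment and transport_socket stay as in the question
```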
