envoy: grpc-web takes a few requests until it no longer times out

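Question

Problem: Whenever I restart the envoy container, it takes a few requests until the grpc-web calls stop timing out. This happens after the envoy container has fully started, and waiting longer (even hours) does not prevent it. After roughly 3 failed requests everything works until the container is restarted again.

Possible causes: The behavior may be related to how Envoy starts up and how the gRPC service is initialized. Right after the Envoy container starts, initialization delays or race conditions could make the first requests fail. Some candidates:

  1. Envoy startup time: Even when the Envoy container is up and running, it may need extra time to load its configuration, establish connections, and become ready. Requests sent during that window may fail.

  2. gRPC service initialization: If the gRPC service needs time to start and become ready to accept requests, requests sent right after Envoy starts may time out.

  3. Connection reuse: Envoy may not establish its upstream connection pool right at startup, so the first requests have to open new connections and may fail. Once the pool exists and connections are reused, requests go through.

Solutions: The following options may help:

  1. Delay the gRPC traffic: If possible, delay the start of the gRPC service for a while so that Envoy is fully started and ready to accept requests.

  2. Allow more time for Envoy startup: If Envoy takes long to start, give the container more startup time so that Envoy is fully running before traffic arrives.

  3. Warm the connection pool: Configure Envoy to warm its connection pool at startup so that connections are available for the first requests.

  4. Use health checks: Configure health checks in Envoy so that traffic is only routed to the gRPC service once it is ready to receive it.

One or more of these options may mitigate the problem. Which one fits best depends on the specific deployment and its requirements.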
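The health-check option (item 4 in the list above) could look roughly like the fragment below. This is a minimal sketch, not a tested fix: it assumes the gRPC server implements the standard grpc.health.v1.Health service, which the question does not state, and the timeout, interval, and threshold values are illustrative placeholders.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    # ... existing cluster fields from the question stay unchanged ...
    health_checks:
      - timeout: 1s              # assumed value: how long one probe may take
        interval: 5s             # assumed value: time between probes
        unhealthy_threshold: 3   # assumed value: failed probes before the host is marked unhealthy
        healthy_threshold: 1     # assumed value: successful probes before the host is marked healthy
        # Assumes the server exposes grpc.health.v1.Health
        grpc_health_check: { }
```

If the server does not expose the gRPC health service, tcp_health_check: { } in place of grpc_health_check would only verify that a connection can be opened.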

Context:
I run envoy with grpc-web. I have a bunch of gRPC servers to route to. Each server has a dedicated route and cluster (see config below). Envoy runs inside a docker-container with no special changes (only config and SSL). Envoy and the gRPC servers are connected via a docker network.

Problem:
Whenever I restart the envoy-container, it takes a few requests until the grpc-web calls go through and don't time out. This is after the envoy-container is 100% started. Leaving it running for longer does not prevent this (left it for hours). After these first ~3 failing requests everything works fine until the container is restarted.

Relevant configs:
I removed any obviously unnecessary config in the docker compose and otherwise condensed the config as much as possible (removed all the repeated parts for each server).

envoy:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      listener_filters:
        - name: "envoy.filters.listener.tls_inspector"
          typed_config: { }
      filter_chains:
        # Use HTTPS (TLS) encryption for ingress data
        # Disable this to allow tools like bloomRPC which don't work via https
      - transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
            common_tls_context:
              tls_certificates:
              - certificate_chain:
                  filename: "/etc/envoy/envoy.pem"
                private_key:
                  filename: "/etc/envoy/envoy.key"
        filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              codec_type: auto
              stat_prefix: ingress_http
              access_log:
                - name: envoy.access_loggers.file
                  # Logger for gRPC requests (can be identified by the presence of the "x-grpc-web"-header)
                  filter:
                    header_filter:
                      header:
                        name: "x-grpc-web"
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: /dev/stdout
                    format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [gRPC-status: %GRPC_STATUS%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
                - name: envoy.access_loggers.file
                  # Logger for HTTP(s) requests (everything that is not a gRPC request)
                  filter:
                    header_filter:
                      header:
                        name: "x-grpc-web"
                        invert_match: true
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: /dev/stdout
                    format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [http(s)-status: %RESPONSE_CODE%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
              stream_idle_timeout: 43200s # 12h
              route_config:
                name: local_route
                virtual_hosts:
                  - name: gRPC-Web-Proxy
                    domains: [ "*" ]
                    request_headers_to_add:
                      - header:
                          key: "source"
                          value: "envoy"
                        append: false
                      - header:
                          key: "downstream-address"
                          value: "%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%"
                        append: false
                    cors:
                      allow_origin_string_match:
                        - prefix: "*"
                      allow_methods: GET, PUT, DELETE, POST, OPTIONS
                      allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout,x-envoy-retry-grpc-on,x-envoy-max-retries,auth-token,x-real-ip,client-ip,x-forwarded-for,x-forwarded,x-cluster-client-ip,forwarded-for,forwarded
                      max_age: "1728000"
                      expose_headers: grpc-status,grpc-message
                    routes: # https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto
                      - name: grpcserver_gRPCRoute
                        match:
                          prefix: "/api/services.grpcserver"
                        route:
                          cluster: grpcserver_gRPCCluster
                          prefix_rewrite: "/services.grpcserver"
                          timeout: 0s                     # No timeout. Otherwise, streams will be aborted regularly
              http_filters:
                - name: envoy.filters.http.grpc_web
                - name: envoy.filters.http.cors
                - name: envoy.filters.http.router
  clusters:
    - name: grpcserver_gRPCCluster
      connect_timeout: 0.25s
      type: static
      http2_protocol_options: { }
      lb_policy: round_robin
      load_assignment:
        cluster_name: grpcserver_gRPCCluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 20001
      transport_socket:
        # Connect to microservice via TLS
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_certificates:
              - certificate_chain: { "filename": "/etc/envoy/envoy.pem" }
                private_key: { "filename": "/etc/envoy/envoy.key" }
            # Validate CA of microservice
            validation_context:
              match_subject_alt_names:
              trusted_ca:
                filename: /etc/ssl/certs/ca-certificates.crt

docker-compose.yml:

version: '2.4'
networks:
  core:
    name: Service_Core
    driver: bridge
    ipam:
      config:
      - subnet: 198.51.100.0/24
        gateway: 198.51.100.1
services:
  envoy:
    container_name: "envoy"
    image: "envoyproxy/envoy:v1.17.1"
    ports:
      - 8080:8080
    networks:
      - core
    restart: always
    security_opt:
      - apparmor:unconfined
    environment:
      - ENVOY_UID=17200
      - ENVOY_GID=17200
    volumes:
      - "/somepath/envoy.pem:/etc/envoy/envoy.pem:ro"
      - "/somepath/envoy.key:/etc/envoy/envoy.key:ro"
      - "/somepath/ca.pem:/etc/ssl/certs/ca-certificates.crt:ro"
      - "/somepath/envoy.yml:/etc/envoy/envoy.yaml:ro"

  grpcserver:
    image: "<grpcserver>"
    container_name: "grpcserver"
    restart: always
    networks:
      - core
    security_opt:
      - apparmor:unconfined

  frontend:
    image: "<frontend>" # an nginx with the files for the UI
    container_name: "frontend"
    restart: always
    networks:
      - core
    ports:
     - 80:80
     - 443:443
    volumes:
      - "/somepath/ssl/:/opt/ssl/"
    security_opt:
      - apparmor:unconfined

What could be causing this behavior?
I am only interested in a fix regarding the docker or envoy config. I already considered using a workaround, but I would rather fix it instead.

Answer 1

Score: 1

In your cluster configuration, you specify a connect_timeout of 0.25s (250 ms).
If a connection to the service cannot be established within this time, the call fails.

It seems that the first calls after a restart cannot establish their connections within such a short window. Setting connect_timeout to a higher value (a few seconds) should do the trick.
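Applied to the cluster from the question, the suggested change might look like the fragment below. It is only a sketch: the 5s value is an assumed example of "a few seconds" and does not come from the answer, and all other fields are copied from the question's config.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    # Raised from 0.25s; 5s is an illustrative value, tune it for your setup
    connect_timeout: 5s
    type: static
    http2_protocol_options: { }
    lb_policy: round_robin
    # load_assignment and transport_socket stay as in the question
```

The route-level timeout: 0s in the question is a separate setting that bounds the whole request, so it does not lift the limit on establishing the upstream connection.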

huangapple
  • Posted on 2023-07-10 13:36:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76650884.html