envoy: grpc-web takes a few requests until it no longer times out


Question

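Problem: When I restart the Envoy container, it takes a few requests until the grpc-web calls no longer time out. This happens after the Envoy container has fully started, and waiting longer does not prevent it (I waited for hours). After these roughly 3 failing requests, everything works normally until the container is restarted.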

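Possible causes: The problem may be related to how Envoy starts up and how the gRPC service is initialized. Right after the Envoy container starts, an initialization delay or a race condition could cause the first requests to fail. Some possible reasons for this behavior:

  1. Envoy startup time: Even after the Envoy container is up and running, Envoy may need some extra time to load its configuration, establish connections and become ready. Requests sent during this phase may fail.

  2. gRPC service initialization: If the gRPC service needs time to start and become ready to accept requests, requests sent right after Envoy starts may time out.

  3. Connection reuse: Envoy opens upstream connections lazily, when it routes the first request to a cluster, instead of building a connection pool at startup. The first requests therefore also pay for connection setup (TCP and TLS handshakes), which can make them fail. Once the pool is established and its connections are reused, requests go through normally.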
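To check whether cause 3 applies, it helps to know that Envoy keeps a separate upstream connection pool per worker thread and, unless `--concurrency` is set, starts one worker per hardware thread. Each worker therefore sets up its own upstream connection the first time it handles a request. The docker-compose fragment below is a diagnostic sketch under that assumption, not a fix: it overrides the image's default command so that Envoy runs with a single worker. If noticeably fewer requests fail after a restart, the failures are probably tied to new upstream connections being set up on each worker.

```yaml
services:
  envoy:
    # Diagnostic only (assumption: per-worker connection setup is the trigger).
    # Replaces the image's default command; the config path matches the
    # /etc/envoy/envoy.yaml volume mount from the question.
    command: ["envoy", "-c", "/etc/envoy/envoy.yaml", "--concurrency", "1"]
```

Merge this into the existing `envoy` service; the rest of that service definition stays unchanged. Since a single worker limits throughput, it is meant to be removed again after the test.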

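Solutions: To address the problem, you could consider the following options:

  1. Delay the start of the gRPC service: If possible, delay starting the gRPC service for a while so that Envoy is fully up and ready to receive requests.

  2. Allow Envoy more startup time: If Envoy takes a long time to start, consider giving the container more startup time so that Envoy is fully running.

  3. Warm up the connection pool: Consider configuring Envoy to warm up its connection pool at startup so that connections are already available.

  4. Use health checks: Configure health checks in Envoy so that traffic is routed to the gRPC service only once it is ready to receive it.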
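Option 4 could look roughly like the cluster fragment below. It is a sketch that assumes the upstream gRPC server implements the standard `grpc.health.v1.Health` service; the interval, timeout and threshold values are arbitrary examples. Only the `health_checks` key is new, and the other cluster settings from the question are abbreviated.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    connect_timeout: 0.25s
    type: static
    http2_protocol_options: { }   # gRPC health checking expects an HTTP/2 upstream
    lb_policy: round_robin
    # load_assignment and transport_socket unchanged from the question
    health_checks:
      - timeout: 1s               # max wait for a single health check response
        interval: 5s              # time between checks of each endpoint
        unhealthy_threshold: 2    # consecutive failures before the host is marked unhealthy
        healthy_threshold: 1      # consecutive successes before it is marked healthy again
        grpc_health_check: {}     # calls grpc.health.v1.Health/Check with an empty service name
```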

One or more of these solutions may help mitigate the problem. The best one depends on your specific deployment and requirements.

Original question:

Context:
I run Envoy with grpc-web and route to a number of gRPC servers. Each server has a dedicated route and cluster (see the config below). Envoy runs inside a Docker container with no special changes (only the config and SSL). Envoy and the gRPC servers are connected via a Docker network.

Problem:
Whenever I restart the Envoy container, it takes a few requests until the grpc-web calls go through instead of timing out. This happens even after the Envoy container has fully started, and leaving it running longer (I left it for hours) does not prevent it. After these first ~3 failing requests, everything works fine until the container is restarted.

Relevant configs:
I removed any obviously unnecessary config from the docker-compose file and otherwise condensed the config as much as possible (removing the parts that are repeated for each server).

envoy:

    admin:
      access_log_path: /tmp/admin_access.log
      address:
        socket_address: { address: 0.0.0.0, port_value: 9901 }
    static_resources:
      listeners:
        - name: listener_0
          address:
            socket_address: { address: 0.0.0.0, port_value: 8080 }
          listener_filters:
            - name: "envoy.filters.listener.tls_inspector"
              typed_config: { }
          filter_chains:
            # Use HTTPS (TLS) encryption for ingress data
            # Disable this to allow tools like bloomRPC which don't work via https
            - transport_socket:
                name: envoy.transport_socket.tls
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
                  common_tls_context:
                    tls_certificates:
                      - certificate_chain:
                          filename: "/etc/envoy/envoy.pem"
                        private_key:
                          filename: "/etc/envoy/envoy.key"
              filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    codec_type: auto
                    stat_prefix: ingress_http
                    access_log:
                      - name: envoy.access_loggers.file
                        # Logger for gRPC requests (can be identified by the presence of the "x-grpc-web"-header)
                        filter:
                          header_filter:
                            header:
                              name: "x-grpc-web"
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                          path: /dev/stdout
                          format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [gRPC-status: %GRPC_STATUS%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
                      - name: envoy.access_loggers.file
                        # Logger for HTTP(s) requests (everything that is not a gRPC request)
                        filter:
                          header_filter:
                            header:
                              name: "x-grpc-web"
                              invert_match: true
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                          path: /dev/stdout
                          format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [http(s)-status: %RESPONSE_CODE%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
                    stream_idle_timeout: 43200s # 12h
                    route_config:
                      name: local_route
                      virtual_hosts:
                        - name: gRPC-Web-Proxy
                          domains: [ "*" ]
                          request_headers_to_add:
                            - header:
                                key: "source"
                                value: "envoy"
                              append: false
                            - header:
                                key: "downstream-address"
                                value: "%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%"
                              append: false
                          cors:
                            allow_origin_string_match:
                              - prefix: "*"
                            allow_methods: GET, PUT, DELETE, POST, OPTIONS
                            allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout,x-envoy-retry-grpc-on,x-envoy-max-retries,auth-token,x-real-ip,client-ip,x-forwarded-for,x-forwarded,x-cluster-client-ip,forwarded-for,forwarded
                            max_age: "1728000"
                            expose_headers: grpc-status,grpc-message
                          routes: # https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto
                            - name: grpcserver_gRPCRoute
                              match:
                                prefix: "/api/services.grpcserver"
                              route:
                                cluster: grpcserver_gRPCCluster
                                prefix_rewrite: "/services.grpcserver"
                                timeout: 0s # No timeout. Otherwise, streams will be aborted regularly
                    http_filters:
                      - name: envoy.filters.http.grpc_web
                      - name: envoy.filters.http.cors
                      - name: envoy.filters.http.router
      clusters:
        - name: grpcserver_gRPCCluster
          connect_timeout: 0.25s
          type: static
          http2_protocol_options: { }
          lb_policy: round_robin
          load_assignment:
            cluster_name: grpcserver_gRPCCluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: 127.0.0.1
                          port_value: 20001
          transport_socket:
            # Connect to microservice via TLS
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
              common_tls_context:
                tls_certificates:
                  - certificate_chain: { "filename": "/etc/envoy/envoy.pem" }
                    private_key: { "filename": "/etc/envoy/envoy.key" }
                # Validate CA of microservice
                validation_context:
                  match_subject_alt_names:
                  trusted_ca:
                    filename: /etc/ssl/certs/ca-certificates.crt
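The gRPC access logger above records the gRPC status but not why a request failed. Below is a sketch of the same logger with two additional Envoy command operators, assuming the rest of the listener stays as it is: `%RESPONSE_FLAGS%` would be expected to show `UF` when Envoy could not connect to the upstream (the result of an expired `connect_timeout`), and `%UPSTREAM_TRANSPORT_FAILURE_REASON%` carries the TLS-level reason if the upstream handshake failed. Comparing these fields for the first failing requests after a restart would indicate which of the explanations applies.

```yaml
access_log:
  - name: envoy.access_loggers.file
    # Logger for gRPC requests, extended with failure details
    filter:
      header_filter:
        header:
          name: "x-grpc-web"
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /dev/stdout
      # %RESPONSE_FLAGS%: e.g. UF = upstream connection failure, UT = upstream request timeout
      # %UPSTREAM_TRANSPORT_FAILURE_REASON%: transport socket (TLS) error text, empty if none
      format: "[%START_TIME%] \"%DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%\": \"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%\" -> \"%UPSTREAM_HOST%\" [gRPC-status: %GRPC_STATUS%] [flags: %RESPONSE_FLAGS%] [tls: %UPSTREAM_TRANSPORT_FAILURE_REASON%] (cluster: %UPSTREAM_CLUSTER% route: %ROUTE_NAME%)\n"
```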

docker-compose.yml:

    version: '2.4'
    networks:
      core:
        name: Service_Core
        driver: bridge
        ipam:
          config:
            - subnet: 198.51.100.0/24
              gateway: 198.51.100.1
    services:
      envoy:
        container_name: "envoy"
        image: "envoyproxy/envoy:v1.17.1"
        ports:
          - 8080:8080
        networks:
          - core
        restart: always
        security_opt:
          - apparmor:unconfined
        environment:
          - ENVOY_UID=17200
          - ENVOY_GID=17200
        volumes:
          - "/somepath/envoy.pem:/etc/envoy/envoy.pem:ro"
          - "/somepath/envoy.key:/etc/envoy/envoy.key:ro"
          - "/somepath/ca.pem:/etc/ssl/certs/ca-certificates.crt:ro"
          - "/somepath/envoy.yml:/etc/envoy/envoy.yaml:ro"
      grpcserver:
        image: "<grpcserver>"
        container_name: "grpcserver"
        restart: always
        networks:
          - core
        security_opt:
          - apparmor:unconfined
      frontend:
        image: "<frontend>" # an nginx with the files for the UI
        container_name: "frontend"
        restart: always
        networks:
          - core
        ports:
          - 80:80
          - 443:443
        volumes:
          - "/somepath/ssl/:/opt/ssl/"
        security_opt:
          - apparmor:unconfined

What could be causing this behavior?
I am only interested in a fix in the Docker or Envoy config. I have already considered a workaround, but I would rather fix the underlying problem.

Answer 1

Score: 1


In your cluster configuration, you specify a connect_timeout of 250 ms (0.25s).
If Envoy cannot establish a connection to the service within this window, the request fails.

It seems the first connection attempts to the service cannot complete within such a short window. Setting it to a higher value (a few seconds) should fix the problem.
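Applied to the cluster from the question, the suggested change is a one-line edit. The fragment below is a sketch: `5s` is an arbitrary example of "a few seconds", and the keys not shown stay as in the original config.

```yaml
clusters:
  - name: grpcserver_gRPCCluster
    connect_timeout: 5s   # was 0.25s; example value leaving headroom for TCP and TLS connection setup
    type: static
    http2_protocol_options: { }
    lb_policy: round_robin
    # load_assignment and transport_socket unchanged
```

Since each gRPC server has its own cluster and connect_timeout is a per-cluster setting, the same change has to be made on every cluster.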

huangapple
  • Published on July 10, 2023 at 13:36:53
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76650884.html