How can I ensure graceful scaling in Kubernetes?

Question

As part of scaling pods in Kubernetes, I want to ensure I gracefully serve my HTTP connections before shutting down. To that end, I have implemented the following code in Go:

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"github.com/braintree/manners"
)

func main() {

	shutdown := make(chan int)

	// create a notification channel for shutdown
	sigChan := make(chan os.Signal, 1)

	// start the HTTP server
	http.HandleFunc("/", hello)
	server := manners.NewWithServer(&http.Server{Addr: ":80", Handler: nil})
	go func() {
		server.ListenAndServe()
		shutdown <- 1
	}()

	// register for interrupt (Ctrl+C) and SIGTERM (docker)
	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigChan
		fmt.Println("Shutting down...")
		server.Close()
	}()

	<-shutdown
}

func hello(w http.ResponseWriter, r *http.Request) {
	// time.Sleep(3000 * time.Millisecond)
	io.WriteString(w, "Hello world!")
}

This code watches for Docker's SIGTERM signal and shuts down gracefully after existing requests have been served. When I run this container in Kubernetes with 10 instances, I can scale up and down without incident, as long as I don't scale down to a single instance. When I do scale down to a single instance, I see a short burst of HTTP errors, and then everything looks fine again.

I find this strange, because during scaling I would assume the proxy is updated first and the containers are shut down afterwards, so the code above would allow in-flight requests to be served.

In my current setup I am running 2 nodes. Perhaps the issue appears when scaling drops below the number of nodes and there is some sort of timing issue with etcd updates? Any insight into what is going on here would be very useful.

Answer 1

Score: 6

You should use a readiness check (http://kubernetes.io/v1.0/docs/user-guide/production-pods.html#liveness-and-readiness-probes-aka-health-checks) that transitions the Pod to "not ready" after you receive SIGTERM.

Once that happens, the Service will remove the Pod from serving before it is deleted. (Without a readiness check, the Service simply doesn't know the Pod is going away until it is actually deleted.)

You may also want to use a PreStop hook that sets readiness to false and then drains all existing requests. PreStop hooks are called synchronously before a Pod is deleted; they are described here: https://kubernetes-v1-4.github.io/docs/user-guide/production-pods/#lifecycle-hooks-and-termination-notice
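
As an illustration of the first suggestion, here is a minimal sketch (not from the original answer) of a readiness endpoint that starts failing as soon as SIGTERM arrives. The /healthz path, the port and the atomic flag are assumptions; the pod spec would also need a readinessProbe pointing at that path for Kubernetes to act on it.

package main

import (
	"io"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// ready is 1 while the pod should receive traffic; SIGTERM flips it to 0.
var ready int32 = 1

// healthz is the handler an (assumed) readinessProbe would poll.
func healthz(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&ready) == 1 {
		io.WriteString(w, "ok")
		return
	}
	// A non-2xx response fails the readiness probe, so the Service stops
	// routing new requests to this pod before it is actually deleted.
	w.WriteHeader(http.StatusServiceUnavailable)
}

func main() {
	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigChan
		atomic.StoreInt32(&ready, 0) // start reporting "not ready"
	}()

	http.HandleFunc("/healthz", healthz)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello world!")
	})
	http.ListenAndServe(":80", nil)
}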

Answer 2

Score: 1

There is a small window during which a pod that is being removed but is still alive will be part of the load-balancing set. As Brendan just said (he beat me by seconds), a readiness check should fix this for you, completely under your control.
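
To make that window concrete, here is a sketch of how the question's main() could be adapted: report "not ready" on SIGTERM, pause so the endpoints controller and kube-proxy can drop the pod from the Service, and only then let manners drain what is still in flight. The ready flag, the /healthz path and the 10-second pause are assumptions; the pause has to stay below the pod's terminationGracePeriodSeconds.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"

	"github.com/braintree/manners"
)

// ready is flipped to 0 on SIGTERM so the readiness probe starts failing.
var ready int32 = 1

func main() {

	shutdown := make(chan int)
	sigChan := make(chan os.Signal, 1)

	// readiness endpoint for the (assumed) readinessProbe
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&ready) == 0 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		io.WriteString(w, "ok")
	})
	http.HandleFunc("/", hello)

	server := manners.NewWithServer(&http.Server{Addr: ":80", Handler: nil})
	go func() {
		server.ListenAndServe()
		shutdown <- 1
	}()

	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigChan
		atomic.StoreInt32(&ready, 0) // readiness probe now fails
		// Give the endpoints controller and kube-proxy time to remove this
		// pod from the Service before the listener stops accepting work.
		// 10 seconds is a guess; tune it to your cluster.
		time.Sleep(10 * time.Second)
		fmt.Println("Shutting down...")
		server.Close() // manners finishes requests that are already in flight
	}()

	<-shutdown
}

func hello(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "Hello world!")
}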
