How can I ensure graceful scaling in Kubernetes?


Question

As part of scaling Pods in Kubernetes, I want to make sure my in-flight HTTP connections are served gracefully before shutting down. To that end I have implemented the following code in Go:

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"github.com/braintree/manners"
)

func main() {

	shutdown := make(chan int)

	// create a channel to be notified of shutdown signals
	sigChan := make(chan os.Signal, 1)

	// start the HTTP server
	http.HandleFunc("/", hello)
	server := manners.NewWithServer(&http.Server{Addr: ":80", Handler: nil})
	go func() {
		server.ListenAndServe()
		shutdown <- 1
	}()

	// register for interrupt (Ctrl+C) and SIGTERM (docker) signals
	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigChan
		fmt.Println("Shutting down...")
		server.Close()
	}()

	<-shutdown
}

func hello(w http.ResponseWriter, r *http.Request) {
	// time.Sleep(3000 * time.Millisecond)
	io.WriteString(w, "Hello world!")
}

This listens for docker's SIGTERM and shuts down gracefully once existing requests have been served. When I run this container in Kubernetes with 10 instances, I can scale up and down without incident, as long as I don't scale down to a single instance. When I scale down to a single instance I see a short burst of HTTP errors, then everything looks fine again.

I find this strange, since I would assume that during scaling the proxy is updated first and the containers are shut down afterwards, so the code above would allow the remaining requests to be served out.

In my current setup I am running 2 nodes; maybe the issue appears when scaling drops below the number of nodes and there is some kind of timing issue with etcd updates? Any insight into what is going on here would be very helpful.


Answer 1

Score: 6

You should use a readiness check (http://kubernetes.io/v1.0/docs/user-guide/production-pods.html#liveness-and-readiness-probes-aka-health-checks) that transitions the Pod to "not ready" after you receive the SIGTERM.

Once that happens, the Service will remove the Pod from its set of endpoints prior to the delete. (Without a readiness check, the Service simply does not know the Pod is going away until it has actually been deleted.)

You may also want to use a PreStop hook that sets readiness to false and then drains all existing requests. PreStop hooks are called synchronously before a Pod is deleted and are described here: https://kubernetes-v1-4.github.io/docs/user-guide/production-pods/#lifecycle-hooks-and-termination-notice
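
For concreteness, here is a minimal sketch of how such a readiness endpoint could be wired into a Go server like the one in the question. It is an illustration under assumptions, not the answer's exact method: the /readyz path, the 5-second drain delay, the 30-second shutdown timeout, and the use of the standard library's http.Server.Shutdown (Go 1.8+) in place of the manners package are all choices made for the example. The matching readinessProbe (and, if you prefer, a preStop hook that provides the delay instead) still has to be declared in the Pod spec.

package main

import (
	"context"
	"io"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// ready is flipped to 0 once a termination signal arrives, so the
// readiness probe starts failing before the server stops serving.
var ready int32 = 1

func readyz(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&ready) == 1 {
		w.WriteHeader(http.StatusOK)
		return
	}
	// a failing probe tells the Service to drop this Pod from its endpoints
	w.WriteHeader(http.StatusServiceUnavailable)
}

func hello(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "Hello world!")
}

func main() {
	http.HandleFunc("/", hello)
	http.HandleFunc("/readyz", readyz)

	server := &http.Server{Addr: ":80"}
	go server.ListenAndServe()

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
	<-sigChan

	// fail the readiness probe first, then give the endpoints controller
	// and kube-proxy a few seconds to stop routing new requests to this Pod
	atomic.StoreInt32(&ready, 0)
	time.Sleep(5 * time.Second)

	// finally drain in-flight requests and exit
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	server.Shutdown(ctx)
}

With a readinessProbe pointing at /readyz, the Pod drops out of the Service before the listener closes, which is the window Answer 2 describes.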


Answer 2

Score: 1

There is a small window during which a Pod that is being removed, but is still alive, remains part of the load-balancing set. As Brendan just said (he beat me by seconds), a readiness check should fix this for you, and it is completely under your control.

