Ways of optimizing a CPU Intensive Golang WebApp

Question

I have a toy web app that is very CPU intensive:

package main

import (
    "fmt"
    "net/http"
    "time"
)

// PerfServiceHandler burns CPU in a tight loop and reports how long it took.
func PerfServiceHandler(w http.ResponseWriter, req *http.Request) {
    start := time.Now()
    w.Header().Set("Content-Type", "application/json")

    x := 0
    for i := 0; i < 200000000; i++ {
        x = x + 1
        x = x - 1
    }
    elapsed := time.Since(start)
    w.Write([]byte(fmt.Sprintf("Time Elapsed %s", elapsed)))
}

func main() {
    http.HandleFunc("/perf", PerfServiceHandler)
    http.ListenAndServe(":3000", nil)
}

The above function takes about 120 ms to execute. But when I load test this app with 500 concurrent users (siege -t30s -i -v -c500 http://localhost:3000/perf), I get these results:

  • Average Resp Time per request 2.51 secs
  • Transaction Rate 160.57 transactions per second

Can someone answer the questions below?

  • When I ran with 100, 200, and 500 concurrent users, the number of OS threads used by the app rose from 7 (just after startup) to 35 and stayed there. Increasing the number of concurrent connections does not change this number. Even when 500 concurrent requests arrive at the server, the number of OS threads stays stuck at 35 (the app was started with runtime.GOMAXPROCS(runtime.NumCPU())). When the test stopped, the number was still 35.
    • Can someone explain this behaviour?
    • Can the number of OS threads be increased somehow (from the OS or from Go)?
    • Will increasing the number of OS threads improve performance?
  • Can someone suggest some other ways of optimizing this app?

Environment:

Go - go1.4.1 linux/amd64
OS - Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux
Processor - 2.6 GHz (Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz)
RAM - 64 GB

OS parameters:

nproc - 32
cat /proc/sys/kernel/threads-max - 1031126
ulimit -u - 515563
ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 515563
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65536
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 515563
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

Answer 1

Score: 4

Multiple goroutines can be multiplexed onto a single OS thread. The scheduler design is described here: https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit, which references this paper: http://supertech.csail.mit.edu/papers/steal.pdf.

On to the questions:

> Even when 500 concurrent requests arrive at the server, the number of OS threads stays stuck at 35 [...] Can someone explain this behaviour?

Since you set GOMAXPROCS to the number of CPUs, Go will only run that many goroutines at a time. With nproc reporting 32, roughly 32 threads executing Go code plus a few extra for blocked system calls and runtime housekeeping is consistent with the 35 you observed.

One thing that may be a little confusing is that goroutines aren't always running (sometimes they are "blocked"). For example, if you read a file, the goroutine is blocked while the OS does that work, and the scheduler will pick another goroutine to run (assuming there is one). Once the file read is complete, that goroutine goes back into the list of "runnable" goroutines.

The creation of OS level threads is handled by the scheduler and there are additional complexities around system-level calls. (Sometimes you need a real, dedicated thread. See: LockOSThread) But you shouldn't expect a ton of threads.
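
As a rough way to see this for yourself, here is a minimal sketch (not from the original question) that starts 500 CPU-bound goroutines and then asks the runtime how many OS threads it has created. The helper names spin and runSpinners are made up for illustration. pprof.Lookup("threadcreate") is the standard thread-creation profile, and its Count is used here only as an approximation of the thread count.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// spin burns CPU without blocking, like the loop in the question.
// (Hypothetical helper, not part of the original app.)
func spin(n int) int {
	x := 0
	for i := 0; i < n; i++ {
		x = x + 1
		x = x - 1
	}
	return x
}

// runSpinners starts n CPU-bound goroutines, waits for all of them,
// and returns how many OS threads the runtime has created so far.
func runSpinners(n, iterations int) int {
	var wg sync.WaitGroup
	for g := 0; g < n; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			spin(iterations)
		}()
	}
	wg.Wait()
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	// GOMAXPROCS(0) only reads the current setting.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("OS threads created:", runSpinners(500, 2000000))
}
```

On a typical machine the reported count should stay close to GOMAXPROCS rather than growing toward 500, though the exact number depends on the Go version and the hardware.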

> Can the number of OS threads be increased somehow (from the OS or from Go)?

I think using LockOSThread may result in the creation of new threads, but it won't matter:

> Will increasing the number of OS threads improve performance?

No. Your CPU is fundamentally limited in how many things it can do at once. Goroutines work because it turns out most operations are I/O bound in some way, but if you are truly doing something CPU bound, throwing more threads at the problem won't help. In fact, it will probably make things worse, since there is overhead involved in switching between threads.

In other words, Go is making the right decision here.

> Can someone suggest some other ways of optimizing this app?

for i := 0; i < 200000000; i++ {
   x = x + 1
   x = x - 1
}

I take it you wrote this code just to make the CPU do a lot of work? What does the actual code look like?

Your best bet will be finding a way to optimize that code so it needs less CPU time. If that's not possible (it's already highly optimized), then you will need to add more computers or CPUs to the mix. Get a better computer, or more of them.

For multiple computers, you can put a load balancer in front of all your machines, and that should scale pretty easily.

You may also benefit by pulling this work off of the webserver and moving it to some backend system. Consider using a work queue.
