August 2, 2013 07:48:34 · go


Go routine performance maximizing

Question


I'm writing a data mover in Go: it takes data located in one data center and moves it to another data center. I figured Go would be perfect for this, given goroutines.

I notice that if I have one program running 1800 threads, the amount of data being transmitted is really low.

Here's the dstat printout, averaged over 30 seconds:

---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
 1m   5m  15m |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
0.70 3.58 4.42| 10   1  89   0   0   0|   0   156k|7306k 6667k|   0     0 |  11k 6287 
0.61 3.28 4.29| 12   2  85   0   0   1|   0  6963B|8822k 8523k|   0     0 |  14k 7531 
0.65 3.03 4.18| 12   2  86   0   0   1|   0  1775B|8660k 8514k|   0     0 |  13k 7464 
0.67 2.81 4.07| 12   2  86   0   0   1|   0  1638B|8908k 8735k|   0     0 |  13k 7435 
0.67 2.60 3.96| 12   2  86   0   0   1|   0   819B|8752k 8385k|   0     0 |  13k 7445 
0.47 2.37 3.84| 11   2  86   0   0   1|   0  2185B|8740k 8491k|   0     0 |  13k 7548 
0.61 2.22 3.74| 10   2  88   0   0   0|   0  1229B|7122k 6765k|   0     0 |  11k 6228 
0.52 2.04 3.63|  3   1  97   0   0   0|   0   546B|1999k 1365k|   0     0 |3117  2033

If I run 9 instances of the program with 200 threads each, I see much better performance:

---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
 1m   5m  15m |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
8.34 9.56 8.78| 53   8  36   0   0   3|   0   410B|  38M   32M|   0     0 |  41k   26k
8.01 9.37 8.74| 74  10  12   0   0   4|   0   137B|  51M   51M|   0     0 |  59k   39k
8.36 9.31 8.74| 75   9  12   0   0   4|   0  1092B|  51M   51M|   0     0 |  59k   39k
6.93 8.89 8.62| 74  10  12   0   0   4|   0  5188B|  50M   49M|   0     0 |  59k   38k
7.09 8.73 8.58| 75   9  12   0   0   4|   0   410B|  51M   50M|   0     0 |  60k   39k
7.40 8.62 8.54| 75   9  12   0   0   4|   0   137B|  52M   49M|   0     0 |  61k   40k
7.96 8.63 8.55| 75   9  12   0   0   4|   0   956B|  51M   51M|   0     0 |  59k   39k
7.46 8.44 8.49| 75   9  12   0   0   4|   0   273B|  51M   50M|   0     0 |  58k   38k
8.08 8.51 8.51| 75   9  12   0   0   4|   0   410B|  51M   51M|   0     0 |  59k   39k

The load average is a little high, but I'll worry about that later. The network traffic, though, is almost hitting the network's capacity.

I'm on Ubuntu 12.04 with 8 GB of RAM and 2.3 GHz processors (says EC2 :P).

Also, I've increased my file descriptor limit from 1024 to 10240.
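A quick way to confirm which limit the running process actually sees is to ask the kernel from inside the program. This is a minimal sketch assuming Linux (as in the question); `openFileLimit` is a hypothetical helper name. As a rough, hypothetical count: if each goroutine held one source and one destination connection, 1800 goroutines would need about 3600 descriptors, more than 1024 but well within 10240.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft (current) and hard (maximum) RLIMIT_NOFILE
// values for this process. Linux-only as written: on some other Unix systems
// the Rlimit fields are signed, and Windows has no Getrlimit.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	// Every open socket or file uses one descriptor, so the soft limit bounds
	// how many connections this process can hold open at once.
	fmt.Printf("RLIMIT_NOFILE: soft=%d hard=%d\n", soft, hard)
}
```

Note that on Go 1.19 and later, programs that import `os` raise the soft limit to the hard limit at startup, so the printed soft value can differ from what `ulimit -n` shows in the shell.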

I thought Go was designed for this kind of thing, or am I expecting too much of Go for this application?

Is there something trivial that I'm missing? Do I need to configure my system to maximize Go's potential?

EDIT

I guess my question wasn't clear enough. Sorry. I'm not asking for magic from Go; I know computers have limits on what they can handle.
So I'll rephrase: why is 1 instance with 1800 goroutines != 9 instances with 200 threads each? It's the same number of goroutines, yet 1 instance performs significantly worse than 9 instances.

Answer 1

Score: 2


Please note that goroutines are also limited to your local machine and that channels are not natively network-enabled, i.e. your particular case probably doesn't play to Go's strengths.

Also: what did you expect from throwing (supposedly) every transfer into a goroutine? I/O operations tend to have their bottleneck where the bits hit the metal, i.e. the physical transfer of the data to the medium. Think of it like this: no matter how many threads (or goroutines, in this case) try to write to the network card, you still only have one network card. Most likely, hitting it with too many concurrent write calls will only slow things down, since the overhead involved increases.

If you think this is not the problem, or you want to audit your code for performance, Go has neat built-in features for that: Profiling Go Programs (official Go blog).
But the actual bottleneck might still be outside your Go program and/or in the way it interacts with the OS.

Addressing your actual problem without code is pointless guessing. Post some and everyone will try their best to help you.

Answer 2

Score: 1


You will probably have to post your source code to get any real input, but just to be sure: have you increased the number of CPUs Go is allowed to use?

package main

import "runtime"

func main() {
    // Let the scheduler run goroutines on all available CPUs at once.
    runtime.GOMAXPROCS(runtime.NumCPU())
}
