Why is a Golang channel much faster than Intel TBB concurrent_queue when tested with 8 producers and 1 consumer?


Question

I ran a test comparing the performance of a Golang channel against the C++ TBB concurrent queue, with 8 writers and 1 reader, each running on its own thread. The results show that the Golang version is much faster than the C++ version, in both latency and overall send/receive speed. Is that real, or is there a mistake in my code?

Golang result, unit is microseconds:

latency max:1505,avg:1073
send begin:1495593677683232,recv end:1495593677901854, time:218622

Here is the Golang code:

package main

import (
    "flag"
    "fmt"
    "runtime"
    "sync"
    "time"
)

var (
    producer     = flag.Int("producer", 8, "producer")
    consumer     = flag.Int("consumer", 1, "consumer")
    start_signal sync.WaitGroup
)

const (
    TEST_NUM = 1000000
)

type Item struct {
    id       int
    sendtime int64
    recvtime int64
}

var g_vec [TEST_NUM]Item

func sender(out chan int, begin int, end int) {
    start_signal.Wait()
    runtime.LockOSThread()
    println("i am in sender", begin, end)
    for i := begin; i < end; i++ {
        item := &g_vec[i]
        item.id = i
        item.sendtime = time.Now().UnixNano() / 1000
        out <- i
    }
    println("sender finish")
}

func reader(out chan int, total int) {
    //runtime.LockOSThread()
    start_signal.Done()
    for i := 0; i < total; i++ {
        tmp := <-out
        item := &g_vec[tmp]
        item.recvtime = time.Now().UnixNano() / 1000
    }
    var lsum int64 = 0
    var lavg int64 = 0
    var lmax int64 = 0
    var lstart int64 = 0
    var lend int64 = 0
    for _, item := range g_vec {
        if lstart > item.sendtime || lstart == 0 {
            lstart = item.sendtime
        }
        if lend < item.recvtime {
            lend = item.recvtime
        }

        ltmp := item.recvtime - item.sendtime
        lsum += ltmp
        if ltmp > lmax {
            lmax = ltmp
        }
    }
    lavg = lsum / TEST_NUM
    fmt.Printf("latency max:%v,avg:%v\n", lmax, lavg)
    fmt.Printf("send begin:%v,recv end:%v, time:%v", lstart, lend, lend-lstart)
}

func main() {
    flag.Parse() // parse the -producer/-consumer flags; without this the defaults are always used
    runtime.GOMAXPROCS(10)
    out := make(chan int, 5000)

    start_signal.Add(1)
    for i := 0; i < *producer; i++ {
        go sender(out, i*TEST_NUM/(*producer), (i+1)*TEST_NUM/(*producer))
    }
    reader(out, TEST_NUM)
}

C++ result, unit is microseconds:

max:558301,min:3,avg:403741
start:1495594232068580,end:1495594233497618,length:1429038

C++ code (main part only):

concurrent_bounded_queue<int> g_queue;

static void sender(int start, int end)
{   
    for (int i=start; i < end; i++)
    {
        using namespace std::chrono;
        auto now = system_clock::now();
        auto now_ms = time_point_cast<microseconds>(now);
        auto value = now_ms.time_since_epoch();
        int64_t duration = value.count();

        Item &item = g_pvec->at(i);
        item.id = i;
        item.sendTime = duration;
        //std::cout << "sending " << i << "\n";
        g_queue.push(i);
    }
}

static void reader(int num)
{
    barrier.set_value();    
    for (int i=0;i<num;i++)
    {
        int v;
        g_queue.pop(v);
        Item &el = g_pvec->at(v);

        using namespace std::chrono;
        auto now = system_clock::now();
        auto now_ms = time_point_cast<microseconds>(now);
        auto value = now_ms.time_since_epoch();
        int64_t duration = value.count();

        el.recvTime = duration;
        //std::cout << "recv " << item.id << ":" << duration << "\n";
    }
    // calculate the result.
    int64_t lmax = 0;
    int64_t lmin = 100000000;
    int64_t lavg = 0;
    int64_t lsum = 0;
    int64_t lbegin = 0;
    int64_t lend = 0;
    for (auto &item : *g_pvec)
    {
        if (item.sendTime<lbegin || lbegin==0)
        {
            lbegin = item.sendTime;
        }
        if (item.recvTime>lend )
        {
            lend = item.recvTime;
        }

        lsum += item.recvTime - item.sendTime;
        lmax = std::max(item.recvTime - item.sendTime, lmax);
        lmin = std::min(item.recvTime - item.sendTime, lmin);
    }
    lavg = lsum / num;
    std::cout << "max:" << lmax << ",min:" << lmin << ",avg:" << lavg << "\n";
    std::cout << "start:" << lbegin << ",end:" << lend << ",length:" << lend-lbegin << "\n";
}

DEFINE_CODE_TEST(plain_queue_test)
{
    g_pvec = new std::vector<Item>();
    g_pvec->resize(TEST_NUM);    
    
    auto sf = barrier.get_future().share();
    
    std::vector<std::thread> vt;
    for (int i = 0; i < SENDER_NUM; i++)
    {
        vt.emplace_back([sf, i]{    
            sf.wait();
            sender(i*TEST_NUM / SENDER_NUM, (i + 1)*TEST_NUM / SENDER_NUM);
        });
    }
    

    std::cout << "create reader\n";
    std::thread rt(std::bind(reader, TEST_NUM));
    for (auto& t : vt)
    {
        t.join();
    }    
    rt.join();
}
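
The snippet above leaves out the supporting declarations (Item, TEST_NUM, SENDER_NUM, barrier, g_pvec), and DEFINE_CODE_TEST is presumably a macro from the asker's own test harness. For reference, here is a minimal sketch of what those declarations would need to look like, inferred from how the code uses them; the exact definitions are my assumption:

#include <cstdint>
#include <future>
#include <vector>
#include <tbb/concurrent_queue.h>  // plus "using namespace tbb;" (or tbb:: qualification) for concurrent_bounded_queue

struct Item {
    int id;
    int64_t sendTime;  // microseconds since epoch, written by sender()
    int64_t recvTime;  // microseconds since epoch, written by reader()
};

const int TEST_NUM = 1000000;  // one million messages, matching the Go test
const int SENDER_NUM = 8;      // 8 producer threads

std::promise<void> barrier;    // reader() calls set_value(); senders wait on its shared_future
std::vector<Item> *g_pvec;     // one Item per message, indexed by message id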



(In the VTune CPU graphs below, red means CPU spin/overhead, green is idle.) From the VTune CPU graphs I feel that the Golang channel has a more efficient mutex (e.g., does it need a system call to put a goroutine to sleep, versus the C++ mutex?).

[VTune CPU graph screenshots for the Go and C++ tests omitted]

Answer 1

Score: 6

From the VTune traces I can conclude that the TBB queue does not sleep: it spends a lot of time spinning. The Go version, by contrast, has light-green areas indicating threads sleeping on OS synchronization. Why is that better? Usually it indicates oversubscription on your machine, so communicating through the OS pays off.

So, do you oversubscribe it? If yes, I'd say this is rather expected behavior, and it fits the philosophy of the corresponding libraries. TBB is designed for compute parallelism; it does not handle IO tasks well and plays badly against oversubscription. Go is designed exactly for IO tasks, hence the built-in concurrency and the FIFO policy of its scheduler, which is unfriendly to parallel number crunching. Oversubscription is rather recommended for IO tasks, while it hurts or even kills computational parallelism.
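
To make the answer's distinction concrete: a queue built on OS blocking primitives (a mutex plus condition variables) puts waiting threads to sleep in the kernel instead of burning cycles spinning, which is exactly what pays off when the machine is oversubscribed. Below is a minimal sketch of that sleeping strategy, with the same push/pop shape as concurrent_bounded_queue; it is neither the asker's code nor TBB's actual implementation:

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class BlockingBoundedQueue {
public:
    explicit BlockingBoundedQueue(std::size_t cap) : cap_(cap) {}

    void push(const T &v) {  // blocks while full, like concurrent_bounded_queue::push
        std::unique_lock<std::mutex> lk(mu_);
        not_full_.wait(lk, [this] { return q_.size() < cap_; });  // sleep, don't spin
        q_.push(v);
        not_empty_.notify_one();  // wake one sleeping consumer
    }

    void pop(T &out) {  // blocks while empty, like concurrent_bounded_queue::pop
        std::unique_lock<std::mutex> lk(mu_);
        not_empty_.wait(lk, [this] { return !q_.empty(); });  // sleep, don't spin
        out = q_.front();
        q_.pop();
        not_full_.notify_one();  // wake one sleeping producer
    }

private:
    std::mutex mu_;
    std::condition_variable not_full_, not_empty_;
    std::queue<T> q_;
    std::size_t cap_;
};

As a quick check on the oversubscription question, compare std::thread::hardware_concurrency() against the roughly 10 threads this test creates (8 senders, 1 reader, 1 main thread): on a machine with fewer cores than threads, a queue that sleeps frees cores for the producers, while a spinning queue competes with them for those cores.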

Source: https://go.coder-hub.com/44148184.html (posted 2017-05-24 10:55:11)