gocb: bulk insert into Couchbase using Golang - entire data is not being inserted

Question

I am creating JSON data (approx. 5,000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in Go. The problem is that the entire data set is not being pushed; only a random number of records (between 2,000 and 3,000) get inserted.

The code is:

package main

import (
    "database/sql"
    "log"
    "fmt"
    _ "github.com/denisenkom/go-mssqldb"
    "gopkg.in/couchbase/gocb.v1"
)
              

func main() {
    var (
        ID string
        JSONData string
    )
 
    var items []gocb.BulkOp      
    cluster, _ := gocb.Connect("couchbase://localhost")
    bucket, _ := cluster.OpenBucket("example", "")

    condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

    // Get approx. 5000 records from SQL Server in JSON format
    rows, err := condb.Query("Select id, JSONData From User")
    if err != nil {
        log.Fatal(err)
    }

    for rows.Next() {
        _ = rows.Scan(&ID,&JSONData)
        items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
    }

    // Bulk-load the JSON into Couchbase
    err = bucket.Do(items)
    if err != nil {
        fmt.Println("执行批量插入时出错:", err)
    }

    _ = bucket.Close() 
}

Please tell me where I went wrong here.

FYI, the ID and JSONData columns in the SQL query contain valid keys and JSON strings. Also, any advice on improving the way this is coded would be appreciated.

Answer 1

Score: 1

I missed checking the Err field of the InsertOp type. When I did, I found that the items array overflows when the data exceeds its capacity, and a 'queue overflowed' message shows on screen when you print that field:

for i := range items {
    fmt.Println(items[i].(*gocb.InsertOp).Err)
}

A screenshot of the error message is attached here:
Err.png

Is there any workaround for this limitation, apart from splitting the data into a number of batches and performing multiple bulk inserts?
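
For reference, the batch-splitting workaround mentioned above can be sketched as follows. This is a minimal, hypothetical sketch that reuses the items and bucket variables from the question's code; the batch size of 500 is an arbitrary choice, not a documented limit:

const batchSize = 500 // arbitrary; keep batches small enough not to overflow the client queue

for start := 0; start < len(items); start += batchSize {
    end := start + batchSize
    if end > len(items) {
        end = len(items)
    }
    batch := items[start:end]
    if err := bucket.Do(batch); err != nil {
        fmt.Println("bulk op failed:", err)
    }
    // Do only reports an overall error; each op's Err field still needs checking.
    for _, op := range batch {
        if e := op.(*gocb.UpsertOp).Err; e != nil {
            fmt.Println("op failed:", e)
        }
    }
}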

Answer 2

Score: 0

Why not try using a number of goroutines and a channel to synchronize them? Create a channel of items that need to be inserted, then start 16 or more goroutines that read from the channel, perform the insert, and continue. The most common bottleneck for a strictly serial inserter is the network round trip; if you can have many goroutines performing inserts at once, you will vastly improve performance.
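
A minimal sketch of that pattern, assuming the same local cluster and bucket as in the question. The doc type, the worker count of 16, and the fake feed loop are illustrative placeholders; in the real code the channel would be fed from the rows.Next() loop over the SQL Server result set:

package main

import (
    "fmt"
    "sync"

    "gopkg.in/couchbase/gocb.v1"
)

// doc pairs a document key with its JSON payload (illustrative type).
type doc struct {
    ID       string
    JSONData string
}

func main() {
    cluster, err := gocb.Connect("couchbase://localhost")
    if err != nil {
        panic(err)
    }
    bucket, err := cluster.OpenBucket("example", "")
    if err != nil {
        panic(err)
    }
    defer bucket.Close()

    docs := make(chan doc)
    var wg sync.WaitGroup

    // Start 16 workers that read from the channel and upsert one document at a time.
    for i := 0; i < 16; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for d := range docs {
                if _, err := bucket.Upsert(d.ID, d.JSONData, 0); err != nil {
                    fmt.Println("upsert failed:", d.ID, err)
                }
            }
        }()
    }

    // Feed the channel; in the question this would be the rows.Next() loop.
    for i := 0; i < 5000; i++ {
        docs <- doc{ID: fmt.Sprintf("user:%d", i), JSONData: `{"n":1}`}
    }
    close(docs)
    wg.Wait()
}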

P.S. The issue with the bulk insert not inserting every document is a strange one; I am going to look into it further. As @ingenthr mentioned above, is it possible that you are doing upserts and have multiple operations for the same keys?

Old question, posted in the answers section in error:
Are you getting any error output from the bulk insert?
