gocb: bulk insert into Couchbase using golang - entire data is not being inserted
Question
I am generating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in Go. The problem is that the entire data set is not being pushed; only a random number of records (between 2000 and 3000) gets inserted.
The code is:
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/denisenkom/go-mssqldb"
	"gopkg.in/couchbase/gocb.v1"
)

func main() {
	var (
		ID       string
		JSONData string
	)
	var items []gocb.BulkOp

	// Note: connection errors are being ignored here.
	cluster, _ := gocb.Connect("couchbase://localhost")
	bucket, _ := cluster.OpenBucket("example", "")
	condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

	// Get approx. 5000 records from SQL Server in JSON format
	rows, err := condb.Query("Select id, JSONData From User")
	if err != nil {
		log.Fatal(err)
	}

	for rows.Next() {
		_ = rows.Scan(&ID, &JSONData)
		items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
	}

	// Bulk load the JSON into Couchbase
	err = bucket.Do(items)
	if err != nil {
		fmt.Println("ERROR PERFORMING BULK INSERT:", err)
	}

	_ = bucket.Close()
}
Please tell me where I went wrong here.
FYI, the ID and JSONData columns in the SQL query contain valid keys and JSON strings. Also, any advice on improving the way the code is written would be appreciated.
Answer 1
Score: 1
I missed checking the Err field of the InsertOp type. When I did, I found that the items queue overflows when the data exceeds its capacity, and a "queue overflowed" message shows on the screen when you print that field:
for i := range items {
	fmt.Println(items[i].(*gocb.InsertOp).Err)
}
A screenshot of the error message is attached here:
Err.png
Is there any workaround for this limitation apart from splitting the data into a number of batches and performing multiple bulk inserts?
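For reference, a minimal sketch of that batching workaround, assuming the items slice and bucket from the question; the batch size of 1000 is an arbitrary illustrative value, not a documented limit of the client:

const batchSize = 1000 // illustrative; tune to stay under the client's dispatch queue

for start := 0; start < len(items); start += batchSize {
	end := start + batchSize
	if end > len(items) {
		end = len(items)
	}
	batch := items[start:end]
	if err := bucket.Do(batch); err != nil {
		log.Println("bulk insert error:", err)
	}
	// Check the per-operation Err field, as suggested above.
	for _, op := range batch {
		if upsert, ok := op.(*gocb.UpsertOp); ok && upsert.Err != nil {
			log.Println("failed key:", upsert.Key, "error:", upsert.Err)
		}
	}
}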
Answer 2
Score: 0
Why not try using a number of goroutines and a channel to synchronize them? Create a channel of items that need to be inserted, then start 16 or more goroutines that read from the channel, perform the insert, and continue. The most common bottleneck for a strictly serial inserter is the network round-trip; if you can have many goroutines performing inserts at once, you will vastly improve performance.
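A rough sketch of that pattern, assuming the bucket, rows, ID, and JSONData variables from the question; the doc type, channel size, and worker count of 16 are illustrative, and it additionally requires importing "sync":

type doc struct {
	id   string
	json string
}

ch := make(chan doc, 100)
var wg sync.WaitGroup

// Start 16 workers that read from the channel and upsert one document at a time.
for w := 0; w < 16; w++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for d := range ch {
			if _, err := bucket.Upsert(d.id, d.json, 0); err != nil {
				log.Println("upsert failed for key", d.id, ":", err)
			}
		}
	}()
}

// Feed the SQL rows into the channel, then close it and wait for the workers.
for rows.Next() {
	if err := rows.Scan(&ID, &JSONData); err != nil {
		log.Println("scan error:", err)
		continue
	}
	ch <- doc{id: ID, json: JSONData}
}
close(ch)
wg.Wait()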
P.S. The issue with the bulk insert not inserting every document is a strange one; I am going to take a look into this. As @ingenthr mentioned above, though, is it possible that you are doing upserts and have multiple operations for the same keys?
Old question, posted in the answers section in error:
Are you getting any error output from the bulk insert?