Inserting into BigQuery using Go updates / overwrites instead of inserting

Question

I have an AWS Lambda function written in Go that should insert into BigQuery.
It references the cloud.google.com/go/bigquery package.

client, err := bigquery.NewClient(ctx, projectID, gcpOption)
if err != nil {
	println(fmt.Sprintf("error creating new client, %v", err))
	return fmt.Errorf("bigquery.NewClient: %v", err)
}
defer client.Close()

inserter := client.Dataset(datasetID).Table(tableID).Inserter()
if err := inserter.Put(ctx, items); err != nil {
	println(fmt.Sprintf("error inserting, %v", err))
	if multiError, ok := err.(bigquery.PutMultiError); ok {
		for _, err1 := range multiError {
			for _, err2 := range err1.Errors {
				fmt.Println(err2)
			}
		}
	} else {
		fmt.Println(err)
	}
	return err
} else {
	println("Inserted record")
}

When I run it, a record is inserted, but running it again results in the previously inserted row being updated instead of a new row being added. This is not the behaviour I was expecting.
I am relatively new to Go and GCP, so perhaps I have the wrong expectations?

The table in BigQuery is not partitioned.
items is an array of structs.

Answer 1

Score: 1

The Inserter can be used to achieve at-least-once data insertion semantics. It only appends rows; the insert mechanism is not capable of upsert behavior, which is what you appear to be describing.

It's unclear to me how you're validating this behavior, so I'd start by taking another look at that.

More information about the tabledata.insertAll streaming API, which underlies the Go Inserter, can be found here: https://cloud.google.com/bigquery/streaming-data-into-bigquery
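
To make the difference between appending and upserting concrete, here is a minimal sketch of a toy in-memory model. It is not BigQuery's implementation and does not call the cloud.google.com/go/bigquery package. The row type, its fields, and the table, streamInsert and upsert names are all hypothetical. The model assumes the simplified behaviour described in the streaming documentation: a streaming insert appends a row, and a row may be dropped as a duplicate only when it reuses an insert ID that was seen recently (here the ID never expires, which is a simplification).

```go
package main

import "fmt"

// row is a hypothetical record; the field names are made up for illustration.
type row struct {
	ID    string
	Value int
}

// table is a toy stand-in for a table, used only to contrast append
// semantics with upsert semantics. It is not how BigQuery works internally.
type table struct {
	rows    []row
	seenIDs map[string]bool // insert IDs already accepted (simplified: never expire)
}

func newTable() *table {
	return &table{seenIDs: make(map[string]bool)}
}

// streamInsert models a streaming insert: the row is appended unless the
// caller supplied a non-empty insertID that was already seen, in which case
// the new row is dropped. An existing row is never modified.
func (tb *table) streamInsert(r row, insertID string) {
	if insertID != "" {
		if tb.seenIDs[insertID] {
			return // duplicate insert ID: discard, do not update
		}
		tb.seenIDs[insertID] = true
	}
	tb.rows = append(tb.rows, r)
}

// upsert models what a streaming insert does NOT do: replace the row that
// has the same key, or append if there is none.
func (tb *table) upsert(r row) {
	for i := range tb.rows {
		if tb.rows[i].ID == r.ID {
			tb.rows[i] = r
			return
		}
	}
	tb.rows = append(tb.rows, r)
}

func main() {
	appended := newTable()
	appended.streamInsert(row{ID: "a", Value: 1}, "")
	appended.streamInsert(row{ID: "a", Value: 2}, "")
	fmt.Println("no insert ID:", len(appended.rows), "rows")

	deduped := newTable()
	deduped.streamInsert(row{ID: "a", Value: 1}, "same-id")
	deduped.streamInsert(row{ID: "a", Value: 2}, "same-id")
	fmt.Println("same insert ID:", len(deduped.rows), "row, value", deduped.rows[0].Value)

	upserted := newTable()
	upserted.upsert(row{ID: "a", Value: 1})
	upserted.upsert(row{ID: "a", Value: 2})
	fmt.Println("upsert:", len(upserted.rows), "row, value", upserted.rows[0].Value)
}
```

In this model neither streaming path changes a row that is already stored: without an insert ID the second call adds a second row, and with a repeated insert ID the second row is discarded while the first keeps its original value. Only the separate upsert function overwrites, which is why comparing row counts and values before and after a run is a reasonable way to check what actually happened.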

Posted by huangapple on July 21, 2021 at 14:09:36
Source: https://go.coder-hub.com/68464728.html