Occasional PostgreSQL "Duplicate key value violates unique constraint" error from Go insert

Question
I have a table with the unique constraint
```sql
CREATE UNIQUE INDEX "bd_hash_index" ON "public"."bodies" USING btree ("hash");
```
I also have a Go program that takes "body" values on a channel, filters out the duplicates by hashing, and inserts only the non-duplicates into the database.
Like this:
```go
import (
	"crypto/md5"
	"database/sql"
	"encoding/hex"
	"log"
	"strings"
	"time"
)

type Process struct {
	DB         *sql.DB
	BodiesHash map[string]bool
	Channel    chan BodyIterface
	Logger     *log.Logger
}

func (pr *Process) Run() {
	bodyInsert, err := pr.DB.Prepare("INSERT INTO bodies (hash, type, source, body, created_timestamp) VALUES ($1, $2, $3, $4, $5)")
	if err != nil {
		pr.Logger.Println(err)
		return
	}
	defer bodyInsert.Close()

	hash := md5.New()
	for p := range pr.Channel {
		nowUnix := time.Now().Unix()
		bodyString := strings.Join([]string{
			p.GetType(),
			p.GetSource(),
			p.GetBodyString(),
		}, ":")
		hash.Write([]byte(bodyString))
		bodyHash := hex.EncodeToString(hash.Sum(nil))
		hash.Reset()
		if _, ok := pr.BodiesHash[bodyHash]; !ok {
			pr.BodiesHash[bodyHash] = true
			_, err = bodyInsert.Exec(
				bodyHash,
				p.GetType(),
				p.GetSource(),
				p.GetBodyString(),
				nowUnix,
			)
			if err != nil {
				pr.Logger.Println(err, bodyString, bodyHash)
			}
		}
	}
}
```
But periodically I get the error

> pq: duplicate key value violates unique constraint "bd_hash_index"

in my log file. I can't imagine how that can be, because I check the hash for uniqueness before I do an insert.

I am sure that the bodies table is empty when I call `go processDebugBody.Run()`.

The channel was created as a buffered channel with:

```go
processDebugBody.Channel = make(chan BodyIterface, 1000)
```
Answer 1

Score: 1
When you execute a query outside of a transaction with `sql.DB`, it automatically retries when there is a problem with the connection; in the current implementation, up to 10 times. For example, notice `maxBadConnRetries` in `sql.Exec`.

Now, the retry really happens only when the underlying driver returns `driver.ErrBadConn`, and the specification states the following:
> ErrBadConn should be returned by a driver to signal to the sql package that a driver.Conn is in a bad state (such as the server having earlier closed the connection) and the sql package should retry on a new connection.
>
> To prevent duplicate operations, ErrBadConn should NOT be returned if there's a possibility that the database server might have performed the operation.
I think driver implementations are a little careless in implementing this rule, but maybe there is some logic behind it. I was studying the implementation of `lib/pq` the other day and noticed this scenario would be possible.
As you pointed out in the comments, you had some SSL errors issued just before seeing the duplicates, so this seems like a reasonable guess.
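To see why a retried INSERT can trip the unique index even though the application-level dedup is correct, here is a toy simulation of the failure mode. Nothing here is real driver code: `server`, `insert`, and the retry loop are simplified stand-ins modeling a reply that is lost after the server has already committed the row.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadConn stands in for driver.ErrBadConn.
var errBadConn = errors.New("driver: bad connection")

// server is a toy model of Postgres with a unique index on hash.
type server struct{ hashes map[string]bool }

// insert applies the row server-side; with dropReply, the "connection"
// dies before the client learns the insert succeeded.
func (s *server) insert(hash string, dropReply bool) error {
	if s.hashes[hash] {
		return fmt.Errorf("pq: duplicate key value violates unique constraint %q", "bd_hash_index")
	}
	s.hashes[hash] = true // the row is committed server-side...
	if dropReply {
		return errBadConn // ...but the client only sees a bad connection
	}
	return nil
}

// exec mimics database/sql's retry-on-ErrBadConn loop (up to maxBadConnRetries).
func exec(s *server, hash string) error {
	const maxBadConnRetries = 10
	var err error
	for i := 0; i < maxBadConnRetries; i++ {
		// Only the first attempt loses its reply in this simulation.
		err = s.insert(hash, i == 0)
		if !errors.Is(err, errBadConn) {
			return err
		}
	}
	return err
}

func main() {
	s := &server{hashes: map[string]bool{}}
	// The automatic retry re-sends the INSERT and hits the unique index.
	fmt.Println(exec(s, "abc123"))
}
```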
One thing to consider is using transactions. If you lose the connection before committing the transaction, you can be sure it will be rolled back. Also, the statements of a transaction are not retransmitted automatically on bad connections, so this problem might be solved. You will most probably see the SSL errors propagated directly to your application though, so you'll need to retry on your own.
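A sketch of that transaction-based approach, assuming the table from the question; the `insertBody` wrapper, `bodyKey` helper, and their arguments are illustrative names, not part of the original program:

```go
package main

import (
	"crypto/md5"
	"database/sql"
	"encoding/hex"
	"fmt"
	"strings"
)

// bodyKey reproduces the question's dedup key: md5 over "type:source:body",
// hex-encoded, using a fresh one-shot digest instead of a shared hasher.
func bodyKey(typ, source, body string) string {
	sum := md5.Sum([]byte(strings.Join([]string{typ, source, body}, ":")))
	return hex.EncodeToString(sum[:])
}

// insertBody wraps the INSERT in an explicit transaction, so database/sql
// will not silently re-send it on a bad connection; a connection lost
// before Commit leaves the row rolled back. Retrying on error is left
// to the caller.
func insertBody(db *sql.DB, hash, typ, source, body string, ts int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	_, err = tx.Exec(
		"INSERT INTO bodies (hash, type, source, body, created_timestamp) VALUES ($1, $2, $3, $4, $5)",
		hash, typ, source, body, ts,
	)
	if err != nil {
		_ = tx.Rollback() // best effort; the original error matters more
		return err
	}
	return tx.Commit()
}

func main() {
	// Demonstrate only the key computation here (no live DB required).
	k := bodyKey("debug", "worker-1", "payload")
	fmt.Println(k, len(k)) // a hex md5 digest is always 32 characters
}
```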
I must tell you I have also been seeing SSL renegotiation errors on Postgres with Go 1.3, which is why I have disabled SSL for my internal DB for the time being (`sslmode=disable` in the connection string). I was wondering whether version 1.4 has solved the issue, as one item in its changelog was *The crypto/tls package now supports ALPN as defined in RFC 7301* (ALPN stands for the Application-Layer Protocol Negotiation Extension).