在一行上的唯一约束违规导致整个 pq.CopyIn 导入到 PostgreSQL 失败。

huangapple go评论103阅读模式
英文:

unique constraint violation on a row causes entire pq.CopyIn postgresql import to fail

问题

我正在尝试使用pq.CopyIn来进行批量导入,具体描述可以参考这里:

https://godoc.org/github.com/lib/pq

与我尝试过的其他方法相比,这种导入速度更快,但我发现如果只有一个记录存在唯一约束冲突,整个导入过程就会失败。

有没有办法在使用pq.CopyIn时设置ON CONFLICT DO NOTHING。

这是我的表结构副本:

CREATE TABLE test (
	id serial PRIMARY KEY,
	unique_token VARCHAR ( 10 ) UNIQUE NOT NULL,
	frequency INT DEFAULT 0
);

我尝试使用下面的@mkopriva答案,但是我得到了错误:pq: null value in column "id" violates not-null constraint

以下是代码示例:

tx, _ := db.Begin()

_, err = tx.Exec(`CREATE TEMP TABLE token_temp ON COMMIT DROP AS 
   SELECT id, unique_token FROM test WITH NO DATA`)
if err != nil {
	return err
}

stmt, err := tx.Prepare(pq.CopyIn("token_temp", "unique_token"))
if err != nil {
	fmt.Println("error here")
	return err
}

for _, token := range tokenList {
	_, err = stmt.Exec(token)

	if err != nil {
		return err
	}

}

_, err = stmt.Exec()
if err != nil {
	log.Fatal(err)
}

err = stmt.Close()
if err != nil {
	log.Fatal(err)
}

_, err = tx.Exec(`INSERT INTO test SELECT id, unique_token FROM 
  token_temp   ON CONFLICT(unique_token) DO UPDATE SET frequency= 
   test.frequency + 1 `)
if err != nil {
	fmt.Println("Error")
	return err
}

err = tx.Commit()
if err != nil {
	log.Fatal(err)
}
英文:

I'm trying to use pq.CopyIn to do bulk imports as described here:

https://godoc.org/github.com/lib/pq

The import is much faster than other methods I've tried but I am finding that a unique constraint violation in just one record will cause the entire import to fail.

Is there any way to set ON CONFLICT DO NOTHING using pq.CopyIn.

Here is a copy of my table structure

CREATE TABLE test (
	id serial PRIMARY KEY,
	unique_token VARCHAR ( 10 ) UNIQUE NOT NULL,
	frequency INT DEFAULT 0
);

I tried using @mkopriva answer below but I'm getting Error: pq: null
value in column "id" violates not-null constraint

Code sample below

tx, _ := db.Begin()

_, err = tx.Exec(`CREATE TEMP TABLE token_temp ON COMMIT DROP AS 
   SELECT id, unique_token FROM test WITH NO DATA`)
if err != nil {
	return err
}

stmt, err := tx.Prepare(pq.CopyIn("token_temp", "unique_token"))
if err != nil {
	fmt.Println("error here")
	return err
}

for _, token := range tokenList {
	_, err = stmt.Exec(token)

	if err != nil {
		return err
	}

}

_, err = stmt.Exec()
if err != nil {
	log.Fatal(err)
}

err = stmt.Close()
if err != nil {
	log.Fatal(err)
}

_, err = tx.Exec(`INSERT INTO test SELECT id, unique_token FROM 
  token_temp   ON CONFLICT(unique_token) DO UPDATE SET frequency= 
   test.frequency + 1 `)
if err != nil {
	fmt.Println("Error")
	return err
}

err = tx.Commit()
if err != nil {
	log.Fatal(err)
}

答案1

得分: 4

pq.CopyIn内部使用的是COPY FROM,它不支持ON CONFLICT子句。

但是,你可以创建一个没有约束的临时表,将数据复制到该临时表中,然后使用临时表作为要插入数据的源,对目标表进行INSERT操作,并使用你的ON CONFLICT子句。

举个例子,假设你有一个名为users的表,结构如下:

CREATE TABLE users (
    id serial PRIMARY KEY
    , name text
    , email text UNIQUE
);

假设你有一个用户切片,如下所示:

var users = []User{
	{Name: "John Doe", Email: "jdoe@example.com"},
	{Name: "Joe Blow", Email: "jblow@example.com"},
	{Name: "Jane Doe", Email: "jdoe@example.com"}, // 重复的电子邮件!
	{Name: "Foo Bar", Email: "fbar@example.com"},
}

你可以执行以下操作:

_, err = txn.Exec(`
CREATE TEMP TABLE users_temp
ON COMMIT DROP
AS SELECT * FROM users
WITH NO DATA`)
if err != nil {
	panic(err)
}

stmt, err := txn.Prepare(pq.CopyIn("users_temp", "name", "email"))
if err != nil {
	panic(err)
}

for _, u := range users {
	if _, err := stmt.Exec(u.Name, u.Email); err != nil {
		panic(err)
	}
}
if _, err := stmt.Exec(); err != nil {
	panic(err)
}
if err := stmt.Close(); err != nil {
	panic(err)
}

_, err = txn.Exec(`
INSERT INTO users (name, email)
SELECT name, email FROM users_temp
ON CONFLICT DO NOTHING`)
if err != nil {
	panic(err)
}

if err := txn.Commit(); err != nil {
	panic(err)
}

运行上述代码后,你可以执行SELECT * FROM users;,将得到以下结果:

 id |   name   |       email
----+----------+-------------------
  1 | John Doe | jdoe@example.com
  2 | Joe Blow | jblow@example.com
  4 | Foo Bar  | fbar@example.com
(3 rows)

对于你的具体示例和要求,你可以在INSERT ... SELECT ...查询中执行以下操作:

_, err = txn.Exec(`
INSERT INTO test (unique_token, frequency)
SELECT unique_token, COUNT(*) FROM token_temp
GROUP BY unique_token`)
if err != nil {
	panic(err)
}
英文:

pq.CopyIn internally uses COPY FROM which has no support for the ON CONFLICT clause.

What you can do, however, is to create a temporary table that has no constraints, copy the data into that temporary table, and then do an INSERT into the target table, with your ON CONFLICT clause, using the temporary table as the source of the data to be inserted.

An example should make this more clear, say you have a users table that looks like this:

CREATE TABLE users (
    id serial PRIMARY KEY
    , name text
    , email text UNIQUE
);

And say you have a slice of users like this:

var users = []User{
	{Name: "John Doe", Email: "jdoe@example.com"},
	{Name: "Joe Blow", Email: "jblow@example.com"},
	{Name: "Jane Doe", Email: "jdoe@example.com"}, // duplicate email!
	{Name: "Foo Bar", Email: "fbar@example.com"},
}

With that you can do the following:

_, err = txn.Exec(`
CREATE TEMP TABLE users_temp
ON COMMIT DROP
AS SELECT * FROM users
WITH NO DATA`)
if err != nil {
	panic(err)
}

stmt, err := txn.Prepare(pq.CopyIn("users_temp", "name", "email"))
if err != nil {
	panic(err)
}

for _, u := range users {
	if _, err := stmt.Exec(u.Name, u.Email); err != nil {
		panic(err)
	}
}
if _, err := stmt.Exec(); err != nil {
	panic(err)
}
if err := stmt.Close(); err != nil {
	panic(err)
}

_, err = txn.Exec(`
INSERT INTO users (name, email)
SELECT name, email FROM users_temp
ON CONFLICT DO NOTHING`)
if err != nil {
	panic(err)
}

if err := txn.Commit(); err != nil {
	panic(err)
}

After you run the above you can do SELECT * FROM users; and you'll get this:

 id |   name   |       email
----+----------+-------------------
  1 | John Doe | jdoe@example.com
  2 | Joe Blow | jblow@example.com
  4 | Foo Bar  | fbar@example.com
(3 rows)

For you specific example and requirement you can do something like this in the INSERT ... SELECT ... query:

_, err = txn.Exec(`
INSERT INTO test (unique_token, frequency)
SELECT unique_token, COUNT(*) FROM token_temp
GROUP BY unique_token`)
if err != nil {
	panic(err)
}

huangapple
  • 本文由 发表于 2021年8月5日 17:03:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/68663587.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定