英文:
unique constraint violation on a row causes entire pq.CopyIn postgresql import to fail
问题
我正在尝试使用pq.CopyIn来进行批量导入,具体描述可以参考这里:
https://godoc.org/github.com/lib/pq
与我尝试过的其他方法相比,这种导入速度更快,但我发现如果只有一个记录存在唯一约束冲突,整个导入过程就会失败。
有没有办法在使用pq.CopyIn时设置ON CONFLICT DO NOTHING。
这是我的表结构副本:
CREATE TABLE test (
id serial PRIMARY KEY,
unique_token VARCHAR ( 10 ) UNIQUE NOT NULL,
frequency INT DEFAULT 0
);
我尝试使用下面的@mkopriva答案,但是我得到了错误:pq: null value in column "id" violates not-null constraint
以下是代码示例:
tx, _ := db.Begin()
_, err = tx.Exec(`CREATE TEMP TABLE token_temp ON COMMIT DROP AS
SELECT id, unique_token FROM test WITH NO DATA`)
if err != nil {
return err
}
stmt, err := tx.Prepare(pq.CopyIn("token_temp", "unique_token"))
if err != nil {
fmt.Println("error here")
return err
}
for _, token := range tokenList {
_, err = stmt.Exec(token)
if err != nil {
return err
}
}
_, err = stmt.Exec()
if err != nil {
log.Fatal(err)
}
err = stmt.Close()
if err != nil {
log.Fatal(err)
}
_, err = tx.Exec(`INSERT INTO test SELECT id, unique_token FROM
token_temp ON CONFLICT(unique_token) DO UPDATE SET frequency=
test.frequency + 1 `)
if err != nil {
fmt.Println("Error")
return err
}
err = tx.Commit()
if err != nil {
log.Fatal(err)
}
英文:
I'm trying to use pq.CopyIn to do bulk imports as described here:
https://godoc.org/github.com/lib/pq
The import is much faster than other methods I've tried but I am finding that a unique constraint violation in just one record will cause the entire import to fail.
Is there any way to set ON CONFLICT DO NOTHING using pq.CopyIn.
Here is a copy of my table structure
CREATE TABLE test (
id serial PRIMARY KEY,
unique_token VARCHAR ( 10 ) UNIQUE NOT NULL,
frequency INT DEFAULT 0
);
I tried using @mkopriva answer below but I'm getting Error: pq: null
value in column "id" violates not-null constraint
Code sample below
tx, _ := db.Begin()
_, err = tx.Exec(`CREATE TEMP TABLE token_temp ON COMMIT DROP AS
SELECT id, unique_token FROM test WITH NO DATA`)
if err != nil {
return err
}
stmt, err := tx.Prepare(pq.CopyIn("token_temp", "unique_token"))
if err != nil {
fmt.Println("error here")
return err
}
for _, token := range tokenList {
_, err = stmt.Exec(token)
if err != nil {
return err
}
}
_, err = stmt.Exec()
if err != nil {
log.Fatal(err)
}
err = stmt.Close()
if err != nil {
log.Fatal(err)
}
_, err = tx.Exec(`INSERT INTO test SELECT id, unique_token FROM
token_temp ON CONFLICT(unique_token) DO UPDATE SET frequency=
test.frequency + 1 `)
if err != nil {
fmt.Println("Error")
return err
}
err = tx.Commit()
if err != nil {
log.Fatal(err)
}
答案1
得分: 4
pq.CopyIn
内部使用的是COPY FROM
,它不支持ON CONFLICT
子句。
但是,你可以创建一个没有约束的临时表,将数据复制到该临时表中,然后使用临时表作为要插入数据的源,对目标表进行INSERT
操作,并使用你的ON CONFLICT
子句。
举个例子,假设你有一个名为users
的表,结构如下:
CREATE TABLE users (
id serial PRIMARY KEY
, name text
, email text UNIQUE
);
假设你有一个用户切片,如下所示:
var users = []User{
{Name: "John Doe", Email: "jdoe@example.com"},
{Name: "Joe Blow", Email: "jblow@example.com"},
{Name: "Jane Doe", Email: "jdoe@example.com"}, // 重复的电子邮件!
{Name: "Foo Bar", Email: "fbar@example.com"},
}
你可以执行以下操作:
_, err = txn.Exec(`
CREATE TEMP TABLE users_temp
ON COMMIT DROP
AS SELECT * FROM users
WITH NO DATA`)
if err != nil {
panic(err)
}
stmt, err := txn.Prepare(pq.CopyIn("users_temp", "name", "email"))
if err != nil {
panic(err)
}
for _, u := range users {
if _, err := stmt.Exec(u.Name, u.Email); err != nil {
panic(err)
}
}
if _, err := stmt.Exec(); err != nil {
panic(err)
}
if err := stmt.Close(); err != nil {
panic(err)
}
_, err = txn.Exec(`
INSERT INTO users (name, email)
SELECT name, email FROM users_temp
ON CONFLICT DO NOTHING`)
if err != nil {
panic(err)
}
if err := txn.Commit(); err != nil {
panic(err)
}
运行上述代码后,你可以执行SELECT * FROM users;
,将得到以下结果:
id | name | email
----+----------+-------------------
1 | John Doe | jdoe@example.com
2 | Joe Blow | jblow@example.com
4 | Foo Bar | fbar@example.com
(3 rows)
对于你的具体示例和要求,你可以在INSERT ... SELECT ...
查询中执行以下操作:
_, err = txn.Exec(`
INSERT INTO test (unique_token, frequency)
SELECT unique_token, COUNT(*) FROM token_temp
GROUP BY unique_token`)
if err != nil {
panic(err)
}
英文:
pq.CopyIn
internally uses COPY FROM
which has no support for the ON CONFLICT
clause.
What you can do, however, is to create a temporary table that has no constraints, copy the data into that temporary table, and then do an INSERT
into the target table, with your ON CONFLICT
clause, using the temporary table as the source of the data to be inserted.
An example should make this more clear, say you have a users
table that looks like this:
CREATE TABLE users (
id serial PRIMARY KEY
, name text
, email text UNIQUE
);
And say you have a slice of users like this:
var users = []User{
{Name: "John Doe", Email: "jdoe@example.com"},
{Name: "Joe Blow", Email: "jblow@example.com"},
{Name: "Jane Doe", Email: "jdoe@example.com"}, // duplicate email!
{Name: "Foo Bar", Email: "fbar@example.com"},
}
With that you can do the following:
_, err = txn.Exec(`
CREATE TEMP TABLE users_temp
ON COMMIT DROP
AS SELECT * FROM users
WITH NO DATA`)
if err != nil {
panic(err)
}
stmt, err := txn.Prepare(pq.CopyIn("users_temp", "name", "email"))
if err != nil {
panic(err)
}
for _, u := range users {
if _, err := stmt.Exec(u.Name, u.Email); err != nil {
panic(err)
}
}
if _, err := stmt.Exec(); err != nil {
panic(err)
}
if err := stmt.Close(); err != nil {
panic(err)
}
_, err = txn.Exec(`
INSERT INTO users (name, email)
SELECT name, email FROM users_temp
ON CONFLICT DO NOTHING`)
if err != nil {
panic(err)
}
if err := txn.Commit(); err != nil {
panic(err)
}
After you run the above you can do SELECT * FROM users;
and you'll get this:
id | name | email
----+----------+-------------------
1 | John Doe | jdoe@example.com
2 | Joe Blow | jblow@example.com
4 | Foo Bar | fbar@example.com
(3 rows)
For you specific example and requirement you can do something like this in the INSERT ... SELECT ...
query:
_, err = txn.Exec(`
INSERT INTO test (unique_token, frequency)
SELECT unique_token, COUNT(*) FROM token_temp
GROUP BY unique_token`)
if err != nil {
panic(err)
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论