英文:
Postgresql - Removing duplicate rows per each user and customer
问题
在Postgres中,我有两个表:blog和blog_history。blog_history用于跟踪用户阅读的博客。
blog表:
CREATE TABLE blog (
id serial PRIMARY KEY,
title text,
description text,
body text,
created_at timestamp without time zone,
);
blog_history表:
CREATE TABLE blog_history (
customer_username text NOT NULL,
created_by text NOT NULL,
created_at timestamp without time zone,
post_id integer,
CONSTRAINT fk_post_id
FOREIGN KEY(post_id)
REFERENCES blog(id)
);
在blog_history表中有一些重复的行,具有相同的post_id,这是不必要的。
我想通过保留每个created_by和每个customer_username的一个不同行来删除所有重复行。
示例:
SELECT customer_username, created_by, post_id from blog_history;
customer_username | created_by | post_id
-------------------+-------------------+---------
公司A | bob@example.com | 1
公司A | bob@example.com | 3
公司A | bob@example.com | 2
公司A | bob@example.com | 2
公司A | bob@example.com | 2
公司A | bob@example.com | 3
公司B | bob@example.com | 3
公司B | bob@example.com | 3
公司A | tam@example.com | 1
公司A | tam@example.com | 3
公司A | tam@example.com | 3
公司A | tam@example.com | 2
删除重复行后,结果应该如下:
customer_username | created_by | post_id
-------------------+-------------------+---------
公司A | bob@example.com | 1
公司A | bob@example.com | 3
公司A | bob@example.com | 2
公司B | bob@example.com | 3
公司A | tam@example.com | 1
公司A | tam@example.com | 3
公司A | tam@example.com | 2
所以,我想保留唯一的post_id,并删除所有具有相同post_id的重复行,以保持相同customer_username和相同created_by。
英文:
In postgres, I have two tables: blog and blog_history. blog_history is to keep track of users and which blogs they read.
Tables structures are as follows:
blog table:
CREATE TABLE blog (
id serial PRIMARY KEY,
title text,
description text,
body text,
created_at timestamp without time zone,
);
blog_history table:
CREATE TABLE blog_history (
customer_username text NOT NULL,
created_by text NOT NULL,
created_at timestamp without time zone,
post_id integer,
CONSTRAINT fk_post_id
FOREIGN KEY(post_id)
REFERENCES blog(id)
);
I have some duplicate rows in blog_history table that has the same post_id which is unnecessary.
I want to remove all duplicate rows by keeping one distinct for each created_by and each customer_username.
Example:
SELECT customer_username, created_by, post_id from blog_history;
customer_username | created_by | post_id
-------------------+-------------------+---------
companyA | bob@example.com | 1
companyA | bob@example.com | 3
companyA | bob@example.com | 2
companyA | bob@example.com | 2
companyA | bob@example.com | 2
companyA | bob@example.com | 3
companyB | bob@example.com | 3
companyB | bob@example.com | 3
companyA | tam@example.com | 1
companyA | tam@example.com | 3
companyA | tam@example.com | 3
companyA | tam@example.com | 2
After deleting duplicates, result should be like this:
customer_username | created_by | post_id
-------------------+-------------------+---------
companyA | bob@example.com | 1
companyA | bob@example.com | 3
companyA | bob@example.com | 2
companyB | bob@example.com | 3
companyA | tam@example.com | 1
companyA | tam@example.com | 3
companyA | tam@example.com | 2
So, I want to leave only one distinct post_id and delete all duplicate ones with the same post_id for same customer_username and same created_by.
答案1
得分: 1
你需要一个列来标识每一行。向表中添加主键:
alter table blog_history
add id int generated always as identity primary key;
现在您可以轻松地在重复行的组中识别具有最低 id
的行。
delete from blog_history
where id not in (
select distinct on (customer_username, created_by, post_id) id
from blog_history
order by customer_username, created_by, post_id, id
);
在 db<>fiddle 中进行测试。
英文:
You need a column that would identify each row. Add the primary key to the table:
alter table blog_history
add id int generated always as identity primary key;
Now you can easily identify rows with the lowest id
in groups of duplicated rows.
delete from blog_history
where id not in (
select distinct on (customer_username, created_by, post_id) id
from blog_history
order by customer_username, created_by, post_id, id
);
Test it in db<>fiddle.
答案2
得分: 1
以下是您提供的SQL查询的翻译部分:
删除
从 t
其中 ctid 在
(
选择 ctid
从
(选择 *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
从 t
) t2
其中 rn > 1
);
选择 *
从 t
表格数据和"Fiddle"链接保持不变。
英文:
delete
from t
where ctid in
(
select ctid
from
(select *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
from t
) t2
where rn > 1
);
select *
from t
customer_username | created_by | post_id |
---|---|---|
companyA | bob@example.com | 1 |
companyA | bob@example.com | 3 |
companyA | bob@example.com | 2 |
companyB | bob@example.com | 3 |
companyA | tam@example.com | 1 |
companyA | tam@example.com | 3 |
companyA | tam@example.com | 2 |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论