PostgreSQL – 每个用户和客户去除重复行

huangapple go评论62阅读模式
英文:

Postgresql - Removing duplicate rows per each user and customer

问题

在Postgres中,我有两个表:blog和blog_history。blog_history用于跟踪用户阅读的博客。

blog表:

CREATE TABLE blog (
    id serial PRIMARY KEY,
    title text,
    description text,
    body text,
    created_at timestamp without time zone,
);

blog_history表:

CREATE TABLE blog_history (
    customer_username text NOT NULL,
    created_by text NOT NULL,
    created_at timestamp without time zone,
    post_id integer,
    CONSTRAINT fk_post_id
        FOREIGN KEY(post_id) 
        REFERENCES blog(id)
);

在blog_history表中有一些重复的行,具有相同的post_id,这是不必要的。
我想通过保留每个created_by和每个customer_username的一个不同行来删除所有重复行。
示例:

SELECT customer_username, created_by, post_id from blog_history;

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 公司A                | bob@example.com |       1
 公司A                | bob@example.com |       3
 公司A                | bob@example.com |       2
 公司A                | bob@example.com |       2
 公司A                | bob@example.com |       2
 公司A                | bob@example.com |       3
 公司B                | bob@example.com |       3
 公司B                | bob@example.com |       3
 公司A                | tam@example.com |       1
 公司A                | tam@example.com |       3
 公司A                | tam@example.com |       3
 公司A                | tam@example.com |       2

删除重复行后,结果应该如下:

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 公司A                | bob@example.com |       1
 公司A                | bob@example.com |       3
 公司A                | bob@example.com |       2
 公司B                | bob@example.com |       3
 公司A                | tam@example.com |       1
 公司A                | tam@example.com |       3
 公司A                | tam@example.com |       2

所以,我想保留唯一的post_id,并删除所有具有相同post_id的重复行,以保持相同customer_username和相同created_by。

英文:

In postgres, I have two tables: blog and blog_history. blog_history is to keep track of users and which blogs they read.
Tables structures are as follows:

blog table:

CREATE TABLE blog (
    id serial PRIMARY KEY,
    title text,
    description text,
    body text,
    created_at timestamp without time zone,
);

blog_history table:

CREATE TABLE blog_history (
    customer_username text NOT NULL,
    created_by text NOT NULL,
    created_at timestamp without time zone,
    post_id integer,
    CONSTRAINT fk_post_id
        FOREIGN KEY(post_id) 
        REFERENCES blog(id)
);

I have some duplicate rows in blog_history table that has the same post_id which is unnecessary.
I want to remove all duplicate rows by keeping one distinct for each created_by and each customer_username.
Example:

SELECT customer_username, created_by, post_id from blog_history;

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | bob@example.com |       1
 companyA             | bob@example.com |       3
 companyA             | bob@example.com |       2
 companyA             | bob@example.com |       2
 companyA             | bob@example.com |       2
 companyA             | bob@example.com |       3
 companyB             | bob@example.com |       3
 companyB             | bob@example.com |       3
 companyA             | tam@example.com |       1
 companyA             | tam@example.com |       3
 companyA             | tam@example.com |       3
 companyA             | tam@example.com |       2

After deleting duplicates, result should be like this:

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | bob@example.com |       1
 companyA             | bob@example.com |       3
 companyA             | bob@example.com |       2
 companyB             | bob@example.com |       3
 companyA             | tam@example.com |       1
 companyA             | tam@example.com |       3
 companyA             | tam@example.com |       2

So, I want to leave only one distinct post_id and delete all duplicate ones with the same post_id for same customer_username and same created_by.

答案1

得分: 1

你需要一个列来标识每一行。向表中添加主键:

alter table blog_history
	add id int generated always as identity primary key;

现在您可以轻松地在重复行的组中识别具有最低 id 的行。

delete from blog_history
where id not in (
	select distinct on (customer_username, created_by, post_id) id
	from blog_history
	order by customer_username, created_by, post_id, id
	);

db<>fiddle 中进行测试。

英文:

You need a column that would identify each row. Add the primary key to the table:

alter table blog_history
	add id int generated always as identity primary key;

Now you can easily identify rows with the lowest id in groups of duplicated rows.

delete from blog_history
where id not in (
	select distinct on (customer_username, created_by, post_id) id
	from blog_history
	order by customer_username, created_by, post_id, id
	);

Test it in db<>fiddle.

答案2

得分: 1

以下是您提供的SQL查询的翻译部分:

删除
 t
其中 ctid 
(
选择 ctid

(选择 *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
   t
) t2
其中 rn > 1
);

选择 *
   t

表格数据和"Fiddle"链接保持不变。

英文:
delete 
from t
where ctid in
(
select ctid
from
(select *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
from   t
) t2
where rn &gt; 1
);

select *
from   t
customer_username created_by post_id
companyA bob@example.com 1
companyA bob@example.com 3
companyA bob@example.com 2
companyB bob@example.com 3
companyA tam@example.com 1
companyA tam@example.com 3
companyA tam@example.com 2

Fiddle

huangapple
  • 本文由 发表于 2023年4月19日 23:46:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/76056482.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定