PostgreSQL – Removing duplicate rows per user and customer


Question

In Postgres, I have two tables: blog and blog_history. blog_history keeps track of users and which blogs they read.
The table structures are as follows:

blog table:

    CREATE TABLE blog (
        id serial PRIMARY KEY,
        title text,
        description text,
        body text,
        created_at timestamp without time zone
    );

blog_history table:

    CREATE TABLE blog_history (
        customer_username text NOT NULL,
        created_by text NOT NULL,
        created_at timestamp without time zone,
        post_id integer,
        CONSTRAINT fk_post_id
            FOREIGN KEY (post_id)
            REFERENCES blog (id)
    );

There are some duplicate rows in the blog_history table that share the same post_id, which is unnecessary.
I want to remove all duplicate rows, keeping one distinct row per post_id for each created_by and each customer_username.
Example:

    SELECT customer_username, created_by, post_id from blog_history;

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       3
     companyB          | bob@example.com |       3
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

After deleting the duplicates, the result should look like this:

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

So, I want to keep only one row per distinct post_id and delete all duplicates that have the same post_id for the same customer_username and the same created_by.
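
For reference, one quick way to list the groups that actually contain duplicates is a GROUP BY with a HAVING filter; this is a small sketch against the schema above, not part of the original question:

    -- Groups with copies > 1 are the ones holding redundant rows.
    SELECT customer_username, created_by, post_id, count(*) AS copies
    FROM blog_history
    GROUP BY customer_username, created_by, post_id
    HAVING count(*) > 1;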

Answer 1

Score: 1

You need a column that uniquely identifies each row. Add a primary key to the table:

    alter table blog_history
        add id int generated always as identity primary key;

Now you can easily identify the row with the lowest id in each group of duplicated rows.

    delete from blog_history
    where id not in (
        select distinct on (customer_username, created_by, post_id) id
        from blog_history
        order by customer_username, created_by, post_id, id
    );
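
To see what the inner query does, assume the sample rows were inserted in the order shown in the question, so the new identity column numbers them 1 through 12 (these ids are hypothetical). DISTINCT ON then returns the lowest id in every (customer_username, created_by, post_id) group:

    -- Hypothetical ids, assuming insertion order matches the sample data:
    -- the subquery returns ids 1, 2, 3, 7, 9, 10, 12,
    -- so the DELETE removes the redundant copies with ids 4, 5, 6, 8, 11.
    select distinct on (customer_username, created_by, post_id) id
    from blog_history
    order by customer_username, created_by, post_id, id;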

Test it in db<>fiddle.
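
Not part of the original answer, but once the duplicates are gone you may also want to keep them from coming back; a unique constraint over the three columns would do that (a sketch, assuming the table has already been deduplicated):

    -- Optional follow-up: reject future duplicate reads at the database level.
    alter table blog_history
        add constraint blog_history_customer_author_post_key
        unique (customer_username, created_by, post_id);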

Answer 2

Score: 1

    -- t is the fiddle's stand-in for the blog_history table from the question.
    -- ctid (the physical row identifier) distinguishes otherwise identical rows;
    -- row_number() marks every copy after the first one in each group, and
    -- those extra copies are the rows that get deleted.
    delete
    from t
    where ctid in
    (
        select ctid
        from
        (select *, ctid, row_number() over(partition by customer_username, created_by, post_id order by post_id) as rn
         from t
        ) t2
        where rn > 1
    );

    -- Remaining rows after the delete:
    select *
    from t

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

Fiddle
