PostgreSQL – Removing duplicate rows per user and customer


Question

In Postgres, I have two tables: blog and blog_history. blog_history keeps track of users and which blogs they read.
The table structures are as follows:

blog table:

    CREATE TABLE blog (
        id serial PRIMARY KEY,
        title text,
        description text,
        body text,
        created_at timestamp without time zone
    );

blog_history table:

    CREATE TABLE blog_history (
        customer_username text NOT NULL,
        created_by text NOT NULL,
        created_at timestamp without time zone,
        post_id integer,
        CONSTRAINT fk_post_id
            FOREIGN KEY (post_id)
            REFERENCES blog (id)
    );

There are some duplicate rows in the blog_history table that share the same post_id, which is unnecessary.
I want to remove all duplicate rows, keeping one distinct row per post_id for each created_by and each customer_username.
Example:

    SELECT customer_username, created_by, post_id from blog_history;

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       2
     companyA          | bob@example.com |       3
     companyB          | bob@example.com |       3
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

After deleting the duplicates, the result should look like this:

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

So, I want to keep only one row per distinct post_id and delete all duplicates that have the same post_id for the same customer_username and the same created_by.
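
For reference, one quick way to list the groups that actually contain duplicates is a GROUP BY with a HAVING filter; this is a small sketch against the schema above, not part of the original question:

    -- Groups with copies > 1 are the ones holding redundant rows.
    SELECT customer_username, created_by, post_id, count(*) AS copies
    FROM blog_history
    GROUP BY customer_username, created_by, post_id
    HAVING count(*) > 1;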

Answer 1

Score: 1

You need a column that uniquely identifies each row. Add a primary key to the table:

    alter table blog_history
        add id int generated always as identity primary key;

Now you can easily identify the row with the lowest id in each group of duplicated rows.

    delete from blog_history
    where id not in (
        select distinct on (customer_username, created_by, post_id) id
        from blog_history
        order by customer_username, created_by, post_id, id
    );
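
To see what the inner query does, assume the sample rows were inserted in the order shown in the question, so the new identity column numbers them 1 through 12 (these ids are hypothetical). DISTINCT ON then returns the lowest id in every (customer_username, created_by, post_id) group:

    -- Hypothetical ids, assuming insertion order matches the sample data:
    -- the subquery returns ids 1, 2, 3, 7, 9, 10, 12,
    -- so the DELETE removes the redundant copies with ids 4, 5, 6, 8, 11.
    select distinct on (customer_username, created_by, post_id) id
    from blog_history
    order by customer_username, created_by, post_id, id;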

Test it in db<>fiddle.
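
Not part of the original answer, but once the duplicates are gone you may also want to keep them from coming back; a unique constraint over the three columns would do that (a sketch, assuming the table has already been deduplicated):

    -- Optional follow-up: reject future duplicate reads at the database level.
    alter table blog_history
        add constraint blog_history_customer_author_post_key
        unique (customer_username, created_by, post_id);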

Answer 2

Score: 1

    -- t is the fiddle's stand-in for the blog_history table from the question.
    -- ctid (the physical row identifier) distinguishes otherwise identical rows;
    -- row_number() marks every copy after the first one in each group, and
    -- those extra copies are the rows that get deleted.
    delete
    from t
    where ctid in
    (
        select ctid
        from
        (select *, ctid, row_number() over(partition by customer_username, created_by, post_id order by post_id) as rn
         from t
        ) t2
        where rn > 1
    );

    -- Remaining rows after the delete:
    select *
    from t

     customer_username |   created_by    | post_id
    -------------------+-----------------+---------
     companyA          | bob@example.com |       1
     companyA          | bob@example.com |       3
     companyA          | bob@example.com |       2
     companyB          | bob@example.com |       3
     companyA          | tam@example.com |       1
     companyA          | tam@example.com |       3
     companyA          | tam@example.com |       2

Fiddle
