2023-05-22 02:07:34

English:

SQL: Could you help me write a SQL query that rebuilds a table from an existing table without duplicated IDs?

Question

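Here is a SQL query that creates a new table from the existing one without duplicated IDs, keeping one chosen row per Id. Note that DISTINCT ON is PostgreSQL-specific syntax and, as far as I know, Amazon Redshift does not accept it, so treat this as a PostgreSQL sketch: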

CREATE TABLE new_table AS
SELECT DISTINCT ON (Id)
    Id,
    name,
    visited,
    column1,
    column2
FROM your_existing_table
/* DISTINCT ON (Id) keeps the first row of each Id group in ORDER BY order. */
/* SomeColumn is a placeholder: replace it with a date or timestamp column that identifies the latest row. */
/* Change DESC to ASC to keep the earliest row instead of the latest. */
/* With no such column, use ORDER BY Id alone; the row kept for each Id is then arbitrary. */
ORDER BY Id, SomeColumn DESC;

-- Optional: Drop the old table
-- DROP TABLE your_existing_table;

-- Optional: Rename the new table to match the old one
-- ALTER TABLE new_table RENAME TO your_existing_table;

This query creates a new table called new_table with one row per Id, keeping the row that sorts first under the ORDER BY criteria. You can optionally drop the old table and rename the new table to match the old one, as in your original query.

English:

Could you help me write a SQL query that creates a new table from an existing table without duplicated IDs?

For each duplicated Id, I want to keep only the most recent row.

The table looks like this:

Id	name	visited	column1	column2
xd01s	sam	23	Null	string
sc01t	susan	12	string	string
t01sc	tom	22	Null	Null
xd01s	san	12	string	string

My table (actually several tables) is in an Amazon Redshift database. While storing my data there, I found that the same Id gets duplicated despite the primary key.

So I've decided to recreate the table without the duplicated data (deleting rows seems costly).

For the example table, the new table I want would look like this:

Id	name	visited	column1	column2
sc01t	susan	12	string	string
t01sc	tom	22	Null	Null
xd01s	san	12	string	string

That is, keep the latest 'xd01s' row and discard the older 'xd01s' rows.

None of the columns indicates which row is the most recent (there is no time, date, or incrementing value).

I think the highest row number in the original order is the only way to identify the most recent row, but the SQL queries I've tried keep failing (lack of experience).

So far I've run these statements with the psycopg2 Python package:

alter table my_table rename to my_table_old;

create table my_table
as
    select distinct *
    from my_table_old;

drop table if exists my_table_old cascade;

But this only removes rows in which all column values are the same.

Many thanks for reading my question and answering it ^^.

Answer 1

Score: 0


This query should do it:

SELECT
	Id
	,name
	,visited
	,column1
	,column2
FROM	(
		SELECT
			Id
			,name
			,visited
			,column1
			,column2
			,ROW_NUMBER() OVER (PARTITION BY Id)	AS rn
		FROM
			my_table_old
		)	AS my_table_aux
WHERE
	rn = 1
;

The subquery adds a column rn that numbers the rows sharing the same Id value sequentially, starting at 1. One possible numbering:

Id	name	visited	column1	column2	rn
xd01s	sam	23	Null	string	1
sc01t	susan	12	string	string	1
t01sc	tom	22	Null	Null	1
xd01s	san	12	string	string	2

The outer query filters the inner result to keep just the first row within each group of rows with the same Id (we can be sure that each Id has exactly one record with an rn value of 1).

Id	name	visited	column1	column2
xd01s	sam	23	Null	string
sc01t	susan	12	string	string
t01sc	tom	22	Null	Null
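
As a rough way to try the query above without touching Redshift, the sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in here (an assumption of this sketch, and it needs SQLite 3.25 or newer for window functions); the table and column names are the ones from the question. Because the OVER clause has no ORDER BY, the sketch only checks that one row per Id survives, not which one.

```python
import sqlite3

# Stand-in for Redshift: an in-memory SQLite database (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table_old "
    "(Id TEXT, name TEXT, visited INTEGER, column1 TEXT, column2 TEXT)"
)
# Sample rows copied from the question; None stands for Null.
conn.executemany(
    "INSERT INTO my_table_old VALUES (?, ?, ?, ?, ?)",
    [
        ("xd01s", "sam", 23, None, "string"),
        ("sc01t", "susan", 12, "string", "string"),
        ("t01sc", "tom", 22, None, None),
        ("xd01s", "san", 12, "string", "string"),
    ],
)
conn.commit()

# Number the rows inside each Id group, then keep only the row numbered 1.
DEDUP_QUERY = """
SELECT Id, name, visited, column1, column2
FROM (
    SELECT Id, name, visited, column1, column2,
           ROW_NUMBER() OVER (PARTITION BY Id) AS rn
    FROM my_table_old
) AS my_table_aux
WHERE rn = 1
"""

rows = conn.execute(DEDUP_QUERY).fetchall()
ids = sorted(row[0] for row in rows)
print(ids)  # one entry per distinct Id
```

Which of the two 'xd01s' rows is kept depends on the engine, so this form is only suitable when any one row per Id is acceptable.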

Note that nothing guarantees that the numbering follows the original order of the records, so the row kept for each Id is arbitrary. To control which row is kept, we need a field to use in an ORDER BY clause inside the OVER clause.

Let's suppose you have a datetime field named updated_on that can be used to sort the records from newest to oldest. In that case, the query should be as follows:

SELECT
	Id
	,name
	,visited
	,column1
	,column2
	,updated_on
FROM	(
		SELECT
			Id
			,name
			,visited
			,column1
			,column2
			,updated_on
			,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY updated_on DESC)	AS rn
		FROM
			my_table_old
		)	AS my_table_aux
WHERE
	rn = 1
;
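
To connect this with the rename, recreate, and drop flow from the question, here is a minimal sketch of the whole rebuild. It again uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for Redshift (needs SQLite 3.25 or newer), and the updated_on values are invented purely for illustration. On Redshift the same statements would presumably be sent through psycopg2, with the CASCADE option on the DROP as in the question.

```python
import sqlite3

# Stand-in for Redshift: an in-memory SQLite database (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(Id TEXT, name TEXT, visited INTEGER, column1 TEXT, column2 TEXT, updated_on TEXT)"
)
# The updated_on timestamps are hypothetical; the question's table has no such column.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("xd01s", "sam", 23, None, "string", "2023-05-01 10:00:00"),
        ("sc01t", "susan", 12, "string", "string", "2023-05-02 09:30:00"),
        ("t01sc", "tom", 22, None, None, "2023-05-03 08:15:00"),
        ("xd01s", "san", 12, "string", "string", "2023-05-04 12:00:00"),
    ],
)
conn.commit()

REBUILD_STATEMENTS = [
    # 1. Move the table with duplicates out of the way.
    "ALTER TABLE my_table RENAME TO my_table_old",
    # 2. Recreate it, keeping only the newest row of each Id.
    """
    CREATE TABLE my_table AS
    SELECT Id, name, visited, column1, column2, updated_on
    FROM (
        SELECT Id, name, visited, column1, column2, updated_on,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY updated_on DESC) AS rn
        FROM my_table_old
    ) AS my_table_aux
    WHERE rn = 1
    """,
    # 3. Drop the old copy (SQLite has no CASCADE option).
    "DROP TABLE IF EXISTS my_table_old",
]
for statement in REBUILD_STATEMENTS:
    conn.execute(statement)
conn.commit()

result = conn.execute("SELECT Id, name FROM my_table ORDER BY Id").fetchall()
old_table = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'my_table_old'"
).fetchone()
print(result)
```

With the invented timestamps, the 'san' row is newer than the 'sam' row, so it is the one kept for 'xd01s'.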

For details, see the documentation of the ROW_NUMBER window function.
