SQL: Could you help me write a SQL query that recreates a table from an existing table without duplicated IDs?

Here is a SQL query that creates a new table from the existing one with a single row per Id. DISTINCT ON is a PostgreSQL extension that Redshift does not support, so this version ranks the rows of each Id with ROW_NUMBER() and keeps only the first one:

CREATE TABLE new_table AS
SELECT
    Id,
    name,
    visited,
    column1,
    column2
FROM (
    SELECT
        Id,
        name,
        visited,
        column1,
        column2,
        /* SomeColumn is a placeholder: use a date or timestamp column to determine the latest row. */
        /* Change DESC to ASC if you want the earliest row instead of the latest. */
        /* Without such a column, omit the ORDER BY; the row kept for each Id is then arbitrary. */
        ROW_NUMBER() OVER (PARTITION BY Id ORDER BY SomeColumn DESC) AS rn
    FROM your_existing_table
) AS ranked
WHERE rn = 1;

-- Optional: Drop the old table
-- DROP TABLE your_existing_table;

-- Optional: Rename the new table to match the old one
-- ALTER TABLE new_table RENAME TO your_existing_table;

This query creates a new table called new_table that has no duplicated IDs and keeps the latest row according to the chosen order criterion. You can then optionally drop the old table and rename the new table to match the old one, as in your original query.

Question

Among the rows that share a duplicated Id, I want to keep only the most recent one.

The table looks like this:

Id     name   visited  column1  column2
xd01s  sam    23       Null     string
sc01t  susan  12       string   string
t01sc  tom    22       Null     Null
xd01s  san    12       string   string

My table (actually several tables) is in an Amazon Redshift database, where I am storing my data.

I found out that the same Id is getting duplicated despite the primary key.

So I've decided to recreate the table without the duplicated data (deleting rows seems costly).

From the example table, the new table I want would look like this:

Id     name   visited  column1  column2
sc01t  susan  12       string   string
t01sc  tom    22       Null     Null
xd01s  san    12       string   string

That is, keeping the latest 'xd01s' row and getting rid of the older 'xd01s' rows.

None of the columns tells which row is the most recent (there is no time or date column, nor any incrementing value).

I think the largest row number in the initial order is the only way to identify the most recent row. But the SQL queries I've tried keep failing (lack of experience..).

So far I've used this query with the psycopg2 Python package:

alter table my_table rename to my_table_old;

create table my_table
as
    select distinct *
    from my_table_old;

drop table if exists my_table_old cascade;

but this only gets rid of rows in which all column values are the same.
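
The behaviour described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Redshift (an assumption made only so the example runs anywhere), and it adds one fully identical row that is not in the original sample data. SELECT DISTINCT * collapses that identical row but leaves both 'xd01s' rows, because they differ in other columns.

```python
import sqlite3

# In-memory SQLite is only a stand-in for Redshift in this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table_old "
    "(Id TEXT, name TEXT, visited INTEGER, column1 TEXT, column2 TEXT)"
)
rows = [
    ("xd01s", "sam", 23, None, "string"),
    ("sc01t", "susan", 12, "string", "string"),
    ("t01sc", "tom", 22, None, None),
    ("xd01s", "san", 12, "string", "string"),
    ("t01sc", "tom", 22, None, None),  # exact duplicate, added for the demo
]
conn.executemany("INSERT INTO my_table_old VALUES (?, ?, ?, ?, ?)", rows)

# DISTINCT compares whole rows, so only the exact duplicate disappears.
distinct_rows = conn.execute("SELECT DISTINCT * FROM my_table_old").fetchall()
ids = sorted(row[0] for row in distinct_rows)
print(ids)
```

The Id 'xd01s' still appears twice in the output, which is why a per-Id ranking is needed rather than DISTINCT.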

Many thanks for reading my question and answering it ^^.

Answer 1

Score: 0

This query will do it:

SELECT
	Id
	,name
	,visited
	,column1
	,column2
FROM	(
		SELECT
			Id
			,name
			,visited
			,column1
			,column2
			,ROW_NUMBER() OVER (PARTITION BY Id)	AS rn
		FROM
			my_table_old
		)	AS my_table_aux
WHERE
	rn = 1
;

With the subquery we create a new column, rn, that assigns a sequential integer to each row sharing the same value in the field Id.

Id     name   visited  column1  column2  rn
xd01s  sam    23       Null     string   1
sc01t  susan  12       string   string   1
t01sc  tom    22       Null     Null     1
xd01s  san    12       string   string   2

The outer query filters the inner result to keep just the first row within each group of rows with the same Id (exactly one record per Id has an rn value of 1).

Id     name   visited  column1  column2
xd01s  sam    23       Null     string
sc01t  susan  12       string   string
t01sc  tom    22       Null     Null
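
As a rough check of the query above, here is a minimal sketch that runs it against the sample data. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or newer for window functions) as a stand-in for Redshift. Only the set of Ids is inspected, because without an ORDER BY in the OVER clause the surviving 'xd01s' row is not something to rely on.

```python
import sqlite3

# In-memory SQLite is only a stand-in for Redshift in this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table_old "
    "(Id TEXT, name TEXT, visited INTEGER, column1 TEXT, column2 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table_old VALUES (?, ?, ?, ?, ?)",
    [
        ("xd01s", "sam", 23, None, "string"),
        ("sc01t", "susan", 12, "string", "string"),
        ("t01sc", "tom", 22, None, None),
        ("xd01s", "san", 12, "string", "string"),
    ],
)

# Number the rows inside each Id group, then keep the row numbered 1.
query = """
SELECT Id, name, visited, column1, column2
FROM (
    SELECT Id, name, visited, column1, column2,
           ROW_NUMBER() OVER (PARTITION BY Id) AS rn
    FROM my_table_old
) AS my_table_aux
WHERE rn = 1
"""
deduped = conn.execute(query).fetchall()
deduped_ids = sorted(row[0] for row in deduped)
print(deduped_ids)
```

Each Id appears once in the result; which of the two 'xd01s' rows survives depends on the engine, not on the query.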

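Note that nothing guarantees that the numbering follows the original order of the records: without an ORDER BY inside the OVER clause, the row that receives rn = 1 within each Id is arbitrary. To control it, we need a field to use in an ORDER BY clause within the OVER clause.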

Let's suppose you have a datetime field named updated_on that can be used to sort the records from newest to oldest. In that case, the query should be as follows:

SELECT
	Id
	,name
	,visited
	,column1
	,column2
	,updated_on
FROM	(
		SELECT
			Id
			,name
			,visited
			,column1
			,column2
			,updated_on
			,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY updated_on DESC)	AS rn
		FROM
			my_table_old
		)	AS my_table_aux
WHERE
	rn = 1
;

The details are in the documentation of the ROW_NUMBER window function.
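
Putting the pieces together, the sketch below runs the rename, recreate, and drop steps from the question with the ordered ROW_NUMBER query. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Redshift, and the updated_on column and its timestamp values are hypothetical, since the asker's table has no such column.

```python
import sqlite3

# In-memory SQLite is only a stand-in for Redshift in this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(Id TEXT, name TEXT, visited INTEGER, column1 TEXT, column2 TEXT, updated_on TEXT)"
)
# The updated_on values are made up for the example.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("xd01s", "sam", 23, None, "string", "2023-05-01 10:00:00"),
        ("sc01t", "susan", 12, "string", "string", "2023-05-02 09:30:00"),
        ("t01sc", "tom", 22, None, None, "2023-05-03 08:15:00"),
        ("xd01s", "san", 12, "string", "string", "2023-05-20 17:45:00"),
    ],
)

# Step 1: move the original table out of the way.
conn.execute("ALTER TABLE my_table RENAME TO my_table_old")

# Step 2: rebuild it, keeping the newest row of each Id.
conn.execute(
    """
    CREATE TABLE my_table AS
    SELECT Id, name, visited, column1, column2, updated_on
    FROM (
        SELECT Id, name, visited, column1, column2, updated_on,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY updated_on DESC) AS rn
        FROM my_table_old
    ) AS my_table_aux
    WHERE rn = 1
    """
)

# Step 3: drop the old copy once the new table exists.
conn.execute("DROP TABLE IF EXISTS my_table_old")
conn.commit()

result = conn.execute("SELECT Id, name FROM my_table ORDER BY Id").fetchall()
print(result)
```

With psycopg2 the same three statements could be passed to cursor.execute one after another; checking the row count of the new table before dropping the old one is a cheap safeguard.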

huangapple
  • Posted on 2023-05-22 02:07:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76301270.html