英文:
Suggestions for how to retrieve primary keys for all rows with matching data (many columns)
问题
以下是您提供的内容的中文翻译:
我有一个SQL Server流程,为了节省存储空间,我们需要为每个唯一记录存储一行,并使用行的主键引用唯一数据集。
示例情景:给定一个类似以下的SQL Server表:
DataTable
(
DataID bigint,
name nvarchar(100),
birthday date,
value1 nvarchar(100),
value2 nvarchar(100)
)
注意:实际上有大约30列。
当接收到新数据时,我们需要将新数据组合添加到DataTable中,并检索已接收到的所有行的DataID,包括新数据和现有数据。
一个想法是利用EXCEPT,然后也许是INTERSECT,但我无法想出一种从已识别的行中获取ID的方法。
这个需求的更大背景是,我们有另一个表,用于跟踪接收到的所有数据,并且我们希望每天跟踪接收到哪些唯一数据集,而不必存储多份数据的副本。
数据跟踪表看起来类似于这样:
ReceivedData (ID bigint, ReceivedDate datetime, DataID bigint)
我目前脑海中的这个需求的流程可能看起来像这样:
给定一个新的数据表和前面提到的现有DataTable:
NewDataTable
(
name nvarchar(100),
birthday date,
value1 nvarchar(100),
value2 nvarchar(100)
)
INSERT INTO DataTable (name, birthday, value1, value2)
SELECT name, birthday, value1, value2
FROM NewDataTable
EXCEPT
SELECT name, birthday, value1, value2
FROM DataTable
我们可能可以在这里使用OUTPUT子句来获取新记录创建的DataID,但我们还需要DataID,用于匹配DataTable中现有记录的NewDataTable中的记录。
我知道我可以使用NewDataTable和现有DataTable之间的联接,使用所有表列,但这将是一个重型联接,我希望有更好的方法。
感谢您的所有时间和想法。
示例数据演示
现有[DataTable]记录:
DataID | name | birthday | value1 | value2 |
---|---|---|---|---|
1 | John | 1/1/1990 | abc | 123 |
2 | Sally | 2/2/1990 | def | 321 |
3 | Steve | 3/3/1990 | ghi | 333 |
现有[ReceivedData]记录:
ID | RecievedDate | DataID |
---|---|---|
100 | 5/1/2023 03:01 | 1 |
101 | 5/1/2023 03:01 | 2 |
102 | 5/1/2023 03:01 | 3 |
然后,如果在“5/2/2023 03:00”接收到新数据,如下所示:
在[NewDataTable]中接收到的新数据
name | birthday | value1 | value2 |
---|---|---|---|
John | 1/1/1990 | abc | 123 |
Steve | 3/3/1990 | ghi | 333 |
Alice | 4/3/1990 | aaa | 222 |
经过流程运行后,所需的数据状态如下:
生成的[DataTable]记录:
DataID | name | birthday | value1 | value2 |
---|---|---|---|---|
1 | John | 1/1/1990 | abc | 123 |
2 | Sally | 2/2/1990 | def | 321 |
3 | Steve | 3/3/1990 | ghi | 333 |
4 | Alice | 4/3/1990 | aaa | 222 |
生成的[ReceivedData]记录:
ID | RecievedDate | DataID |
---|---|---|
100 | 5/1/2023 03:01 | 1 |
101 | 5/1/2023 03:01 | 2 |
102 | 5/1/2023 03:01 | 3 |
103 | 5/2/2023 03:00 | 1 |
104 | 5/2/2023 03:00 | 3 |
105 | 5/2/2023 03:00 | 4 |
英文:
I have a SQL Server process where to save storage we need to store one row per unique record and reference the unique data set using the row's primary key.
Example scenario: given a SQL Server table similar to the following:
DataTable
(
DataID bigint,
name nvarchar(100),
birthday date,
value1 nvarchar(100),
value2 nvarchar(100)
)
Note: in reality there are ~30 columns
When new data is received we need to add the new data combinations to the DataTable and retrieve the DataIDs for all rows that were received, both new and existing.
One idea was to leverage EXCEPT and then perhaps INTERSECT, but I am unable to think of a way to get the IDs back from the rows identified.
The bigger picture for this need is that we have another table which is used to track all data received and we want to keep track of which unique data sets were received each day without having to store multiple copies of the data.
The data tracking table looks something like this:
ReceivedData (ID bigint, ReceivedDate datetime, DataID bigint)
The process currently in my head for this need could look something like this:
Given a new data in a table and the previously mentioned existing DataTable:
NewDataTable
(
name nvarchar(100),
birthday date,
value1 nvarchar(100),
value2 nvarchar(100)
)
INSERT INTO DataTable (name, birthday, value1, value2)
SELECT name, birthday, value1, value2
FROM NewDataTable
EXCEPT
SELECT name, birthday, value1, value2
FROM DataTable
We could probably use an OUTPUT clause here to get the DataIDs for the new records created, but we also need the DataIDs for records in the NewDataTable which match existing records in DataTable.
I know I could use join between the NewDataTable and the existing DataTable using all table columns, but this would be a heavy join and I'm hoping for a better way.
Thank you for all your time and ideas.
Example Data Walkthrough
Existing [DataTable] records:
DataID | name | birthday | value1 | value2 |
---|---|---|---|---|
1 | John | 1/1/1990 | abc | 123 |
2 | Sally | 2/2/1990 | def | 321 |
3 | Steve | 3/3/1990 | ghi | 333 |
Existing [ReceivedData] records:
ID | RecievedDate | DataID |
---|---|---|
100 | 5/1/2023 03:01 | 1 |
101 | 5/1/2023 03:01 | 2 |
102 | 5/1/2023 03:01 | 3 |
Then, if new data is received on "5/2/2023 03:00" as follows:
New data received in [NewDataTable]
name | birthday | value1 | value2 |
---|---|---|---|
John | 1/1/1990 | abc | 123 |
Steve | 3/3/1990 | ghi | 333 |
Alice | 4/3/1990 | aaa | 222 |
The desired data state after the process runs would be:
Resulting [DataTable] records:
DataID | name | birthday | value1 | value2 |
---|---|---|---|---|
1 | John | 1/1/1990 | abc | 123 |
2 | Sally | 2/2/1990 | def | 321 |
3 | Steve | 3/3/1990 | ghi | 333 |
4 | Alice | 4/3/1990 | aaa | 222 |
Resulting [ReceivedData] records:
ID | RecievedDate | DataID |
---|---|---|
100 | 5/1/2023 03:01 | 1 |
101 | 5/1/2023 03:01 | 2 |
102 | 5/1/2023 03:01 | 3 |
103 | 5/2/2023 03:00 | 1 |
104 | 5/2/2023 03:00 | 3 |
105 | 5/2/2023 03:00 | 4 |
答案1
得分: 0
以下是您提供的内容的中文翻译:
-- 数据模型
创建表格 DataTable(DataId int primary key identity,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
启用标识插入 DataTable;
插入数据到 DataTable (DataId,name,birthday,value1,value2) 值
(1,'John','1990-01-01','abc','123')
,(2,'Sally','1990-03-03','def','321')
,(3,'Steve','1990-03-03','ghi','333')
;
关闭标识插入 DataTable;
创建表格 NewDataTable(name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
插入数据到 NewDataTable 值
('John','1990-01-01','abc','123')
,('Steve','1990-03-03','ghi','333')
,('Alice','1990-04-03','aaa','222')
;
创建表格 LogTable(id int identity,actiondate datetime,action varchar(10),DataId int ,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
-- 查询
合并 DataTable
使用 NewDataTable n on n.name=DataTable.name -- ...等等
当不匹配时
插入(name,birthday,value1,value2) 值(n.name,n.birthday,n.value1,n.value2)
当匹配时
更新设置 DataTable.name=n.name
输出
getdate(),$action,插入的.*
到 LogTable;
-- 结果
选择 * from DataTable;
选择 * From LogTable;
查询后的 DataTable 表格
DataId | name | birthday | value1 | value2 |
---|---|---|---|---|
1 | John | 1990-01-01 | abc | 123 |
2 | Sally | 1990-03-03 | def | 321 |
3 | Steve | 1990-03-03 | ghi | 333 |
4 | Alice | 1990-04-03 | aaa | 222 |
查询后的 LogTable 表格
id | actiondate | action | DataId | name | birthday | value1 | value2 |
---|---|---|---|---|---|---|---|
1 | 2023-05-17 20:17:13.520 | INSERT | 4 | Alice | 1990-04-03 | aaa | 222 |
2 | 2023-05-17 20:17:13.520 | UPDATE | 1 | John | 1990-01-01 | abc | 123 |
3 | 2023-05-17 20:17:13.520 | UPDATE | 3 | Steve | 1990-03-03 | ghi | 333 |
还有执行计划 (https://dbfiddle.uk/yXVgKV92)
<details>
<summary>英文:</summary>
Try this
-- data model
create table DataTable(DataId int primary key identity,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
set identity_insert DataTable ON;
insert into DataTable (DataId,name,birthday,value1,value2) values
(1,'John','1990-01-01','abc','123')
,(2,'Sally','1990-03-03','def','321')
,(3,'Steve','1990-03-03','ghi','333')
;
set identity_insert DataTable OFF;
create table NewDataTable(name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
insert into NewDataTable values
('John','1990-01-01','abc','123')
,('Steve','1990-03-03','ghi','333')
,('Alice','1990-04-03','aaa','222')
;
create table LogTable(id int identity,actiondate datetime,action varchar(10),DataId int ,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
-- query
merge DataTable
using NewDataTable n on n.name=DataTable.name -- ... etc
when not matched then
insert(name,birthday,value1,value2) values(n.name,n.birthday,n.value1,n.value2)
when matched then
update set DataTable.name=n.name
output
getdate(),$action,inserted.*
into LogTable;
-- results
select * from DataTable;
select * From LogTable;
DataTable after query
|DataId|name|birthday|value1| value2|
|-:|:----|:-----------|:--|:--|
|1| John| 1990-01-01| abc| 123|
|2| Sally| 1990-03-03| def| 321|
|3| Steve| 1990-03-03| ghi| 333|
|4| Alice| 1990-04-03| aaa| 222|
LogTable after query
|id|actiondate| action| DataId| name| birthday|value1| value2|
|-:|:----------------------|:-----|-:|:----|:---------|:---|:---|
|1| 2023-05-17 20:17:13.520|INSERT| 4|Alice|1990-04-03| aaa| 222|
|2| 2023-05-17 20:17:13.520|UPDATE| 1|John| 1990-01-01| abc| 123|
|3| 2023-05-17 20:17:13.520|UPDATE| 3|Steve|1990-03-03| ghi| 333|
[Example here](https://dbfiddle.uk/Pl7IyRYR)
and with plan (https://dbfiddle.uk/yXVgKV92)
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论