2023年5月17日 22:54:53go评论64阅读模式

英文:

Suggestions for how to retrieve primary keys for all rows with matching data (many columns)

问题

以下是您提供的内容的中文翻译：

我有一个SQL Server流程，为了节省存储空间，我们需要为每个唯一记录存储一行，并使用行的主键引用唯一数据集。

示例情景：给定一个类似以下的SQL Server表：

DataTable 
(
    DataID bigint, 
    name nvarchar(100), 
    birthday date, 
    value1 nvarchar(100), 
    value2 nvarchar(100)
)

注意：实际上有大约30列。

当接收到新数据时，我们需要将新数据组合添加到DataTable中，并检索已接收到的所有行的DataID，包括新数据和现有数据。

一个想法是利用EXCEPT，然后也许是INTERSECT，但我无法想出一种从已识别的行中获取ID的方法。

这个需求的更大背景是，我们有另一个表，用于跟踪接收到的所有数据，并且我们希望每天跟踪接收到哪些唯一数据集，而不必存储多份数据的副本。

数据跟踪表看起来类似于这样：

ReceivedData (ID bigint, ReceivedDate datetime, DataID bigint)

我目前脑海中的这个需求的流程可能看起来像这样：

给定一个新的数据表和前面提到的现有DataTable：

NewDataTable 
(
    name nvarchar(100), 
    birthday date, 
    value1 nvarchar(100), 
    value2 nvarchar(100)
)

INSERT INTO DataTable (name, birthday, value1, value2)
 	SELECT name, birthday, value1, value2 
    FROM NewDataTable
	EXCEPT
	SELECT name, birthday, value1, value2 
    FROM DataTable

我们可能可以在这里使用OUTPUT子句来获取新记录创建的DataID，但我们还需要DataID，用于匹配DataTable中现有记录的NewDataTable中的记录。

我知道我可以使用NewDataTable和现有DataTable之间的联接，使用所有表列，但这将是一个重型联接，我希望有更好的方法。

感谢您的所有时间和想法。

示例数据演示

现有[DataTable]记录：

DataID	name	birthday	value1	value2
1	John	1/1/1990	abc	123
2	Sally	2/2/1990	def	321
3	Steve	3/3/1990	ghi	333

现有[ReceivedData]记录：

ID	RecievedDate	DataID
100	5/1/2023 03:01	1
101	5/1/2023 03:01	2
102	5/1/2023 03:01	3

然后，如果在“5/2/2023 03:00”接收到新数据，如下所示：

在[NewDataTable]中接收到的新数据

name	birthday	value1	value2
John	1/1/1990	abc	123
Steve	3/3/1990	ghi	333
Alice	4/3/1990	aaa	222

经过流程运行后，所需的数据状态如下：

生成的[DataTable]记录：

DataID	name	birthday	value1	value2
1	John	1/1/1990	abc	123
2	Sally	2/2/1990	def	321
3	Steve	3/3/1990	ghi	333
4	Alice	4/3/1990	aaa	222

生成的[ReceivedData]记录：

ID	RecievedDate	DataID
100	5/1/2023 03:01	1
101	5/1/2023 03:01	2
102	5/1/2023 03:01	3
103	5/2/2023 03:00	1
104	5/2/2023 03:00	3
105	5/2/2023 03:00	4

英文:

I have a SQL Server process where to save storage we need to store one row per unique record and reference the unique data set using the row's primary key.

Example scenario: given a SQL Server table similar to the following:

DataTable 
(
    DataID bigint, 
    name nvarchar(100), 
    birthday date, 
    value1 nvarchar(100), 
    value2 nvarchar(100)
)

Note: in reality there are ~30 columns

When new data is received we need to add the new data combinations to the DataTable and retrieve the DataIDs for all rows that were received, both new and existing.

One idea was to leverage EXCEPT and then perhaps INTERSECT, but I am unable to think of a way to get the IDs back from the rows identified.

The bigger picture for this need is that we have another table which is used to track all data received and we want to keep track of which unique data sets were received each day without having to store multiple copies of the data.

The data tracking table looks something like this:

ReceivedData (ID bigint, ReceivedDate datetime, DataID bigint)

The process currently in my head for this need could look something like this:

Given a new data in a table and the previously mentioned existing DataTable:

NewDataTable 
(
    name nvarchar(100), 
    birthday date, 
    value1 nvarchar(100), 
    value2 nvarchar(100)
)

INSERT INTO DataTable (name, birthday, value1, value2)
 	SELECT name, birthday, value1, value2 
    FROM NewDataTable
	EXCEPT
	SELECT name, birthday, value1, value2 
    FROM DataTable

We could probably use an OUTPUT clause here to get the DataIDs for the new records created, but we also need the DataIDs for records in the NewDataTable which match existing records in DataTable.

I know I could use join between the NewDataTable and the existing DataTable using all table columns, but this would be a heavy join and I'm hoping for a better way.

Thank you for all your time and ideas.

Example Data Walkthrough

Existing [DataTable] records:

DataID	name	birthday	value1	value2
1	John	1/1/1990	abc	123
2	Sally	2/2/1990	def	321
3	Steve	3/3/1990	ghi	333

Existing [ReceivedData] records:

ID	RecievedDate	DataID
100	5/1/2023 03:01	1
101	5/1/2023 03:01	2
102	5/1/2023 03:01	3

Then, if new data is received on "5/2/2023 03:00" as follows:

New data received in [NewDataTable]

name	birthday	value1	value2
John	1/1/1990	abc	123
Steve	3/3/1990	ghi	333
Alice	4/3/1990	aaa	222

The desired data state after the process runs would be:

Resulting [DataTable] records:

DataID	name	birthday	value1	value2
1	John	1/1/1990	abc	123
2	Sally	2/2/1990	def	321
3	Steve	3/3/1990	ghi	333
4	Alice	4/3/1990	aaa	222

Resulting [ReceivedData] records:

ID	RecievedDate	DataID
100	5/1/2023 03:01	1
101	5/1/2023 03:01	2
102	5/1/2023 03:01	3
103	5/2/2023 03:00	1
104	5/2/2023 03:00	3
105	5/2/2023 03:00	4

答案1

得分: 0

以下是您提供的内容的中文翻译：

-- 数据模型
创建表格 DataTable(DataId int primary key identity,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
启用标识插入 DataTable;
插入数据到 DataTable (DataId,name,birthday,value1,value2) 值
 (1,'John','1990-01-01','abc','123')
,(2,'Sally','1990-03-03','def','321')
,(3,'Steve','1990-03-03','ghi','333')
;
关闭标识插入 DataTable;

创建表格 NewDataTable(name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
插入数据到 NewDataTable 值
 ('John','1990-01-01','abc','123')
,('Steve','1990-03-03','ghi','333')
,('Alice','1990-04-03','aaa','222')
;
创建表格 LogTable(id int identity,actiondate datetime,action varchar(10),DataId int ,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));

-- 查询
合并 DataTable
使用 NewDataTable n on n.name=DataTable.name -- ...等等
当不匹配时
  插入(name,birthday,value1,value2) 值(n.name,n.birthday,n.value1,n.value2)
当匹配时
   更新设置 DataTable.name=n.name
输出  
       getdate(),$action,插入的.*
到 LogTable;

-- 结果
选择 * from DataTable;
选择 * From LogTable;

查询后的 DataTable 表格

DataId	name	birthday	value1	value2
1	John	1990-01-01	abc	123
2	Sally	1990-03-03	def	321
3	Steve	1990-03-03	ghi	333
4	Alice	1990-04-03	aaa	222

查询后的 LogTable 表格

id	actiondate	action	DataId	name	birthday	value1	value2
1	2023-05-17 20:17:13.520	INSERT	4	Alice	1990-04-03	aaa	222
2	2023-05-17 20:17:13.520	UPDATE	1	John	1990-01-01	abc	123
3	2023-05-17 20:17:13.520	UPDATE	3	Steve	1990-03-03	ghi	333

示例链接

还有执行计划 (https://dbfiddle.uk/yXVgKV92)


<details>
<summary>英文:</summary>

Try this

-- data model
create table DataTable(DataId int primary key identity,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
set identity_insert DataTable ON;
insert into DataTable (DataId,name,birthday,value1,value2) values
(1,'John','1990-01-01','abc','123')
,(2,'Sally','1990-03-03','def','321')
,(3,'Steve','1990-03-03','ghi','333')
;
set identity_insert DataTable OFF;

create table NewDataTable(name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));
insert into NewDataTable values
('John','1990-01-01','abc','123')
,('Steve','1990-03-03','ghi','333')
,('Alice','1990-04-03','aaa','222')
;
create table LogTable(id int identity,actiondate datetime,action varchar(10),DataId int ,name nvarchar(100),birthday date
,value1 nvarchar(100),value2 nvarchar(100));

-- query
merge DataTable
using NewDataTable n on n.name=DataTable.name -- ... etc
when not matched then
insert(name,birthday,value1,value2) values(n.name,n.birthday,n.value1,n.value2)
when matched then
update set DataTable.name=n.name
output
getdate(),$action,inserted.*
into LogTable;

-- results
select * from DataTable;
select * From LogTable;

DataTable after query  

|DataId|name|birthday|value1|	value2|
|-:|:----|:-----------|:--|:--|
|1|	John|	1990-01-01|	abc|	123|
|2|	Sally|	1990-03-03|	def|	321|
|3|	Steve|	1990-03-03|	ghi|	333|
|4|	Alice|	1990-04-03|	aaa|	222|

LogTable after query  

|id|actiondate|	action|	DataId|	name|    birthday|value1|	value2|
|-:|:----------------------|:-----|-:|:----|:---------|:---|:---|
|1|	2023-05-17 20:17:13.520|INSERT|	4|Alice|1990-04-03|	aaa|	222|
|2|	2023-05-17 20:17:13.520|UPDATE|	1|John|	1990-01-01|	abc|	123|
|3|	2023-05-17 20:17:13.520|UPDATE|	3|Steve|1990-03-03|	ghi|	333|

[Example here](https://dbfiddle.uk/Pl7IyRYR)     
   
and with plan (https://dbfiddle.uk/yXVgKV92)

</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何检索具有匹配数据的所有行的主键（多列）的建议：

问题

示例数据演示

Example Data Walkthrough

答案1

如何查找以”INPUT”开头并包含X个字符实例的Postgres列中的值？

使用dense_rank()对选择的结果集进行排序。

可以在MySQL中使用多个AND条件或在分组中使用WHERE吗？

如何在MySQL中转换月份

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论