SQL Server migration from denormalized to normalized table causing performance issues
We have a denormalized table in our SQL Server data model that we want to migrate to a normalized format. However, the query we have created for this migration is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count.
Old Table Data Model:
Column A (PK Column): Long
Column1: String
Column2: String
Column3: String
New Table Data Model:
Column A (PK Column): Long
Column1: Long (FK value by referencing another table)
Column2: Long (FK value by referencing another table)
Column3: Long (FK value by referencing another table)
insert into newTable (pkcolumn, column1, column2, column3)
select
    oldtable.pkColumn,
    anothertable1.id,
    anothertable2.id,
    anothertable3.id
from oldtable
left join
    anothertable as anothertable1 on anothertable1.field_id = 1
    and anothertable1.value = oldtable.column1
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join
    anothertable as anothertable2 on anothertable2.field_id = 2
    and anothertable2.value = oldtable.column2
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join
    anothertable as anothertable3 on anothertable3.field_id = 3
    and anothertable3.value = oldtable.column3
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
Batch size: oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10....1000)
Problem
The query we have created for the migration from the denormalized to the normalized table is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count. We have already updated the statistics of the tables involved in the query and rebuilt the indexes of the column used in the query.
Question
Is there any way to increase the performance of the migration in this scenario?
Additional information:
Query Plan: Link
Answer 1
Score: 0
Based on your answer that the denormalized tables have 250 columns and 0.5 million rows each, in my opinion any index design that tries to do this in a single SQL statement will be difficult. So your options are:

- Run it in a programmatic, batched way in SQL. With proper indexes and statistics this is manageable, provided the source data is static.
- This is my choice: use an ETL tool to do this, or even Spark/Databricks, or, if this is in the cloud, any cloud-based tool.
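As a rough illustration of the first option, the batched INSERT could be driven by a key-range loop like the sketch below. The @batch size matches the batch size mentioned in the question; the assumption that pkColumn values are dense enough to iterate by range is mine, and the table/column names are taken from the question:

```sql
-- Sketch only: batch the migration by ranges of the key column.
-- Assumes oldtable is static during the migration and pkColumn is
-- reasonably dense.
declare @start bigint = 1;
declare @batch bigint = 1000;
declare @max   bigint = (select max(pkColumn) from oldtable);

while @start <= @max
begin
    insert into newTable (pkcolumn, column1, column2, column3)
    select
        oldtable.pkColumn,
        anothertable1.id,
        anothertable2.id,
        anothertable3.id
    from oldtable
    left join anothertable as anothertable1
           on anothertable1.field_id = 1
          and anothertable1.value = oldtable.column1
    left join anothertable as anothertable2
           on anothertable2.field_id = 2
          and anothertable2.value = oldtable.column2
    left join anothertable as anothertable3
           on anothertable3.field_id = 3
          and anothertable3.value = oldtable.column3
    where oldtable.pkColumn between @start and @start + @batch - 1;

    set @start = @start + @batch;
end
```

Running each batch as its own statement keeps the transaction log small, and a single BETWEEN range on the key column is easier for the optimizer to estimate than a long IN list repeated in every join.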
Answer 2
Score: 0
First, you can create the foreign keys after having migrated the data; validating them during every insert reduces the speed.
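For example, the FKs could be added once the load is complete (the constraint name here is hypothetical):

```sql
-- Add the FKs only after the data is loaded; hypothetical constraint name.
alter table newTable with check
    add constraint FK_newTable_column1
    foreign key (column1) references anothertable (id);
-- Repeat for column2 and column3.
```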
Second, you can rewrite your JOINs as:
ON anothertable1.field_id = 1
and anothertable1.value = oldtable.column1
and oldtable.pkColumn BETWEEN 1 AND 10
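One caveat with that form, though: in a LEFT JOIN, a predicate on oldtable inside the ON clause does not remove oldtable rows from the result, so to actually restrict the batch the range has to sit in a WHERE clause. A single-join sketch of what I mean:

```sql
-- The range predicate in WHERE restricts the batch; in the ON clause of a
-- LEFT JOIN it would only affect which rows match, not which rows appear.
select oldtable.pkColumn, anothertable1.id
from oldtable
left join anothertable as anothertable1
       on anothertable1.field_id = 1
      and anothertable1.value = oldtable.column1
where oldtable.pkColumn between 1 and 10;
```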
But I do not see any actual normalization of your table.
This looks to me like this post, which I have already answered:
Best index for table with dynamic predicate
Comments