
SQL Server migration from denormalized to normalized table causing performance issues


We have a denormalized table in our SQL Server data model that we want to migrate to a normalized format. However, the query we have created for this migration is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count.

Old Table Data Model:

Column A (PK Column): Long
Column1: String
Column2: String
Column3: String

New Table Data Model:

Column A (PK Column): Long
Column1: Long (FK value by referencing another table)
Column2: Long (FK value by referencing another table)
Column3: Long (FK value by referencing another table)

Migration query:

insert into newTable (pkcolumn, column1, column2, column3)
select 
    oldtable.pkColumn,
    anothertable1.id,
    anothertable2.id,
    anothertable3.id
from 
    oldtable
left join 
    anothertable as anothertable1 on anothertable1.field_id = 1 
                                  and anothertable1.value = oldtable.column1 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join 
    anothertable as anothertable2 on anothertable2.field_id = 2 
                                  and anothertable2.value = oldtable.column2 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join 
    anothertable as anothertable3 on anothertable3.field_id = 3 
                                  and anothertable3.value = oldtable.column3 
                                  and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
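
For context, the lookup table joined above appears to be a key/value table of roughly this shape (inferred from the query; the column types are assumptions):

create table anothertable (
    id       bigint        not null primary key,  -- the surrogate key copied into newTable
    field_id int           not null,              -- which logical column the row belongs to (1, 2 or 3)
    value    nvarchar(255) not null               -- the original string from the denormalized table
);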

Batch size: 1,000 PK values per statement, i.e. oldtable.pkColumn in (1, 2, 3, ..., 1000)

Problem

As described above, the migration query is causing a significant delay, and the actual execution plan shows an estimated row count much higher than the actual row count. We have already updated the statistics of the tables involved in the query and rebuilt the indexes on the columns used in the query.
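
For reference, a minimal sketch of that maintenance in T-SQL (table names taken from the question; the exact statements and options used may have differed):

-- refresh statistics on the tables involved in the query
update statistics oldtable with fullscan;
update statistics anothertable with fullscan;

-- rebuild the indexes on the join and filter columns
alter index all on oldtable rebuild;
alter index all on anothertable rebuild;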

Question

Is there any way to increase the performance of the migration in this scenario?

Additional information:

Query Plan: Link

Answer 1

Score: 0


Based on your reply that the denormalized tables have 250 columns and 0.5 million rows each, in my opinion both designing indexes for this and trying to do it in a single SQL statement will be difficult. So your options are:

  1. Run it in a programmatic, batched way in SQL - this is manageable with proper indexes and statistics, provided the source data is static (see the sketch after this list).

  2. This is my choice - use an ETL tool to do this, or even Spark/Databricks, or, if this is in the cloud, any cloud-based tool.
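
A minimal sketch of option 1, assuming the table and column names from the question (the batch size, loop mechanics, and variable names are illustrative, not the asker's actual code):

declare @lastPk bigint = 0;   -- highest PK migrated so far

while 1 = 1
begin
    insert into newTable (pkcolumn, column1, column2, column3)
    select top (1000)         -- one batch per statement
        oldtable.pkColumn,
        anothertable1.id,
        anothertable2.id,
        anothertable3.id
    from oldtable
    left join anothertable as anothertable1
        on anothertable1.field_id = 1 and anothertable1.value = oldtable.column1
    left join anothertable as anothertable2
        on anothertable2.field_id = 2 and anothertable2.value = oldtable.column2
    left join anothertable as anothertable3
        on anothertable3.field_id = 3 and anothertable3.value = oldtable.column3
    where oldtable.pkColumn > @lastPk   -- keyset paging instead of a 1,000-element IN list
    order by oldtable.pkColumn;

    if @@ROWCOUNT = 0 break;            -- nothing left to copy

    select @lastPk = max(pkcolumn) from newTable;
end

A range seek on the PK like this is usually easier for the optimizer to estimate than a long IN list, and each batch keeps transaction log growth bounded.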

Answer 2

Score: 0


First, you can create the FKs after having migrated the data - validating them row by row during the insert slows it down.
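
For example, one of the three FKs added back after the load could look like this (the constraint name is an assumption; anothertable.id comes from the question's model):

-- add the FK only after the data is in place;
-- WITH CHECK validates the existing rows in a single pass
alter table newTable with check
    add constraint FK_newTable_column1
    foreign key (column1) references anothertable (id);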

Second, you can rewrite your JOINs as:

ON anothertable1.field_id = 1 
   and anothertable1.value = oldtable.column1 
   and oldtable.pkColumn BETWEEN 1 AND 10

But I do not see any actual normalization of your table here.

This seems to me like this post, which I have already answered:

Best index for table with dynamic predicate
