SQL Server migration from denormalized to normalized table causing performance issues
We have a denormalized table in our SQL Server data model that we want to migrate to a normalized format. However, the query we have created for this migration is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count.
Old Table Data Model:
Column A (PK Column): Long
Column1: String
Column2: String
Column3: String
New Table Data Model:
Column A (PK Column): Long
Column1: Long (FK value by referencing another table)
Column2: Long (FK value by referencing another table)
Column3: Long (FK value by referencing another table)
insert into newTable (pkcolumn, column1, column2, column3)
select
    oldtable.pkColumn,
    anothertable1.id,
    anothertable2.id,
    anothertable3.id
from oldtable
left join
    anothertable as anothertable1 on anothertable1.field_id = 1
    and anothertable1.value = oldtable.column1
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join
    anothertable as anothertable2 on anothertable2.field_id = 2
    and anothertable2.value = oldtable.column2
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
left join
    anothertable as anothertable3 on anothertable3.field_id = 3
    and anothertable3.value = oldtable.column3
    and oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10)
Batch size: oldtable.pkColumn in (1,2,3,4,5,6,7,8,9,10....1000)
Problem
The query we have created for the migration from the denormalized to the normalized table is causing a significant delay. We have checked the actual execution plan and found that the estimated row count is much higher than the actual row count. We have already updated the statistics of the tables involved in the query and rebuilt the indexes of the column used in the query.
Question
Is there any way to increase the performance of the migration in this scenario?
Additional information:
Query Plan: Link
Answer 1
Score: 0
Based on your answer that the denormalized tables have 250 columns and 0.5 million rows each, in my opinion any index design that tries to do this in a single SQL statement will be difficult. So your options are:

- Run it in a programmatic, batched way in SQL. With proper indexes and statistics this is manageable, provided the source data is static.
- This is my choice: use an ETL tool to do this, or even Spark/Databricks, or, if this is in the cloud, any cloud-based tool.
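As a rough illustration of the first option, the batched INSERT could be driven by a key-range loop like the sketch below. The @batch size matches the batch size mentioned in the question; the assumption that pkColumn values are dense enough to iterate by range is mine, and the table/column names are taken from the question:

```sql
-- Sketch only: batch the migration by ranges of the key column.
-- Assumes oldtable is static during the migration and pkColumn is
-- reasonably dense.
declare @start bigint = 1;
declare @batch bigint = 1000;
declare @max   bigint = (select max(pkColumn) from oldtable);

while @start <= @max
begin
    insert into newTable (pkcolumn, column1, column2, column3)
    select
        oldtable.pkColumn,
        anothertable1.id,
        anothertable2.id,
        anothertable3.id
    from oldtable
    left join anothertable as anothertable1
           on anothertable1.field_id = 1
          and anothertable1.value = oldtable.column1
    left join anothertable as anothertable2
           on anothertable2.field_id = 2
          and anothertable2.value = oldtable.column2
    left join anothertable as anothertable3
           on anothertable3.field_id = 3
          and anothertable3.value = oldtable.column3
    where oldtable.pkColumn between @start and @start + @batch - 1;

    set @start = @start + @batch;
end
```

Running each batch as its own statement keeps the transaction log small, and a single BETWEEN range on the key column is easier for the optimizer to estimate than a long IN list repeated in every join.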
Answer 2
Score: 0
First, you can create the foreign keys after having migrated the data; validating them during every insert reduces the speed.
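For example, the FKs could be added once the load is complete (the constraint name here is hypothetical):

```sql
-- Add the FKs only after the data is loaded; hypothetical constraint name.
alter table newTable with check
    add constraint FK_newTable_column1
    foreign key (column1) references anothertable (id);
-- Repeat for column2 and column3.
```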
Second, you can rewrite your JOINs as:
ON anothertable1.field_id = 1
and anothertable1.value = oldtable.column1
and oldtable.pkColumn BETWEEN 1 AND 10
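One caveat with that form, though: in a LEFT JOIN, a predicate on oldtable inside the ON clause does not remove oldtable rows from the result, so to actually restrict the batch the range has to sit in a WHERE clause. A single-join sketch of what I mean:

```sql
-- The range predicate in WHERE restricts the batch; in the ON clause of a
-- LEFT JOIN it would only affect which rows match, not which rows appear.
select oldtable.pkColumn, anothertable1.id
from oldtable
left join anothertable as anothertable1
       on anothertable1.field_id = 1
      and anothertable1.value = oldtable.column1
where oldtable.pkColumn between 1 and 10;
```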
But I do not see any actual normalization of your table.
This looks to me like this post, which I have already answered:
Best index for table with dynamic predicate
Comments