How to eliminate multiple server calls | MS SQL Server
Question
There is a stored procedure that needs to be modified to eliminate a call to another server.
What is the easiest feasible way to do this so that the procedure runs faster? Solutions that do not require much change to the application are preferred.
For example:
select *
from dbo.table1 a
inner join server2.dbo.table2 b on a.id = b.id
Answer 1
Score: 1
Cross-server JOINs can be problematic because the optimiser doesn't always pick the most efficient plan, and may even drag the entire remote table across your network to look up a single row.
Replication is by far the best option, if you can justify it. With transactional replication, the table you want to replicate needs a primary key, which seems a reasonable constraint (ha!), but might become an issue with a third-party system.
If the remote table is small, it might be better to take a temporary local copy, e.g. SELECT * INTO #temp FROM server2.<database>.dbo.table2;
Then you can change your query to something like this: select * from dbo.table1 a inner join #temp b on a.id = b.id;
The temporary table is dropped automatically when the procedure that created it finishes (or when your session ends), so there is no need to tidy up after yourself.
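As a rough illustration of the temporary-copy approach inside a stored procedure, here is a minimal sketch. The procedure name usp_GetJoinedRows, the database name RemoteDb, and the column some_column are placeholders that do not appear in the question; substitute your own. It copies only the columns it needs rather than SELECT *, which keeps the network transfer smaller.

```sql
-- Sketch only: usp_GetJoinedRows, RemoteDb and some_column are assumed names.
CREATE PROCEDURE dbo.usp_GetJoinedRows
AS
BEGIN
    SET NOCOUNT ON;

    -- One round trip: copy the needed remote columns into a local temp table.
    SELECT b.id, b.some_column
    INTO #temp
    FROM server2.RemoteDb.dbo.table2 AS b;

    -- The join itself now runs entirely on the local server.
    SELECT a.*, t.some_column
    FROM dbo.table1 AS a
    INNER JOIN #temp AS t ON a.id = t.id;
END;
```

The application keeps calling a single procedure, so nothing should need to change on the caller's side as long as the result set has the same shape.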
If the table is larger, you might want to do the above but also add an index to your temporary table, e.g. CREATE INDEX ix$temp ON #temp (id);
Note that index names only have to be unique within their table, so a named index like this is safe even if the same procedure runs twice simultaneously. Named constraints (for example a named PRIMARY KEY) are different: their names must be unique across tempdb, so concurrent executions would clash. Leave constraints on temporary tables unnamed unless execution is always in series.
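For the larger-table case, the indexed variant might look something like the following. RemoteDb and some_column are again assumed placeholder names, and whether a clustered or nonclustered index is the better fit depends on your data, so treat this as a starting point rather than a recommendation.

```sql
-- Sketch only: RemoteDb and some_column are assumed names.
SELECT b.id, b.some_column
INTO #temp
FROM server2.RemoteDb.dbo.table2 AS b;

-- Index the join column after loading, so the local join can seek on id
-- instead of scanning the whole temp table.
CREATE CLUSTERED INDEX ix_temp_id ON #temp (id);

SELECT a.*, t.some_column
FROM dbo.table1 AS a
INNER JOIN #temp AS t ON a.id = t.id;
```

Creating the index after the SELECT ... INTO means the rows are loaded into a heap first and sorted once, rather than being inserted into an existing index.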
If you only need a small number of ids, OPENQUERY might be the way to go, e.g. SELECT * FROM OPENQUERY(server2, 'SELECT * FROM table2 WHERE id IN (''1'', ''2'')');
Note that the linked server name is passed as an identifier, not a quoted string. The advantage here is that the query runs on the remote server, so it's more likely to use an efficient query plan.
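To show how the OPENQUERY result can take the place of the remote table in the original join, here is a small sketch. RemoteDb and some_column are assumed placeholder names, and the id list is hard-coded purely for illustration.

```sql
-- Sketch only: RemoteDb and some_column are assumed names.
-- The inner query is sent as-is and executed on server2; only its result
-- rows come back to be joined with the local table.
SELECT a.*, r.some_column
FROM dbo.table1 AS a
INNER JOIN OPENQUERY(server2,
    'SELECT id, some_column
     FROM RemoteDb.dbo.table2
     WHERE id IN (''1'', ''2'')') AS r
    ON a.id = r.id;
```

OPENQUERY does not accept variables for its arguments, so a list of ids that is only known at run time would have to be built into the statement with dynamic SQL.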
The bottom line is that if you JOIN a remote table to a local one, you will always have some level of uncertainty; even if the query runs well one day, it might suddenly run a LOT slower the next. Small things, like adding a single row of data to the remote table, can completely change the way the query is executed.