如何加速使用DISTINCT的SQL查询

huangapple go评论104阅读模式
英文:

How to speed up a SQL query which is using DISTINCT

问题

根据我阅读的内容,似乎最佳解决方案是创建一个索引,但我不确定应该为哪些列创建索引。这是我第一次使用SQL索引。

如果我从这个查询中删除DISTINCT调用,我在1秒内就能获得超过1000个结果。然而,加上DISTINCT调用后,查询会在10秒内返回结果(显然是没有重复项的)。

如果有人有其他的解决方案,我会倾听。

这是查询(第二个SELECT是调用DISTINCT函数的地方):

  1. SELECT
  2. Sku,
  3. Name,
  4. ccp.Polygon,
  5. MarketAvailability,
  6. Coverage,
  7. Range
  8. FROM
  9. (SELECT DISTINCT
  10. dbo.CatalogEntry.CatalogEntryId as Id,
  11. dbo.CatalogEntry.Code as Sku,
  12. CoverageNode.Name as Coverage,
  13. RangeNode.Name as [Range],
  14. (SELECT CatalogContentProperty.LongString
  15. FROM CatalogContentProperty
  16. WHERE MetaFieldName = 'ItemChartName'
  17. AND (CatalogContentProperty.LongString IS NOT NULL)
  18. AND (CatalogContentProperty.ObjectId = dbo.CatalogEntry.CatalogEntryId)) AS [Name],
  19. (SELECT CatalogContentProperty.LongString
  20. FROM CatalogContentProperty
  21. WHERE MetaFieldName = 'MarketAvailabilityDetailsCollection'
  22. AND (CatalogContentProperty.LongString IS NOT NULL)
  23. AND (CatalogContentProperty.ObjectId = dbo.CatalogEntry.CatalogEntryId)) AS MarketAvailability
  24. FROM
  25. dbo.CatalogEntry
  26. INNER JOIN
  27. dbo.NodeEntryRelation ON dbo.CatalogEntry.CatalogEntryId = dbo.NodeEntryRelation.CatalogEntryId
  28. INNER JOIN
  29. dbo.CatalogNode AS CoverageNode ON dbo.NodeEntryRelation.CatalogNodeId = CoverageNode.CatalogNodeId
  30. INNER JOIN
  31. dbo.CatalogNode AS RangeNode ON CoverageNode.ParentNodeId = RangeNode.CatalogNodeId
  32. INNER JOIN
  33. dbo.CatalogContentProperty ON dbo.CatalogEntry.CatalogEntryId = dbo.CatalogContentProperty.ObjectId
  34. INNER JOIN
  35. dbo.CatalogNode AS ModelNode ON RangeNode.ParentNodeId = ModelNode.CatalogNodeId
  36. INNER JOIN
  37. dbo.CatalogNode AS BrandNode ON ModelNode.ParentNodeId = BrandNode.CatalogNodeId
  38. WHERE
  39. (dbo.CatalogEntry.ClassTypeId = N'Variation') AND
  40. (dbo.CatalogContentProperty.MetaFieldName = N'ItemIsChart') AND
  41. RangeNode.Name != 'C-MAP' AND
  42. (BrandNode.Name = '' OR '' = '' OR '' IS NULL) AND
  43. (ModelNode.Name = '' OR '' = '' OR '' IS NULL) AND
  44. (CoverageNode.Name = '' OR '' = '' OR '' IS NULL) AND
  45. (RangeNode.Name = '' OR '' = '' OR '' IS NULL)
  46. ) AS CmapResults
  47. INNER JOIN
  48. (SELECT
  49. GEOMETRY::STGeomFromText(CatalogContentProperty.LongString,4326) AS PolygonGeometry,
  50. CatalogContentProperty.LongString AS Polygon,
  51. CatalogContentProperty.ObjectId
  52. FROM
  53. CatalogContentProperty
  54. WHERE
  55. MetaFieldName = 'ItemChartCoordinates' AND
  56. (CatalogContentProperty.LongString IS NOT NULL)) ccp ON ccp.ObjectId = CmapResults.Id
  57. WHERE
  58. ((GEOGRAPHY::STGeomFromText(PolygonGeometry.MakeValid().STUnion(PolygonGeometry.MakeValid().STStartPoint()).STAsText(), 4326).STDistance(GEOGRAPHY::STGeomFromText('POINT(50.9835929 -1.4205852)', 4326)) / 1609.344) <= 100 OR 'POINT(50.9835929 -1.4205852)' IS NULL)
  59. AND MarketAvailability IS NOT NULL
  60. ORDER BY
  61. GEOGRAPHY::STGeomFromText(PolygonGeometry.MakeValid().STUnion(PolygonGeometry.MakeValid().STStartPoint()).STAsText(), 4326).STArea() DESC;

我正在使用SQL Server Management Studio 2012。目标是让带有DISTINCT调用的查询在与不带DISTINCT调用的查询相同的时间内返回结果。

英文:

From what I have read about this it seems like the best solution is to create an INDEX but I am not sure which columns I should be creating indexes for. This is my first time working with SQL indexes.

If I remove the DISTINCT call from this query I get over 1000 results in just over a second. However with the DISTINCT call it returns the results in 10 seconds (obviously without the duplicates).

If anyone has any alternative solutions I am all ears.

This is the query (the second SELECT is where the DISTINCT function is called):

  1. SELECT
  2. Sku,
  3. Name,
  4. ccp.Polygon,
  5. MarketAvailability,
  6. Coverage,
  7. Range
  8. FROM
  9. (SELECT DISTINCT
  10. dbo.CatalogEntry.CatalogEntryId as Id,
  11. dbo.CatalogEntry.Code as Sku,
  12. CoverageNode.Name as Coverage,
  13. RangeNode.Name as [Range],
  14. (SELECT CatalogContentProperty.LongString
  15. FROM CatalogContentProperty
  16. WHERE MetaFieldName = &#39;ItemChartName&#39;
  17. AND (CatalogContentProperty.LongString IS NOT NULL)
  18. AND (CatalogContentProperty.ObjectId = dbo.CatalogEntry.CatalogEntryId)) AS [Name],
  19. (SELECT CatalogContentProperty.LongString
  20. FROM CatalogContentProperty
  21. WHERE MetaFieldName = &#39;MarketAvailabilityDetailsCollection&#39;
  22. AND (CatalogContentProperty.LongString IS NOT NULL)
  23. AND (CatalogContentProperty.ObjectId = dbo.CatalogEntry.CatalogEntryId)) AS MarketAvailability
  24. FROM
  25. dbo.CatalogEntry
  26. INNER JOIN
  27. dbo.NodeEntryRelation ON dbo.CatalogEntry.CatalogEntryId = dbo.NodeEntryRelation.CatalogEntryId
  28. INNER JOIN
  29. dbo.CatalogNode AS CoverageNode ON dbo.NodeEntryRelation.CatalogNodeId = CoverageNode.CatalogNodeId
  30. INNER JOIN
  31. dbo.CatalogNode AS RangeNode ON CoverageNode.ParentNodeId = RangeNode.CatalogNodeId
  32. INNER JOIN
  33. dbo.CatalogContentProperty ON dbo.CatalogEntry.CatalogEntryId = dbo.CatalogContentProperty.ObjectId
  34. INNER JOIN
  35. dbo.CatalogNode AS ModelNode ON RangeNode.ParentNodeId = ModelNode.CatalogNodeId
  36. INNER JOIN
  37. dbo.CatalogNode AS BrandNode ON ModelNode.ParentNodeId = BrandNode.CatalogNodeId
  38. WHERE
  39. (dbo.CatalogEntry.ClassTypeId = N&#39;Variation&#39;) AND
  40. (dbo.CatalogContentProperty.MetaFieldName = N&#39;ItemIsChart&#39;) AND
  41. RangeNode.Name != &#39;C-MAP&#39; AND
  42. (BrandNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) AND
  43. (ModelNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) AND
  44. (CoverageNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) AND
  45. (RangeNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL)
  46. ) AS CmapResults
  47. INNER JOIN
  48. (SELECT
  49. GEOMETRY::STGeomFromText(CatalogContentProperty.LongString,4326) AS PolygonGeometry,
  50. CatalogContentProperty.LongString AS Polygon,
  51. CatalogContentProperty.ObjectId
  52. FROM
  53. CatalogContentProperty
  54. WHERE
  55. MetaFieldName = &#39;ItemChartCoordinates&#39; AND
  56. (CatalogContentProperty.LongString IS NOT NULL)) ccp ON ccp.ObjectId = CmapResults.Id
  57. WHERE
  58. ((GEOGRAPHY::STGeomFromText(PolygonGeometry.MakeValid().STUnion(PolygonGeometry.MakeValid().STStartPoint()).STAsText(), 4326).STDistance(GEOGRAPHY::STGeomFromText(&#39;POINT(50.9835929 -1.4205852)&#39;, 4326)) / 1609.344) &lt;= 100 OR &#39;POINT(50.9835929 -1.4205852)&#39; IS NULL)
  59. AND MarketAvailability IS NOT NULL
  60. ORDER BY
  61. GEOGRAPHY::STGeomFromText(PolygonGeometry.MakeValid().STUnion(PolygonGeometry.MakeValid().STStartPoint()).STAsText(), 4326).STArea() DESC;

I am using SQL Server Management Studio 2012. The aim is to get the query with the DISTINCT call to return in the same amount of time as the query would without the DISTINCT call.

答案1

得分: 0

您的查询看起来有点复杂。这个查询会产生相同的输出吗?

  1. SELECT DISTINCT ce.Code as Sku
  2. , max(case when ccp.MetaFieldName = 'ItemChartName' then ccp.LongString end) as 'Name'
  3. , max(case when ccp.MetaFieldName = 'MarketAvailabilityDetailsCollection' then ccp.LongString end) as 'MarketAvailability'
  4. , max(case when ccp.MetaFieldName = 'ItemChartCoordinates' then ccp.LongString end) as 'Polygon'
  5. , CoverageNode.Name as Coverage
  6. , RangeNode.Name as 'Range'
  7. FROM dbo.CatalogEntry ce
  8. INNER JOIN dbo.NodeEntryRelation ner ON ce.CatalogEntryId = ner.CatalogEntryId
  9. INNER JOIN dbo.CatalogNode CoverageNode ON dbo.NodeEntryRelation.CatalogNodeId = CoverageNode.CatalogNodeId
  10. INNER JOIN dbo.CatalogNode RangeNode ON CoverageNode.ParentNodeId = RangeNode.CatalogNodeId
  11. INNER JOIN dbo.CatalogContentProperty ccp ON ce.CatalogEntryId = ccp.ObjectId
  12. INNER JOIN dbo.CatalogNode ModelNode ON RangeNode.ParentNodeId = ModelNode.CatalogNodeId
  13. INNER JOIN dbo.CatalogNode BrandNode ON ModelNode.ParentNodeId = BrandNode.CatalogNodeId
  14. WHERE ce.ClassTypeId = N'Variation'
  15. and ccp.MetaFieldName = N'ItemIsChart'
  16. and RangeNode.Name != 'C-MAP'
  17. --and (BrandNode.Name = '' OR '' = '' OR '' IS NULL) -- always true
  18. --and (ModelNode.Name = '' OR '' = '' OR '' IS NULL) -- always true
  19. --and (CoverageNode.Name = '' OR '' = '' OR '' IS NULL) -- always true
  20. --and (RangeNode.Name = '' OR '' = '' OR '' IS NULL) -- always true
  21. and (GEOGRAPHY::STGeomFromText(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STUnion(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STStartPoint()).STAsText(), 4326).STDistance(GEOGRAPHY::STGeomFromText('POINT(50.9835929 -1.4205852)', 4326)) / 1609.344) <= 100 -- OR 'POINT(50.9835929 -1.4205852)' IS NULL -- redundant
  22. )
  23. and max(case when ccp.MetaFieldName = 'MarketAvailabilityDetailsCollection' then ccp.LongString end)
  24. ORDER BY GEOGRAPHY::STGeomFromText(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STUnion(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STStartPoint()).STAsText(), 4326).STArea() DESC;
英文:

Your query looks a bit complicated. Will this one produce the same output?

  1. SELECT DISTINCT ce.Code as Sku
  2. , max(case when ccp.MetaFieldName = &#39;ItemChartName&#39; then ccp.LongString end) as &#39;Name&#39;
  3. , max(case when ccp.MetaFieldName = &#39;MarketAvailabilityDetailsCollection&#39; then ccp.LongString end) as &#39;MarketAvailability&#39;
  4. , max(case when ccp.MetaFieldName = &#39;ItemChartCoordinates&#39; then ccp.LongString end) as &#39;Polygon&#39;
  5. , CoverageNode.Name as Coverage
  6. , RangeNode.Name as &#39;Range&#39;
  7. FROM dbo.CatalogEntry ce
  8. INNER JOIN dbo.NodeEntryRelation ner ON ce.CatalogEntryId = ner.CatalogEntryId
  9. INNER JOIN dbo.CatalogNode CoverageNode ON dbo.NodeEntryRelation.CatalogNodeId = CoverageNode.CatalogNodeId
  10. INNER JOIN dbo.CatalogNode RangeNode ON CoverageNode.ParentNodeId = RangeNode.CatalogNodeId
  11. INNER JOIN dbo.CatalogContentProperty ccp ON ce.CatalogEntryId = ccp.ObjectId
  12. INNER JOIN dbo.CatalogNode ModelNode ON RangeNode.ParentNodeId = ModelNode.CatalogNodeId
  13. INNER JOIN dbo.CatalogNode BrandNode ON ModelNode.ParentNodeId = BrandNode.CatalogNodeId
  14. WHERE ce.ClassTypeId = N&#39;Variation&#39;
  15. and ccp.MetaFieldName = N&#39;ItemIsChart&#39;
  16. and RangeNode.Name != &#39;C-MAP&#39;
  17. --and (BrandNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) -- always true
  18. --and (ModelNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) -- always true
  19. --and (CoverageNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) -- always true
  20. --and (RangeNode.Name = &#39;&#39; OR &#39;&#39; = &#39;&#39; OR &#39;&#39; IS NULL) -- always true
  21. and (GEOGRAPHY::STGeomFromText(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STUnion(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STStartPoint()).STAsText(), 4326).STDistance(GEOGRAPHY::STGeomFromText(&#39;POINT(50.9835929 -1.4205852)&#39;, 4326)) / 1609.344) &lt;= 100 -- OR &#39;POINT(50.9835929 -1.4205852)&#39; IS NULL -- redundant
  22. )
  23. and max(case when ccp.MetaFieldName = &#39;MarketAvailabilityDetailsCollection&#39; then ccp.LongString end)
  24. ORDER BY GEOGRAPHY::STGeomFromText(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STUnion(GEOMETRY::STGeomFromText(ccp.LongString,4326).MakeValid().STStartPoint()).STAsText(), 4326).STArea() DESC;

答案2

得分: 0

答案是我的情况是去掉 DISTINCT,并使用 LINQ 进行重复项移除!加载时间从大约 10-11 秒缩短到大约 4-5 秒。

英文:

So the answer is my case was to remove the DISTINCT and to do the duplicate removing using LINQ! Load times went from 10-11ish seconds to 4-5ish seconds

huangapple
  • 本文由 发表于 2020年1月4日 00:34:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/59582087.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定