SQL query to join tables with duplicate rows and get unique rows

Question


I have the following three tables in BigQuery, with the data shown below.

I'd like to write a SQL query that uses the id and the current date columns and returns the following row as the output.

Expected result set:

    id          existing_stores                            missing_stores
    1003812607  "3640,0130,0131,2306,3638,0127,2789,2305"  "3102,2681,2686,2670,2682,3101,2673,2669,3103,2668"

These are the tables:

item table:

    id          date
    ------------------------
    1003812607  2023-08-03
    1003812607  2023-08-01
    1003812607  2023-07-23
    1003812607  2023-06-30

item_change_history:

    createdTime                     docType  id          existing_stores
    ---------------------------------------------------------------------------------------------
    2023-08-03 11:01:10.139617 UTC  Item     1003812607  "3640,0130,0131,2306,3638,0127,2789,2305"
    2023-07-01 09:01:10.139617 UTC  Item     1003812607  "3640,0130,0131,2306,3638,0127,2789,2301"

mic_item_discrepancy:

    ID          MISSING_STORE
    -------------------------
    1003812607  3102
    1003812607  2681
    1003812607  2686
    1003812607  2670
    1003812607  2682
    1003812607  3101
    1003812607  2673
    1003812607  2669
    1003812607  3103
    1003812607  2668

I came up with the query below, but it does not work as expected: it returns the wrong data because the mic_item_discrepancy table has several rows with the same id.

    SELECT
      item.id, ich.id, ich.existing_stores, mid.missing_stores
    FROM
      `hd-merch-prod.merch_item_cache_validation.item` item
    JOIN
      `hd-merch-prod.merch_item_cache.item_change_history` ich
      ON item.date = "2023-08-03"
      AND DATE(ich.createdTime) = item.date
      AND ich.id = item.id
    JOIN
      `hd-merch-prod.merch_item_cache.mic_item_discrepancy` mid
      ON item.id = mid.id
    GROUP BY
      item.id, ich.id, ich.existing_stores, mid.missing_stores;

Answer 1

Score: 2

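Try using LEFT JOIN for both the item_change_history and mic_item_discrepancy tables so that the matching rows from the item table are always included in the result, and collapse the duplicate id rows of mic_item_discrepancy into a single comma-separated string with STRING_AGG(DISTINCT ...) before joining: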

    SELECT
      item.id, ich.id, ich.existing_stores, COALESCE(mid.missing_stores, '') AS missing_stores
    FROM
      `hd-merch-prod.merch_item_cache_validation.item` item
    LEFT JOIN
      (
        -- Change-history rows created on the target date only.
        SELECT id, existing_stores, MAX(createdTime) AS latest_createdTime
        FROM
          `hd-merch-prod.merch_item_cache.item_change_history`
        WHERE
          DATE(createdTime) = '2023-08-03'
        GROUP BY
          id, existing_stores
      ) ich ON item.id = ich.id
    LEFT JOIN
      (
        -- One row per ID, so this join cannot multiply the result rows.
        -- If MISSING_STORE is not a STRING column, use CAST(MISSING_STORE AS STRING).
        SELECT ID, STRING_AGG(DISTINCT MISSING_STORE, ',') AS missing_stores
        FROM
          `hd-merch-prod.merch_item_cache.mic_item_discrepancy`
        GROUP BY
          ID
      ) mid ON item.id = mid.ID
    WHERE
      -- Keep only the item row for the target date, as in the question's query;
      -- without this filter every dated row of the item table is returned.
      item.date = '2023-08-03';
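
To check the idea without access to BigQuery, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and compares a plain join with a pre-aggregated join. It is a stand-in under stated assumptions, not the BigQuery query itself: SQLite's GROUP_CONCAT(DISTINCT ...) replaces STRING_AGG(DISTINCT ..., ','), substr() replaces DATE(), the table names are shortened, the stores are stored as text, and the item table is filtered on the date as in the question's query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER, date TEXT);
CREATE TABLE item_change_history (
    createdTime TEXT, docType TEXT, id INTEGER, existing_stores TEXT);
CREATE TABLE mic_item_discrepancy (id INTEGER, missing_store TEXT);
""")

ITEM_ID = 1003812607
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(ITEM_ID, d) for d in ("2023-08-03", "2023-08-01", "2023-07-23", "2023-06-30")],
)
conn.executemany(
    "INSERT INTO item_change_history VALUES (?, ?, ?, ?)",
    [
        ("2023-08-03 11:01:10.139617", "Item", ITEM_ID,
         "3640,0130,0131,2306,3638,0127,2789,2305"),
        ("2023-07-01 09:01:10.139617", "Item", ITEM_ID,
         "3640,0130,0131,2306,3638,0127,2789,2301"),
    ],
)
missing = ["3102", "2681", "2686", "2670", "2682",
           "3101", "2673", "2669", "3103", "2668"]
conn.executemany(
    "INSERT INTO mic_item_discrepancy VALUES (?, ?)",
    [(ITEM_ID, store) for store in missing],
)

# Plain join: every mic_item_discrepancy row produces its own output row.
naive_rows = conn.execute("""
    SELECT item.id, ich.existing_stores, mid.missing_store
    FROM item
    JOIN item_change_history ich
      ON item.date = '2023-08-03'
     AND substr(ich.createdTime, 1, 10) = item.date
     AND ich.id = item.id
    JOIN mic_item_discrepancy mid ON item.id = mid.id
""").fetchall()

# Pre-aggregated join: the discrepancy table is reduced to one row per id first.
fixed_rows = conn.execute("""
    SELECT item.id, ich.existing_stores,
           COALESCE(mid.missing_stores, '') AS missing_stores
    FROM item
    LEFT JOIN (
        SELECT id, existing_stores
        FROM item_change_history
        WHERE substr(createdTime, 1, 10) = '2023-08-03'
    ) ich ON item.id = ich.id
    LEFT JOIN (
        SELECT id, GROUP_CONCAT(DISTINCT missing_store) AS missing_stores
        FROM mic_item_discrepancy
        GROUP BY id
    ) mid ON item.id = mid.id
    WHERE item.date = '2023-08-03'
""").fetchall()

print(len(naive_rows), "rows from the plain join")
print(len(fixed_rows), "row(s) from the pre-aggregated join")
for row in fixed_rows:
    print(row)
```

The plain join yields one output row per missing store, while the pre-aggregated join yields a single row for the id. The order of the stores inside the aggregated string is not guaranteed in either engine; if a fixed order matters, BigQuery's STRING_AGG accepts an ORDER BY clause inside the call.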

Posted on 2023-08-04 06:22:33