Matching array elements from two tables and returning results in a new field with distinct values sharing at least one value using clickhouse

huangapple go评论91阅读模式
英文:

Matching array elements from two tables and returning results in a new field with distinct values sharing at least one value using clickhouse

问题

你好,我有两个表,它们共享两个字段,'path'(单个字符串字段)和'arrKW'(字符串数组)。

示例表1:

  1. create table test (path String, arrKW Array(String)) Engine=Memory as
  2. select * from values (('folder/puntonet',['kw1','kw2']),
  3. ('folder/puntonet-2.0',['kw2','kw3']),
  4. ('folder/puntonet-4',['kw2','kw4']),
  5. ('folder/puntonet-5',['kw5','kw4']));

表1内容如下:

  1. | path | arrKW |
  2. +---------------------+---------------+
  3. | folder/puntonet | ['kw1','kw2'] |
  4. | folder/puntonet-2.0 | ['kw2','kw3'] |
  5. | folder/puntonet-4 | ['kw2','kw4'] |
  6. | folder/puntonet-5 | ['kw5','kw4'] |

示例表2:

  1. create table test2 (path String, arrKW Array(String)) Engine=Memory as
  2. select * from values (('folder/otherpuntonet',['kw1','kw2']),
  3. ('folder/otherpuntonet-2.0',['kw2','kw77']),
  4. ('folder/otherpuntonet-4',['kw2','kw77']),
  5. ('folder/puntonet-5',['kw5','kw4']));

表2内容如下:

  1. | path | arrKW |
  2. +--------------------------+-----------------+
  3. | folder/otherpuntonet | ['kw1','kw2'] |
  4. | folder/otherpuntonet-2.0 | ['kw2','kw77'] |
  5. | folder/otherpuntonet-4 | ['kw2','kw77'] |
  6. | folder/puntonet-5 | ['kw5','kw4'] |

期望的结果:

  1. | path | arrKW | arrResult (来自表2) |
  2. +----------------------+---------------+---------------------------------------------------------------------------------------------+
  3. | folder/otherpuntonet | ['kw1','kw2'] | ['folder/otherpuntonet-2.0','folder/otherpuntonet-4'] |
  4. | folder/puntonet-2.0 | ['kw2','kw3'] | ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4'] |
  5. | folder/puntonet-4 | ['kw2','kw4'] | ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4','folder/puntonet-5'] |
  6. | folder/puntonet-5 | ['kw5','kw4'] | ['folder/otherpuntonet-4'] |
英文:

Hello I've two tables that share two fields, 'path' (single String field) and 'arrKW' (Array(String))

Exaple Table1

  1. create table test (path String, arrKW Array(String))Engine=Memory as
  2. select * from values (('folder/puntonet',['kw1','kw2']),
  3. ('folder/puntonet-2.0',['kw2','kw3']),
  4. ('folder/puntonet-4',['kw2','kw4']),
  5. ('folder/puntonet-5',['kw5','kw4']));
  6. | path | arrKW |
  7. +---------------------+---------------+
  8. | folder/puntonet | ['kw1','kw2'] |
  9. | folder/puntonet-2.0 | ['kw2','kw3'] |
  10. | folder/puntonet-4 | ['kw2','kw4'] |
  11. | folder/puntonet-5 | ['kw5','kw4'] |

Exaple Table2

  1. create table test2 (path String, arrKW Array(String))Engine=Memory as
  2. select * from values (('folder/otherpuntonet',['kw1','kw2']),
  3. ('folder/otherpuntonet-2.0',['kw2','kw77']),
  4. ('folder/otherpuntonet-4',['kw2','kw77']),
  5. ('folder/puntonet-5',['kw5','kw4']))
  6. | path | arrKW |
  7. +--------------------------+----------------+
  8. | folder/otherpuntonet | ['kw1','kw2'] |
  9. | folder/otherpuntonet-2.0 | ['kw2','kw77'] |
  10. | folder/otherpuntonet-4 | ['kw2','kw77'] |
  11. | folder/puntonet-5 | ['kw5','kw4'] |

Result desired

  1. | path | arrKW | arrResult (from table2) |
  2. +----------------------+---------------+--------------------------------------------------------------------------------------------------+
  3. | folder/otherpuntonet | ['kw1','kw2'] | ['folder/otherpuntonet-2.0','folder/otherpuntonet-4'] |
  4. | folder/puntonet-2.0 | ['kw2','kw3'] | ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4'] |
  5. | folder/puntonet-4 | ['kw2','kw4'] | ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4','folder/puntonet-5'] |
  6. | folder/puntonet-5 | ['kw5','kw4'] | ['folder/otherpuntonet-4'] |

答案1

得分: 1

以下是翻译好的内容:

原始代码:

  1. task is very unclear
  2. create table test (path String, arrKW Array(String))Engine=Memory as
  3. select * from values (('folder/puntonet',['kw1','kw2']),
  4. ('folder/puntonet-2.0',['kw2','kw3']),
  5. ('folder/puntonet-4',['kw2','kw4']),
  6. ('folder/puntonet-5',['kw5','kw4']));
  7. create table test2 (path String, arrKW Array(String))Engine=Memory as
  8. select * from values (('folder/otherpuntonet',['kw1','kw2']),
  9. ('folder/otherpuntonet-2.0',['kw2','kw77']),
  10. ('folder/otherpuntonet-4',['kw2','kw77']),
  11. ('folder/otherpuntonet-5',['kw5','kw4']))
  12. SELECT
  13. (arrayJoin(a) AS t).2 AS path,
  14. t.1 AS KW,
  15. t.3 as tab,
  16. arraySort(groupUniqArrayArray(patha)) AS result
  17. FROM
  18. (
  19. SELECT
  20. groupArray(pathx) AS patha,
  21. groupArray(t) AS a,
  22. arrayJoin(arrKW) AS KW
  23. FROM (select 'test' tab, null pathx, arrKW, (arrKW, path, tab) t from test union all
  24. select 'test2' tab, path pathx, arrKW, (arrKW, path, tab) t from test2)
  25. GROUP BY KW
  26. )
  27. GROUP BY t
  28. having tab = 'test'
  29. ORDER BY path ASC

改进后的代码:

  1. SELECT
  2. (arrayJoin(a) AS t).2 AS path,
  3. t.1 AS KW,
  4. arraySort(groupUniqArrayArray(patha)) AS result
  5. FROM
  6. (
  7. SELECT
  8. groupArray(pathx) AS patha,
  9. groupArray(t) AS a,
  10. arrayJoin(arrKW) AS KW
  11. FROM (select null pathx, arrKW, (arrKW, path) t from test union all
  12. select path pathx, arrKW, ([], null) t from test2)
  13. GROUP BY KW
  14. )
  15. WHERE path is not null
  16. GROUP BY t
  17. ORDER BY path ASC

希望这对你有帮助。

英文:

task is very unclear

  1. create table test (path String, arrKW Array(String))Engine=Memory as
  2. select * from values (('folder/puntonet',['kw1','kw2']),
  3. ('folder/puntonet-2.0',['kw2','kw3']),
  4. ('folder/puntonet-4',['kw2','kw4']),
  5. ('folder/puntonet-5',['kw5','kw4']));
  6. create table test2 (path String, arrKW Array(String))Engine=Memory as
  7. select * from values (('folder/otherpuntonet',['kw1','kw2']),
  8. ('folder/otherpuntonet-2.0',['kw2','kw77']),
  9. ('folder/otherpuntonet-4',['kw2','kw77']),
  10. ('folder/otherpuntonet-5',['kw5','kw4']))
  11. SELECT
  12. (arrayJoin(a) AS t).2 AS path,
  13. t.1 AS KW,
  14. t.3 as tab,
  15. arraySort(groupUniqArrayArray(patha)) AS result
  16. FROM
  17. (
  18. SELECT
  19. groupArray(pathx) AS patha,
  20. groupArray(t) AS a,
  21. arrayJoin(arrKW) AS KW
  22. FROM (select 'test' tab, null pathx, arrKW, (arrKW, path, tab) t from test union all
  23. select 'test2' tab, path pathx, arrKW, (arrKW, path, tab) t from test2)
  24. GROUP BY KW
  25. )
  26. GROUP BY t
  27. having tab = 'test'
  28. ORDER BY path ASC
  29. ┌─path────────────────┬─KW────────────┬─tab──┬─result────────────────────────────────────────────────────────────────────────────────────────────────┐
  30. folder/puntonet ['kw1','kw2'] test ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4']
  31. folder/puntonet-2.0 ['kw2','kw3'] test ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4']
  32. folder/puntonet-4 ['kw2','kw4'] test ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4','folder/otherpuntonet-5']
  33. folder/puntonet-5 ['kw5','kw4'] test ['folder/otherpuntonet-5']
  34. └─────────────────────┴───────────────┴──────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┘

probably it can be improved

  1. SELECT
  2. (arrayJoin(a) AS t).2 AS path,
  3. t.1 AS KW,
  4. arraySort(groupUniqArrayArray(patha)) AS result
  5. FROM
  6. (
  7. SELECT
  8. groupArray(pathx) AS patha,
  9. groupArray(t) AS a,
  10. arrayJoin(arrKW) AS KW
  11. FROM (select null pathx, arrKW, (arrKW, path) t from test union all
  12. select path pathx, arrKW, ([], null) t from test2)
  13. GROUP BY KW
  14. )
  15. WHERE path is not null
  16. GROUP BY t
  17. ORDER BY path ASC
  18. ┌─path────────────────┬─KW────────────┬─result────────────────────────────────────────────────────────────────────────────────────────────────┐
  19. folder/puntonet ['kw1','kw2'] ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4']
  20. folder/puntonet-2.0 ['kw2','kw3'] ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4']
  21. folder/puntonet-4 ['kw2','kw4'] ['folder/otherpuntonet','folder/otherpuntonet-2.0','folder/otherpuntonet-4','folder/otherpuntonet-5']
  22. folder/puntonet-5 ['kw5','kw4'] ['folder/otherpuntonet-5']
  23. └─────────────────────┴───────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┘

huangapple
  • 本文由 发表于 2023年2月16日 04:37:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/75465207.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定