ClickHouse:将表与外部数组进行”左连接”。

huangapple go评论64阅读模式
英文:

Clickhouse: "left join" table to external array

问题

假设有一个表格:

```sql
CREATE TABLE details(id String, detail_one UInt64, detail_two UInt64) ENGINE = Memory;
INSERT INTO TABLE details(id, detail_one, detail_two) VALUES ('id1', 5, 10), ('id2', 20, 30);

我想要能够将其与外部输入值的数组进行“左连接”,以便在从数据库获取结果集后,无需执行额外的基于哈希映射的查找。这包括我尝试过的使用普通的WHERE IN和在这个输入数组上的UNION的解决方案。总体来说,这是一个不好的想法吗?您能提出更好的替代方案吗?

类似这样的代码:

SELECT *
FROM ['id1', 'id2', 'id3-missing'] AS input
LEFT JOIN details ON details.id = input;

这样得到的结果集如下:

{
    ('id2', 5, 10), 
    ('id1', 20, 30),
    ('id3-missing', NULL, NULL) # 基本上返回NULL
}

注意:

  1. 这是一个相当小的表格:少于10000行(主要是列),在这方面不会有太多变化,输入数组的大小可以类似。
  2. 它必须支持我打算在其上执行的额外聚合操作。

<details>
<summary>英文:</summary>

Suppose there is a table:

```lang-sql
CREATE TABLE details(id String, detail_one UInt64, detail_two UInt64) ENGINE = Memory;
INSERT INTO TABLE details(id, detail_one, detail_two) VALUES (&#39;id1&#39;, 5, 10), (&#39;id2&#39;, 20, 30);

I want to be able to "left join" it to an external array of input values, so that I won't have to perform additional hash map based lookups after I get the resulting set from the database. This includes solutions that I've tried with a plain WHERE IN and UNION over this input array. Is it a bad idea in general? Can you suggest a better alternative?

Something like this:

SELECT *
FROM [&#39;id1&#39;, &#39;id2&#39;, &#39;id3-missing&#39;] AS input
LEFT JOIN details ON details.id = input;

So that the resulting set is as follows:

{
    (&#39;id2&#39;, 5, 10), 
    (&#39;id1&#39;, 20, 30),
    (&#39;id3-missing&#39;, NULL, NULL) # basically return NULLs
}

Notes:

  1. It is a rather small table: less than 10000 rows (mostly columns) and won't change much in this regard, the input array can be sized similarly.
  2. It has to support extra aggregations that I'm going to perform on top of it.

答案1

得分: 1

在这种情况下,你可能会发现arrayJoin很有帮助。它允许你在SELECT语句中为子查询使用数组。所以使用你提供的示例信息,它会类似这样:

SELECT input.id, details.detail_one, details.detail_two
FROM (
  SELECT arrayJoin(['id1', 'id2', 'id3-missing']) as id
) AS input
LEFT JOIN details ON details.id = input.id;

所以就像你发布的那样,如果details.id不存在,那么它显然会返回NULLdetail_onedetail_two

你还提到:

>它必须支持我打算在其上执行的额外聚合。

但这应该不是问题。只需在连接操作之后添加它们即可。

编辑

我知道这条评论是在另一篇帖子上的,但我想我可以帮助一下。对于这个问题:

>太棒了!这两种方法都很好用,如果我想添加与这些id关联的额外输入数组值以进行聚合怎么办。我尝试将其类似于FROM中的arrayJoin字段添加,但它会生成额外的行,尽管JOIN操作是在id上执行的。

你可以像这样使用arrayJoin与元组:

SELECT input.id, input.associated_value, details.detail_one, details.detail_two
FROM (
  SELECT arrayJoin([('id1', 100), ('id2', 200), ('id3-missing', 300)]) as (id, associated_value)
) AS input
LEFT JOIN details ON details.id = input.id;
英文:

In this case, you might find arrayJoin to be helpful. It enables you to use an array in your SELECT statement for a subquery. So using the example info you provided, it would look something like this:

SELECT input.id, details.detail_one, details.detail_two
FROM (
  SELECT arrayJoin([&#39;id1&#39;, &#39;id2&#39;, &#39;id3-missing&#39;]) as id
) AS input
LEFT JOIN details ON details.id = input.id;

So just like you posted, if details.id did not exist, then it would obviously return NULL for detail_one and detail_two.

You also mention that:

> It has to support extra aggregations that I'm going to perform on top of it.

But this should be no issue. Just be sure to add them after your join operation.


Edit

I know this comment was on the other post but I figured I could help. For this question:

>Awesome! Both of these work just fine, what if I want to add an extra input array of values associated with these ids to be used in aggregations. I've tried adding it similarly as an arrayJoin field in the FROM, it generates extra rows even though JOINs are performed on ids.

You can use arrayJoin with a tuple like so:

SELECT input.id, input.associated_value, details.detail_one, details.detail_two
FROM (
  SELECT arrayJoin([(&#39;id1&#39;, 100), (&#39;id2&#39;, 200), (&#39;id3-missing&#39;, 300)]) as (id, associated_value)
) AS input
LEFT JOIN details ON details.id = input.id;

答案2

得分: 1

考虑使用 arrayJoin 函数:

SELECT *
FROM
(
    SELECT arrayJoin(['id1', 'id2', 'id3-missing']) AS id
) AS input
LEFT JOIN details ON details.id = input.id
SETTINGS join_use_nulls = 1

或者

SELECT *
FROM details
RIGHT JOIN
(
    SELECT arrayJoin(['id1', 'id2', 'id3-missing']) AS id
) AS input ON details.id = input.id
SETTINGS join_use_nulls = 1
SELECT
    *,
    input.tuple.1 AS array_id,
    input.tuple.2 AS array_value
FROM
(
    SELECT arrayJoin([('id1', 100), ('id2', 200), ('id3-missing', 300)]) AS tuple
) AS input
LEFT JOIN details ON details.id = (input.tuple.1)
英文:

Consider using arrayJoin function:

SELECT *
FROM
(
    SELECT arrayJoin([&#39;id1&#39;, &#39;id2&#39;, &#39;id3-missing&#39;]) AS id
) AS input
LEFT JOIN details ON details.id = input.id
SETTINGS join_use_nulls = 1

/*
┌─id──────────┬─details.id─┬─detail_one─┬─detail_two─┐
│ id1         │ id1        │          5 │         10 │
│ id2         │ id2        │         20 │         30 │
│ id3-missing │ ᴺᵁᴸᴸ       │       ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │
└─────────────┴────────────┴────────────┴────────────┘
*/

or

SELECT *
FROM details
RIGHT JOIN
(
    SELECT arrayJoin([&#39;id1&#39;, &#39;id2&#39;, &#39;id3-missing&#39;]) AS id
) AS input ON details.id = input.id
SETTINGS join_use_nulls = 1

/*
┌─id──┬─detail_one─┬─detail_two─┬─input.id─┐
│ id1 │          5 │         10 │ id1      │
│ id2 │         20 │         30 │ id2      │
└─────┴────────────┴────────────┴──────────┘
┌─id───┬─detail_one─┬─detail_two─┬─input.id────┐
│ ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │ id3-missing │
└──────┴────────────┴────────────┴─────────────┘
*/

SELECT
    *,
    input.tuple.1 AS array_id,
    input.tuple.2 AS array_value
FROM
(
    SELECT arrayJoin([(&#39;id1&#39;, 100), (&#39;id2&#39;, 200), (&#39;id3-missing&#39;, 300)]) AS tuple
) AS input
LEFT JOIN details ON details.id = (input.tuple.1)

/*
┌─tuple───────────────┬─id──┬─detail_one─┬─detail_two─┬─array_id────┬─array_value─┐
│ (&#39;id1&#39;,100)         │ id1 │          5 │         10 │ id1         │         100 │
│ (&#39;id2&#39;,200)         │ id2 │         20 │         30 │ id2         │         200 │
│ (&#39;id3-missing&#39;,300) │     │          0 │          0 │ id3-missing │         300 │
└─────────────────────┴─────┴────────────┴────────────┴─────────────┴─────────────┘
*/

huangapple
  • 本文由 发表于 2023年7月23日 14:23:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76746880.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定