在ClickHouse中使用倒排索引搜索字符串数组。

huangapple go评论69阅读模式
英文:

Clickhouse- Search array of strings using inverted indexes

问题

我有一个包含Array(String)类型列的表格

CREATE TABLE db.logs
( `timestamp` DateTime CODEC(Delta(4), ZSTD(1)),
    `message` String,    
    `source_type` LowCardinality(String),
    `labels_key` Array(String),
    `labels_value` Array(String),
     INDEX lk_inv_idx labels_key TYPE inverted GRANULARITY 1,
     INDEX lv_inv_idx labels_value TYPE inverted GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY timestamp
PRIMARY KEY (timestamp)
ORDER BY (timestamp)
SETTINGS index_granularity = 8192;

数组列:
['color','make','year','type', 'interior'] ['red','toyota','2020','crossover', 'black']
['color','make','year','type', 'interior'] ['white','dodge','2023','pickup', 'black']
['color','make','year','type', 'interior'] ['red','audi','2021','sedan', 'red']
['color','make','year','type', 'interior'] ['yellow','bmw','2020','hatchback', 'yellow']

我想要使用倒排索引搜索数组列,但当我使用has()函数进行搜索时,它不使用索引,而且我不能使用hasToken()(这是我用来搜索具有倒排数组的String列的方法)。对于如何在Array(String)列上使用倒排索引进行搜索,有任何想法吗?先感谢您。

英文:

I have table with column type Array(String)

CREATE TABLE db.logs
( `timestamp` DateTime CODEC(Delta(4),
 ZSTD(1)),
    `message` String,    
    `source_type` LowCardinality(String),
    `labels_key` Array(String),
    `labels_value` Array(String),
     INDEX lk_inv_idx labels_key TYPE inverted GRANULARITY 1,
     INDEX lv_inv_idx labels_value TYPE inverted GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY timestamp
PRIMARY KEY (timestamp)
ORDER BY
(timestamp)
SETTINGS index_granularity = 8192;

array columns:
['color','make','year','type', 'interior']  ['red','toyota','2020','crossover', 'black']
['color','make','year','type', 'interior']  ['white','dodge','2023','pickup', 'black']
['color','make','year','type', 'interior']  ['red','audi','2021','sedan', 'red']
['color','make','year','type', 'interior']  ['yellow','bmw','2020','hatchback', 'yellow']

I want to search array columns using inverted index but when I search using has() functions, it doesn't use the index and I can't use hasToken() (This is what I am using to search String columns with inverted array).
Any ideas on how I can search using inverted index on Array(String) column.
Thanks in advance.

答案1

得分: 0

根据 https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#functions-support

has 在倒排索引中受支持

但您需要了解 ClickHouse 中索引的工作原理
这不是用于搜索的索引,当您首先在索引中搜索,然后在主表空间中进行二次搜索

这是数据跳过索引,您跳过数据部分(system.parts)中的颗粒(取决于GRANULARITY),因此如果您的数据分散在所有颗粒中,那么二级索引就无用了。

查看
https://fiddle.clickhouse.com/df6acefe-5511-41e8-bd32-c3d1815d16d1

在输出中
Index lk_inv_idx has dropped 0/2 granules.
意味着,已使用二级索引,但所有颗粒都包含color,因此该索引无用。

英文:

According to https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#functions-support

has supported by inverted index

but you need to understand how indexes works in ClickHouse
this is not indexes for search, when you firstly search in index and secondary search in main table space

this is DATA SKIP indexes, you skip granules (how many depends on GRANULARITY) inside data parts (system.parts) during filtering, so if your data spread by all granules, then secondary indexes are useless.

Look
https://fiddle.clickhouse.com/df6acefe-5511-41e8-bd32-c3d1815d16d1

In output
Index lk_inv_idx has dropped 0/2 granules.
means, secondary index was used, but all granules contains color, so the index was useless.

huangapple
  • 本文由 发表于 2023年6月25日 16:58:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/76549645.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定