英文:
Clickhouse- Search array of strings using inverted indexes
问题
我有一个包含Array(String)类型列的表格
CREATE TABLE db.logs
( `timestamp` DateTime CODEC(Delta(4), ZSTD(1)),
`message` String,
`source_type` LowCardinality(String),
`labels_key` Array(String),
`labels_value` Array(String),
INDEX lk_inv_idx labels_key TYPE inverted GRANULARITY 1,
INDEX lv_inv_idx labels_value TYPE inverted GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY timestamp
PRIMARY KEY (timestamp)
ORDER BY (timestamp)
SETTINGS index_granularity = 8192;
数组列:
['color','make','year','type', 'interior'] ['red','toyota','2020','crossover', 'black']
['color','make','year','type', 'interior'] ['white','dodge','2023','pickup', 'black']
['color','make','year','type', 'interior'] ['red','audi','2021','sedan', 'red']
['color','make','year','type', 'interior'] ['yellow','bmw','2020','hatchback', 'yellow']
我想要使用倒排索引搜索数组列,但当我使用has()函数进行搜索时,它不使用索引,而且我不能使用hasToken()(这是我用来搜索具有倒排数组的String列的方法)。对于如何在Array(String)列上使用倒排索引进行搜索,有任何想法吗?先感谢您。
英文:
I have table with column type Array(String)
CREATE TABLE db.logs
( `timestamp` DateTime CODEC(Delta(4),
ZSTD(1)),
`message` String,
`source_type` LowCardinality(String),
`labels_key` Array(String),
`labels_value` Array(String),
INDEX lk_inv_idx labels_key TYPE inverted GRANULARITY 1,
INDEX lv_inv_idx labels_value TYPE inverted GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY timestamp
PRIMARY KEY (timestamp)
ORDER BY
(timestamp)
SETTINGS index_granularity = 8192;
array columns:
['color','make','year','type', 'interior'] ['red','toyota','2020','crossover', 'black']
['color','make','year','type', 'interior'] ['white','dodge','2023','pickup', 'black']
['color','make','year','type', 'interior'] ['red','audi','2021','sedan', 'red']
['color','make','year','type', 'interior'] ['yellow','bmw','2020','hatchback', 'yellow']
I want to search array columns using inverted index but when I search using has() functions, it doesn't use the index and I can't use hasToken() (This is what I am using to search String columns with inverted array).
Any ideas on how I can search using inverted index on Array(String) column.
Thanks in advance.
答案1
得分: 0
根据 https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#functions-support
has
在倒排索引中受支持
但您需要了解 ClickHouse 中索引的工作原理
这不是用于搜索的索引,当您首先在索引中搜索,然后在主表空间中进行二次搜索
这是数据跳过索引,您跳过数据部分(system.parts)中的颗粒(取决于GRANULARITY),因此如果您的数据分散在所有颗粒中,那么二级索引就无用了。
查看
https://fiddle.clickhouse.com/df6acefe-5511-41e8-bd32-c3d1815d16d1
在输出中
Index
lk_inv_idx has dropped 0/2 granules.
意味着,已使用二级索引,但所有颗粒都包含color
,因此该索引无用。
英文:
According to https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#functions-support
has
supported by inverted index
but you need to understand how indexes works in ClickHouse
this is not indexes for search, when you firstly search in index and secondary search in main table space
this is DATA SKIP indexes, you skip granules (how many depends on GRANULARITY) inside data parts (system.parts) during filtering, so if your data spread by all granules, then secondary indexes are useless.
Look
https://fiddle.clickhouse.com/df6acefe-5511-41e8-bd32-c3d1815d16d1
In output
Index
lk_inv_idx has dropped 0/2 granules.
means, secondary index was used, but all granules contains color
, so the index was useless.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论