如何调整QuestDB中的符号列容量?

huangapple go评论55阅读模式
英文:

How do I tune symbol column capacity in QuestDB?

问题

CREATE TABLE my_table (symb SYMBOL CAPACITY 256 INDEX CAPACITY 256);

英文:

Say, I have a symbol column with 1M unique values and 10M rows per day and I want to add an index for that column.

How do I tune symbol and index capacity to make sure that QuestDB performance is optimal?

The default values are 256:

CREATE TABLE my_table (symb SYMBOL CAPACITY 256 INDEX CAPACITY 256);

答案1

得分: 1

Internally symbols use a symbol table, i.e. a mapping from string values to internal ids (32-bit integers).

There are two separate capacity settings:

  1. Symbol table capacity: symb SYMBOL CAPACITY N - this one should be at least as big as the expected number of unique symbol values. You could think of the symbol table as a persistent hash table: if the number of buckets is insufficient, there will be unnecessary bucket scans on lookups.
  2. Index block capacity: symb SYMBOL INDEX CAPACITY M - we recommend keeping the default value for this one which is 256. Index blocks are part of a persistent linked list that stores row ids for a given symbol value. There is no need to tweak this capacity as the linked list grows when needed.

You should set the symbol table capacity as big as the expected number of unique symbol values while keeping the default value for the index block capacity:

CREATE TABLE my_table (symb SYMBOL CAPACITY 1000000 INDEX);

There is also CACHE/NOCACHE setting which either enables or disables on-heap cache used for symbol lookups. It's enabled by default and we recommend disabling it only when your symbol column has a lot of unique values (way more than a million) or you have many symbol columns, so that JVM heap won't fit caches for all of them.

One more thing to notice: while indexes help to avoid full scans in certain queries, they slow down inserts. We recommend starting with no index and then adding them if you find the query performance insufficient.

英文:

Internally symbols use a symbol table, i.e. a mapping from string values to internal ids (32-bit integers).

There are two separate capacity settings:

  1. Symbol table capacity: symb SYMBOL CAPACITY N - this one should be at least as big as the expected number of unique symbol values. You could think of the symbol table as a persistent hash table: if the number of buckets is insufficient, there will be unnecessary bucket scans on lookups.
  2. Index block capacity: symb SYMBOL INDEX CAPACITY M - we recommend keeping the default value for this one which is 256. Index blocks are part of a persistent linked list that stores row ids for a given symbol value. There is no need to tweak this capacity as the linked list grows when needed.

You should set the symbol table capacity as big as the expected number of unique symbol values while keeping the default value for the index block capacity:

CREATE TABLE my_table (symb SYMBOL CAPACITY 1000000 INDEX);

There is also CACHE/NOCACHE setting which either enables or disables on-heap cache used for symbol lookups. It's enabled by default and we recommend disabling it only when your symbol column has a lot of unique values (way more than a million) or you have many symbol columns, so that JVM heap won't fit caches for all of them.

One more thing to notice: while indexes help to avoid full scans in certain queries, they slow down inserts. We recommend starting with no index and then adding them if you find the query performance insufficient.

huangapple
  • 本文由 发表于 2023年3月7日 16:51:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75659752.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定