How do I tune symbol column capacity in QuestDB?
Question
Say I have a symbol column with 1M unique values and 10M rows per day, and I want to add an index on that column.
How do I tune the symbol and index capacities so that QuestDB performs optimally?
The default value for both is 256:
CREATE TABLE my_table (symb SYMBOL CAPACITY 256 INDEX CAPACITY 256);
Answer 1
Score: 1
Internally symbols use a symbol table, i.e. a mapping from string values to internal ids (32-bit integers).
There are two separate capacity settings:
- Symbol table capacity:
symb SYMBOL CAPACITY N
This one should be at least as big as the expected number of unique symbol values. You could think of the symbol table as a persistent hash table: if the number of buckets is insufficient, there will be unnecessary bucket scans on lookups.
- Index block capacity:
symb SYMBOL INDEX CAPACITY M
We recommend keeping the default value for this one, which is 256. Index blocks are part of a persistent linked list that stores row ids for a given symbol value. There is no need to tweak this capacity, as the linked list grows when needed.
So set the symbol table capacity to at least the expected number of unique symbol values and keep the default index block capacity. For the 1M unique values in the question, that means:
CREATE TABLE my_table (symb SYMBOL CAPACITY 1000000 INDEX);
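If you are unsure how many unique values to expect, one way to get a rough number is to count them in data you already have. This is only a sketch: `existing_table` is a hypothetical table holding a representative sample, and it assumes your QuestDB version provides `count_distinct` for symbol columns.

```sql
-- Hypothetical table name; replace with a table that holds representative data.
-- The result is a lower bound for the CAPACITY value of the symbol column.
SELECT count_distinct(symb) FROM existing_table;
```

Leave some headroom above the observed count if you expect new values to keep arriving.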
There is also a CACHE/NOCACHE setting, which enables or disables the on-heap cache used for symbol lookups. It's enabled by default, and we recommend disabling it only when your symbol column has a lot of unique values (way more than a million) or when you have so many symbol columns that the JVM heap can't fit the caches for all of them.
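As a sketch of what disabling the cache could look like, either at table creation or later on an existing column. The table name `my_big_table` and the capacity of 5000000 are made up for illustration; pick values that match your own data.

```sql
-- Hypothetical table with a very high-cardinality symbol column:
-- the on-heap cache is turned off, the index block capacity stays at its default.
CREATE TABLE my_big_table (symb SYMBOL CAPACITY 5000000 NOCACHE INDEX);

-- Switching the cache off (or back on) for an existing column.
ALTER TABLE my_table ALTER COLUMN symb NOCACHE;
ALTER TABLE my_table ALTER COLUMN symb CACHE;
```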
One more thing to note: while indexes help avoid full scans in certain queries, they slow down inserts. We recommend starting with no index and adding one only if you find query performance insufficient.
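Starting without an index and adding it later could look roughly like this. It is a minimal sketch using the table from the question; whether `DROP INDEX` is available depends on your QuestDB version, so treat that last statement as an assumption to check against your release.

```sql
-- Start with the symbol column sized for the expected cardinality, no index.
CREATE TABLE my_table (symb SYMBOL CAPACITY 1000000);

-- Later, if queries filtering on symb turn out to be too slow, add the index.
ALTER TABLE my_table ALTER COLUMN symb ADD INDEX;

-- If the index ends up hurting ingestion more than it helps queries, remove it
-- (assumes a QuestDB version that supports DROP INDEX).
ALTER TABLE my_table ALTER COLUMN symb DROP INDEX;
```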