Is symbol type compressed or indexed in Kdb+?

Question

The official doc already shows that a symbol is an atom, but is this feature used for compression or indexing?

What we do know is that in QuestDB the symbol type is compressed and indexed ("Reduced complexity of database schemas by removing the need for explicit additional tables or joins"), but what about Kdb+?

Answer 1

Score: 2

Symbols in q, which are int mappings to distinct tokens (characters/strings), share many of the properties you've listed from QuestDB, including indexing and compression.

Here is an indexing example:

q)words:`the`cat`in`the`hat
q)words 1
`cat

For compression, as an example, check out the parted attribute and the whitepaper "Working with sym files".
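
As a rough illustration of the parted attribute mentioned above, here is a minimal sketch. The table `t` and its values are made up for this example; the attribute can only be applied when equal symbols sit next to each other.

```q
/ hypothetical table: equal sym values are already contiguous
t:([]sym:`a`a`b`b`c;px:1 2 3 4 5f)

/ apply the parted attribute to the sym column of the global table t
update `p#sym from `t

/ the "a" column of meta should now show p for sym
show meta t
```

With the attribute in place, a query such as `select from t where sym=`b` can jump to the block for that symbol instead of scanning the whole column.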

Answer 2

Score: 2

https://code.kx.com/q/basics/syscmds/#w-workspace

In kdb+ memory there is only one copy of each symbol (a unique char string). Once you declare a symbol, it is interned (hashed) into the symbol storage structure.

When you work with symbols, kdb+ uses the storage address of the string, not the data itself. This reduces memory usage and speeds up many operations.
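
One way to see this for yourself is to compare the workspace statistics before and after building a long list that repeats a single symbol. This is only a sketch: the symbol name is invented, and the exact numbers depend on the session, because variable names are interned as symbols too.

```q
/ snapshot of memory statistics; syms and symw are the symbol count and symbol bytes
before:.Q.w[]

/ one million references to a single (hypothetical) long symbol
x:1000000#`aLongSymbolNameUsedOnlyForThisDemo

after:.Q.w[]

/ change in symbol count, symbol bytes, and bytes in use
show (after-before)`syms`symw`used
```

The symbol count and symbol bytes should grow only slightly, while the bytes in use grow with the list length rather than with the length of the symbol's text.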

https://code.kx.com/q/wp/symfiles/

On disk, a sym file is used in a similar way. Table columns store the index into this file, for speed and space efficiency.
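
A minimal sketch of how this looks in practice, using `.Q.en` to enumerate a table's symbol columns against a sym file. The directory `demo_db` and the table contents are assumptions made up for this example.

```q
/ hypothetical in-memory table with a symbol column
t:([]sym:`ibm`msft`ibm`goog;px:10 20 11 30f)

/ enumerate symbol columns against demo_db/sym, creating or extending that file
e:.Q.en[`:demo_db;t]

/ the sym file holds each distinct symbol once (in a fresh directory: `ibm`msft`goog)
show get`:demo_db/sym

/ the "f" column of meta shows that sym is now enumerated against the sym list
show meta e

/ save as a splayed table; the sym column on disk holds indices, not text
`:demo_db/t/ set e
```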

Answer 3

Score: 1

In Kdb+, symbols are typically compressed rather than indexed.

Symbols in Kdb+ are a data type used to represent enumerated values or categorical variables. They are stored as a list of unique strings, with each unique string assigned a unique integer index. When a symbol column is created in a Kdb+ table, the unique strings are stored in a symbol table, and the column itself stores the corresponding integer indices.

By default, Kdb+ uses a technique called symbol compression to minimize the memory footprint of symbol columns. The compression is achieved by storing the unique strings in a dictionary-like structure, where each string is assigned a unique integer code. The symbol column then stores the integer codes instead of the actual strings, reducing memory consumption.
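
The dictionary-like encoding described here can be sketched in memory with an enumeration. The list name `sym` and the values are chosen for illustration; `sym` stands in for the contents of a sym file.

```q
/ enumeration domain: the list of distinct symbols
sym:`ibm`msft

/ Enum Extend: enumerate against sym, appending values not yet in it
v:`sym?`ibm`msft`ibm`goog`msft

show sym             / `ibm`msft`goog
show v               / `sym$`ibm`msft`ibm`goog`msft
show sym?value v     / 0 1 0 2 1, the position of each item in sym
```

`value v` recovers the original symbols, which is why the encoding is transparent to queries.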

When working with compressed symbols in Kdb+, the compression and decompression of symbols are transparently handled by the system, allowing efficient storage and retrieval of symbol data while maintaining the original string representations.

It's worth noting that this is not a setting that can be switched off. In memory, every symbol is interned; on disk, the symbol columns of splayed and partitioned tables have to be enumerated against a sym file before they are saved. Storing the text as strings (lists of char) instead of symbols is the alternative when a column has many distinct, rarely repeated values.

huangapple
  • Posted on June 19, 2023, 09:01:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503087.html