2023-03-21 03:15:48

Postgres - Best way to store hex values

Question

I have an application which deals with blockchain data, so the Postgres tables have quite a lot of fields which are constant-sized hex values.

One of the most important columns is the public_address column of the user table, which is a 20-byte hex value and the primary key of the table. I call it important because this column is used in almost all other queries within a JOIN or WHERE clause, so its indexing and query speed are crucial.

On the other hand, there is another column, nonce, in the transaction table, which is a 32-byte hex value with a single NOT NULL constraint on it. It is not used in any JOIN or WHERE clause; it only appears in SELECT statements that read its value.

I am aware that bytea is storage-wise more efficient than varchar, but I don't know which one is better to use for indexes and for faster queries.

So my question is: under which circumstances should bytea be used over varchar(N) for hex values, and vice versa, taking indexing and query speed into account?

Answer 1

Score: 2

What you call a "hex value" is just a text representation of binary data. And that is what matters. It's plain, raw, arbitrary, binary data. Just a sequence of bytes.

Indexing and query speed (while I don't expect them to be much different) do not matter; there is only one appropriate data type for arbitrary byte sequences: bytea.

Do not use varchar: "The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored."

Answer 2

Score: 2

tl;dr. Use bytea. It will likely be more performant, but more importantly it's correct; there are some bytes which are illegal for varchar. From the docs...

> First, binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the decimal range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database's selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.

varchar stores bytes which represent characters using a given character encoding. In many character encodings, certain byte combinations are illegal; they represent no character. For example, in UTF-8 a single byte above 7F is illegal. So you can't reliably store bytes as bytes in varchar.
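To make the encoding point concrete, here is a small sketch in Python. It is only an illustration: Python's UTF-8 decoder stands in for the validation a UTF-8 database applies to character data, and the helper name is_valid_utf8 is made up for this example. It shows that a lone byte above 7F is rejected while a well-formed multi-byte sequence is accepted.

```python
# Illustration only: Python's UTF-8 decoder is used as a stand-in for the
# encoding validation a UTF-8 database applies to character data.
# Note: Python accepts the NUL character in strings; per the documentation
# quoted above, Postgres character types additionally reject it.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone_high_byte = bytes([0x80])       # a continuation byte with no lead byte
two_byte_char = bytes([0xC3, 0xA9])  # the UTF-8 encoding of "é"
ascii_text = b"abc"                  # plain ASCII is always valid UTF-8

print(is_valid_utf8(lone_high_byte))
print(is_valid_utf8(two_byte_char))
print(is_valid_utf8(ascii_text))
```

Arbitrary binary data such as an address or a nonce will routinely contain bytes like the first case, which is why it cannot be treated as text.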

Instead, varchar will store a string representation of the bytes. So \x00AA will be stored not as the two bytes 00 AA but as the string "\x00AA" which, in UTF-8, is 5C 78 30 30 41 41.
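The byte values quoted above can be checked with a few lines of Python. This is just a sketch of the arithmetic, not a Postgres session; the variable names are chosen for the example.

```python
# The text form is six characters: backslash, x, 0, 0, A, A.
text_form = r"\x00AA"

# The raw form is the two bytes those hex digits describe: 0x00 and 0xAA.
raw_form = bytes.fromhex("00AA")

# Encoding the text form as UTF-8 yields one byte per ASCII character.
text_bytes = text_form.encode("utf-8")

print(text_bytes.hex(" ").upper())     # 5C 78 30 30 41 41
print(len(text_bytes), len(raw_form))  # 6 2
```

The same ratio applies to the columns in the question: each real byte costs two characters when kept as hex text, before any prefix.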

Demonstration.

Converting back and forth between the hex characters and the real bytes is more work and creates more opportunities for mistakes. It's more efficient and more correct to use bytea.
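One kind of mistake that hex text invites is that the same bytes have several spellings. The sketch below assumes a 20-byte address written with an optional 0x prefix, as in the question; the function names address_to_bytes and bytes_to_address are hypothetical and not part of any library.

```python
ADDRESS_LENGTH = 20  # bytes, as described in the question


def address_to_bytes(hex_address: str) -> bytes:
    """Parse a hex address (optional 0x prefix, any letter case) into raw bytes."""
    text = hex_address.strip()
    if text[:2].lower() == "0x":
        text = text[2:]
    raw = bytes.fromhex(text)  # raises ValueError on non-hex input
    if len(raw) != ADDRESS_LENGTH:
        raise ValueError(f"expected {ADDRESS_LENGTH} bytes, got {len(raw)}")
    return raw


def bytes_to_address(raw: bytes) -> str:
    """Render raw bytes in one canonical spelling: lowercase hex with 0x prefix."""
    return "0x" + raw.hex()


lower = "0x" + "ab" * 20  # 42 characters
upper = "0x" + "AB" * 20  # same bytes, different spelling

print(lower == upper)                                      # False
print(address_to_bytes(lower) == address_to_bytes(upper))  # True
print(len(lower), len(address_to_bytes(lower)))            # 42 20
```

If the addresses are kept as text, every writer and every query has to agree on one spelling or comparisons can silently miss. With raw bytes there is only one representation, and the conversion happens once at the application boundary.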
