英文:
Binary data type collation with Characters in MariaDB
问题
我试图深入了解关系型数据库管理系统(RDBMS),我正在学习MariaDB。
我在理解二进制数据类型及其排序方式方面遇到了困难。
我了解到二进制(n)数据类型表示n个字节。这意味着可以分配的最小内存是1字节吗?
以下摘自MariaDB文档:
BINARY类型类似于CHAR类型,但存储的是二进制字节字符串而不是非二进制字符字符串。M代表列长度(以字节为单位)。它不包含字符集,比较和排序是基于字节的数值值。
我有以下子问题:
- 什么是字节的数值值?它是否与ASCII字符获得一个数字相同?如果不是,如何确定字节的数值值?
- 如果我在二进制数据类型中插入一个字符 'A',它将如何编码和存储?在Table Plus客户端中,我看不到它以二进制形式存储。
- 在相同的数据类型中,我可以添加字符,也可以添加一个二进制图像,例如,如何确定字节字符串的数值值?
我已经创建了一些表来测试这个问题,
首先我创建了一个用于测试Binary(1)的表,这不允许我添加超过1位的数字 0-9 或一个字符 A-Z 或 a-z
然后我创建了一个带有Binary(10)的表来看排序顺序是如何工作的
CREATE TABLE BinaryEg1(
b BINARY(1)
);
-- 添加两个二进制数字
INSERT INTO BinaryEg1 VALUES (0), (1);
-- 报错,太长了
INSERT INTO BinaryEg1 VALUES (10);
-- 添加基本字符以进行比较排序
INSERT INTO BinaryEg1 VALUES ('A'), ('a'), ('B'), ('b');
在这里,如果我查看默认排序顺序,它将数字放在大写字母之前,然后是小写字母:
SELECT * FROM BinaryEg1 ORDER BY b;
-- 创建一个包含10个二进制字节的表,以测试排序方式
CREATE TABLE BinaryEg10(
b BINARY(10)
);
-- 将值插入到10字节的二进制列中
INSERT INTO BinaryEg10 VALUES (HEX('A'));
-- 添加一些不同长度的数字以测试排序顺序
INSERT INTO BinaryEg10 VALUES
(110),
(44),
(999999),
(1111111111),
(7876);
SELECT * FROM BinaryEg10 ORDER BY b;
现在,我看到了排序顺序:
请问有人能解释一下排序是如何工作的吗?似乎第一个字节在序列中最重要,它忽略了其余的字节,
我理解,这可能是一个无关紧要且毫无意义的问题,因为通常我假设我不应该使用二进制来存储字符,但我仍然希望对这种数据类型的工作方式有一个扎实的理解。
英文:
I am trying to get a deep understanding about RDBMS, I am learning MariaDB.
Struggling to get the Binary data type and how its collation works.
I understand the binary(n) data type takes n bytes. This means the smallest memory that could be allocated is 1 byte?
This is taken from the MariaDB documentation:
> The BINARY type is similar to the CHAR type, but stores binary byte strings rather than non-binary character strings. M represents the column length in bytes.
It contains no character set, and comparison and sorting are based on the numeric value of the bytes.
I have the following sub questions:
- What exactly is a numeric value of a byte ? Is it the same as an ASCII character getting a number? If not, how is a numeric value determined ?
- If I insert a character 'A' inside a binary data type, how is this encoded and stored ? In the Table plus client, I can't see this being stored in binary.
- In the same data type i could add chars and also a binary image for example, how is the numeric value of the byte string determined?
I have created a few tables to test this,
First I create a table to test this with Binary(1) this doesn't allow me to add more than 1 digit 0-9 or 1 character A-Z or a-z
And then I have created a table with Binary(10) to see how the sorting order works
CREATE TABLE BinaryEg1(
b BINARY(1)
);
-- Adding two binary digits
INSERT INTO BinaryEg1 VALUES (0), (1);
-- Throws error, too long
INSERT INTO BinaryEg1 VALUES (10);
-- Adding basic chars to compare sort
INSERT INTO BinaryEg1 VALUES ('A'), ('a'), ('B'), ('b');
Here if I see the default sort order it places numbers before capital letters and then small case letters:
SELECT * FROM BinaryEg1 ORDER BY b;
-- Creating a table of 10 binary bytes to test how sorting works
CREATE TABLE BinaryEg10(
b BINARY(10)
);
-- Inserting values into 10 byte binary column
INSERT INTO BinaryEg10 VALUES (HEX('A'));
-- Adding some numbers of varying lengths to test sort order
INSERT INTO BinaryEg10 VALUES
(110),
(44),
(999999),
(1111111111),
(7876);
SELECT * FROM BinaryEg10 ORDER BY b;
Now, I see the sort order:
Please can someone explain how the sorting works, it seems like the first byte is the most important in the sequence, it ignores the rest of the bytes,
I understand, this might be an irrelevant and pointless question as normally i assume i should not use binary for storing chars, but i would still like to get a solid understanding about how this data type works
答案1
得分: 0
以下是您要翻译的内容:
感谢上面问题中的评论,它给了我获取十六进制值的想法。现在,我对这是如何工作有了深刻的理解。
回答:
> 一个字节的数字值究竟是什么?它与ASCII字符获取一个数字是一样的吗?如果不是,数字值是如何确定的?
字节序列中的每个字节,每个字节都是从0到255的十进制数,因为每个字节都有8位。
当我添加ASCII字符时,它的数字值映射到其十六进制数值代码,然后映射到其十进制数值代码。
所以在ASCII字符的情况下,数字代码和数字值是相同的。
> 如果我在二进制数据类型中插入字符'A',它是如何编码和存储的?在Table plus客户端中,我看不到它存储为二进制。
如果我在二进制列中插入字符'A',它将占用1字节的内存,就像我将字符'A'添加到char列中一样。
它将根据其来自MariaDB的Latin1字符集的十进制/十六进制值进行二进制编码。这映射到ASCII和Unicode表,因此是相同的。
字符'A'的十六进制值为:41。将其转换为二进制后将存储为:0100 0001,这占用1字节的内存。
十进制值为65。如果将此字符与另一行中的另一个值进行比较,比较将基于此数字值而不是与字符一样的排序。
> 在相同的数据类型中,我可以添加字符和二进制图像,例如,如何确定字节字符串的数字值?
重要的是要理解和概念化二进制数据类型作为字节序列。每个字节的数字值范围是0到255。
比较是从左到右比较每个字节的数字值。如果字节相等,然后比较下一组字节的字节值,依此类推。
以下是一些示例,说明了这一点:
-- 创建一个只有1字节的表
CREATE TABLE BinaryEg1(
b BINARY
);
-- 插入值:
INSERT INTO BinaryEg1 VALUES (0), (1), ('A'), ('B'), ('a'), ('b'), (0);
现在,我创建了两列,一列是普通值,另一列是十六进制值,我们可以看到数值相加等于ASCII、Latin1、UTF8字符集中找到的数值代码:
SELECT b, HEX(b) FROM BinaryEg1 ORDER BY b;
我正在创建另一个示例,其中包含三列,以展示相同的观点,我们可以看到列以相同的方式排序:
CREATE TABLE BinaryEg3(
b BINARY(3)
);
INSERT INTO BinaryEg3 VALUES
(418), (518), (318);
(320), (319), (317);
现在,您可以看到正常的十进制数字字符以及它们的十六进制值,以及我可以了解它们如何排序:
SELECT b, HEX(b) FROM BinaryEg3 ORDER BY b;
现在,这些值是根据这些十六进制数存储的。每个字节都有2个十六进制数字,因此3个数字有6个十六进制数字。
每个字节都是基于该数字值进行比较的,这就是为什么317排在318之前,318排在319之前。
因为十六进制/二进制值的3和1是相同的,但7 < 8 < 9
英文:
Thanks to the comment above in my question it gave me the idea of getting the hex value. Now, I have a deep understanding about how this works.
The answers:
> What exactly is a numeric value of a byte ? Is it the same as an ASCII character getting a number? If not, how is a numeric value determined ?
Each byte in the byte sequence, each byte is a number from 0-255 base 10, because each byte has 8 bits.
When I add an ASCII character, it's numeric value maps to its hex numeric code, which maps to its decimal numeric code.
So yes in the case of ASCII chars, the numeric code & numeric values are identical.
> If I insert a character 'A' inside a binary data type, how is this encoded and stored ? In the Table plus client, I can't see this being stored in binary.
If I insert the character 'A' in a binary column, it will take 1 byte of memory, the same way it would if I added the char 'A' in a char column.
It will be encoded into binary based on its Decimal/hex value from the Latin1 charset which is the default charset for MariaDB. This maps to ASCII and Unicode tables, so its the same.
The char 'A' has a hex value of: 41. This when converted to binary will be stored as: 0100 0001 and that takes a byte of memory.
The decimal value is 65. If comparing this char to another value in another row, the comparison will be based on this numeric value and not a collation as it would be with chars.
> In the same data type i could add chars and also a binary image for example, how is the numeric value of the byte string determined?
It's important to understand and conceptualise the binary data type as a SEQUENCE OF BYTES.
And each byte has a numeric value from 0 to 255.
The comparison compares the numeric value of each byte working from left to right. If the bytes are equal, then it compares the byte value from the next set of bytes and so on.
Here are some examples that show this:
-- Creating a table with just 1 byte
CREATE TABLE BinaryEg1(
b BINARY
);
-- Inserting values:
INSERT INTO BinaryEg1 VALUES (0), (1), ('A'), ('B'), ('a'), ('b'), (0);
Now, I have created two columns, one with the normal value and the other with the hex value and we can see the numeric values add up to the numeric codes as found in ASCII, Latin1, UTF8 character sets:
SELECT b, HEX(b) FROM BinaryEg1 ORDER BY b;
I am creating another example with three columns to show the same point and we can see how columns are sorted in the same way:
CREATE TABLE BinaryEg3(
b BINARY(3)
);
INSERT INTO BinaryEg3 VALUES
(418), (518), (318);
(320), (319), (317);
Now, you can see the normal decimal number characters as well as their hex values and I can get an idea of how it would be sorted:
SELECT b, HEX(b) FROM BinaryEg3 ORDER BY b;
Now, these values are stored in binary based on those hex numbers. Each byte has 2 hex digits, hence the 3 digits have 6 hex digits.
Each byte is compared based on that numeric value, which is why 317 comes before 318 and 318 before 319.
Because the hex/binary value of 3 and 1 is the same, but 7 < 8 < 9
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论