MySQL / MariaDB – 大量列的策略

huangapple go评论65阅读模式
英文:

MySQL / MariaDB - Strategies for Large Number of Columns

问题

我即将设计一个数据库,其中将从多个数据源接收到不同时间的数据帧。这些数据帧中的每一个都需要被解析并转换为工程单位,然后以适当的数据类型(例如,FLOAT、INT)存储到数据库中。因此,一般的结构将是一个DATETIME数据类型用作主键,其余列将是这些转换后的值。这将使所有数据保持与接收时间的正确对齐。

对于我的大多数表格(代表每个数据源/数据帧),列数相对较少,大约几百列。但是将会有一个表格,似乎有大约10,000列。根据以前的经验,我知道存在最大列数(4,096)和完整行大小(65,536字节)的硬限制。

以前,我会将其分成“相邻”的表格,比如tableA_000、tableA_001等,每个表格包含一些任意数量的列,但仍然都使用相同的主键(时间)。但是这样做会变得有点混乱,主要问题是我必须创建一个单独的查找表格,以便在查找某些内容时首先引用它(例如,“colVoltage在tableA_007中”,“colTimer在tableB_003中”)。我当然可以再次以这种方式做,但我很好奇...是否还有其他不同的处理方法?在理想的情况下,我希望能够插入和从看起来像一个大表格的地方选择数据,尽管有这么多列。

MariaDB 10.11.3 / CentOS 8

英文:

I am about to design a database for which I will have frames from multiple data sources arriving, at different times. Each of these frames will need to be dissected and converted into engineering units for storage into the database with an appropriate data type (e.g. FLOAT, INT). Accordingly, the general structure would be a DATETIME datatype for a primary key and the remaining columns would be each of these converted values. This would keep everything aligned to the proper time of receipt.

For most of my tables (representing each data source/frame) the column counts are relatively small, several hundred columns. However there will be one that looks to have approximately 10,000 columns. From previous experience, I know that there are both hard limits for the maximum number of columns (4,096) as well as the the complete row size (65,536 bytes).

Previously, I would have split this up into "adjacent" tables, like tableA_000, tableA_001, etc with each holding some arbitrary number of columns and yet still all using the same primary key (time). But doing this becomes a bit messy, namely that I have to create a separate lookup table to first reference when I want to find something. (For example, "colVoltage is in tableA_007", "colTimer is in tableB_003") I can certainly do this again in this manner, but I'm curious ... does anybody else have any different strategies for handling this? In an ideal world, I would be able to both insert to and select from what (to me) looks like one big table despite having so many columns.

MariaDB 10.11.3 / CentOS 8

答案1

得分: 1

I think this sounds like a use case for a kind of document store. Creating tables with so many columns is likely to become unmanageable quite quickly. You can do this in MySQL using the JSON data type which gives you a very flexible data structure within a single table. You can also use generated columns to create fast indexes on data elements within your data that are common.

I'd also question why you're using a DATETIME as a primary key, surely there's a potential for clashes? This would be my create statement:

CREATE TABLE frames (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
type VARCHAR(50) NOT NULL,
data JSON NOT NULL,
example_key INT GENERATED ALWAYS AS (data->"$.example_key"),
INDEX(created),
INDEX(type),
INDEX(example_key)
);

The MySQL docs are good on this, at https://dev.mysql.com/doc/refman/8.0/en/json.html and https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html#json-column-indirect-index

Obviously there are other database engines that are designed exclusively to function in this way which might be a better choice if this is the only function of your app and doesn't need a relational DB for other bits. Also bear in mind that the JSON type can be a bit slow, because if you're SELECTing you end up pulling out the entire contents of that field for each record returned. Again generated fields can be good for this.

英文:

Personally I think this sounds like a use case for a kind of document store. Creating tables with so many columns is likely to become unmanageable quite quickly. You can do this in MySQL using the JSON data type which gives you a very flexible data structure within a single table. You can also use generated columns to create fast indexes on data elements within your data that are common.

I'd also question why you're using a DATETIME as a primary key, surely there's a potential for clashes? This would be my create statement:

CREATE TABLE frames (   
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  type VARCHAR(50) NOT NULL,
  data JSON NOT NULL,
  example_key INT GENERATED ALWAYS AS (data->"$.example_key"),
  INDEX(created),
  INDEX(type),
  INDEX(example_key)
); 

The MySQL docs are good on this, at https://dev.mysql.com/doc/refman/8.0/en/json.html and https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html#json-column-indirect-index

Obviously there are other database engines that are designed exclusively to function in this way which might be a better choice if this is the only function of your app and doesn't need a relational DB for other bits. Also bear in mind that the JSON type can be a bit slow, because if you're SELECTing you end up pulling out the entire contents of that field for each record returned. Again generated fields can be good for this.

huangapple
  • 本文由 发表于 2023年6月8日 23:16:42
  • 转载请务必保留本文链接:https://go.coder-hub.com/76433360.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定