MySQL INDEX(): How to Create a Functional Key Part Using the Last N Characters?
Question
How would I write the INDEX() statement to use the last N characters of a column as a functional key part? I'm brand new to SQL/MySQL, and I believe that's the proper wording for my question. An explanation of what I'm looking for is below.
The MySQL 8.0 Reference Manual explains how to use the first N characters, showing a secondary index that uses the first 10 characters of col2:
CREATE TABLE t1 (
col1 VARCHAR(40),
col2 VARCHAR(30),
INDEX (col1, col2(10))
);
However, I would like to know how one could form this using the ending characters instead. Perhaps something like:
...
INDEX ((RIGHT(col2, 3)))
);
However, I think that says to index over a column called 'xyz' instead of "put an index on each column value using the last 3 of its 30 potential characters". That's what I'm really trying to figure out.
For some context, it would be helpful to index a column holding smooshed/mixed data, and I'm playing around with how such a thing could be accomplished. Below is an example of the kind of data I'm talking about: a simplified, adjusted version of data exported from a 1990s-era inventory/billing manager that I had to endure some years back:
Col1 | Col2 |
---|---|
GP6500012_SALES_FY2023_SBucks_503_Thurs | R-DK_Sumat__SKU-503-20230174 |
GP6500012_SALES_FY2023_SBucks_607_Mon | R-MD_Columb__SKU-607-2023035 |
GP6500012_SALES_FY2023_SBucks_627_Mon-pm | R-BLD_House__SKU-503-20230024 |
GP6500012_SALES_FY2023_SBucks_929_Wed | R-FR_Ethp__SKU-929-20230324 |
Undoubtedly, better options exist that bypass this question altogether, and I'll presumably learn those techniques with time in my data analytics coursework. For now, I'm just curious whether it's possible to index the rows by suffix instead of prefix, and what a code example would look like. TIA.
Answer 1
Score: 1
Proposed solution (INDEX ((RIGHT(col2, 3)))):
Not available before MySQL 8.0.13 (see "Functional indexes" below).
Case 1:
When you need to split apart a column to search it, you have probably designed the schema wrong. In particular, that part of the column needs to be in its own column. That being said, it is possible to use a 'virtual' (or 'generated') column that is a function of the original column, then INDEX that.
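As a rough sketch of the generated-column route (the table name, column name `col2_suffix`, and index name are made up for illustration; this assumes InnoDB on MySQL 5.7 or later, where secondary indexes on virtual columns are supported):

```sql
-- Hypothetical names throughout; adjust to your own schema.
CREATE TABLE t_gen (
  col1 VARCHAR(40),
  col2 VARCHAR(30),
  -- Virtual generated column holding the last 3 characters of col2.
  col2_suffix VARCHAR(3) AS (RIGHT(col2, 3)) VIRTUAL,
  INDEX idx_col2_suffix (col2_suffix)
);

-- A lookup on the generated column is then an ordinary indexed lookup.
SELECT col1, col2
FROM t_gen
WHERE col2_suffix = '174';
```

The generated column is computed from col2 automatically, so INSERT statements only need to supply col1 and col2.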
Case 2:
If you are suggesting that the last 3 characters are the most selective and that might speed up any lookup, don't bother. Simply index the entire column.
That data:
I would consider splitting up the stuff that is concatenated together by _. Do it as you INSERT the rows. If it needs to be put back together, do so during subsequent SELECTs.
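Something along these lines might work for the Col1 values in the question. The column names are guesses at what each piece means, so treat them as placeholders; SUBSTRING_INDEX and CONCAT_WS are standard MySQL functions.

```sql
-- Hypothetical table; column names are guesses at the meaning of each piece.
CREATE TABLE sales_parts (
  account_code VARCHAR(20),
  category     VARCHAR(20),
  fiscal_year  VARCHAR(10),
  customer     VARCHAR(20),
  store_no     VARCHAR(10),
  day_label    VARCHAR(10),
  INDEX idx_store_no (store_no)
);

SET @raw = 'GP6500012_SALES_FY2023_SBucks_503_Thurs';

-- Split on '_' while inserting: the inner call keeps the first N pieces,
-- the outer call keeps the last piece of that result.
INSERT INTO sales_parts
VALUES (
  SUBSTRING_INDEX(@raw, '_', 1),
  SUBSTRING_INDEX(SUBSTRING_INDEX(@raw, '_', 2), '_', -1),
  SUBSTRING_INDEX(SUBSTRING_INDEX(@raw, '_', 3), '_', -1),
  SUBSTRING_INDEX(SUBSTRING_INDEX(@raw, '_', 4), '_', -1),
  SUBSTRING_INDEX(SUBSTRING_INDEX(@raw, '_', 5), '_', -1),
  SUBSTRING_INDEX(@raw, '_', -1)
);

-- Reassemble the original string only when it is needed.
SELECT CONCAT_WS('_', account_code, category, fiscal_year,
                 customer, store_no, day_label) AS col1
FROM sales_parts
WHERE store_no = '503';
```

With the pieces in their own columns, the store number can be indexed and searched directly, with no suffix or substring tricks.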
DATEs:
Do not, on the other hand, split up dates (into year, month, etc). Keep them together. (That's another discussion.) Always go to the effort to convert dates (and datetimes) to the MySQL format (year-first) when storing. That way, you can properly use indexes and use the many date functions.
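For instance, assuming the source data arrives with dates in a month/day/year string format (the table, column, and sample value here are invented for illustration), the conversion could be done on the way in:

```sql
-- Hypothetical table and sample value.
CREATE TABLE orders_demo (
  order_date DATE,
  INDEX idx_order_date (order_date)
);

-- Convert the incoming string to a real DATE when storing it.
INSERT INTO orders_demo (order_date)
VALUES (STR_TO_DATE('03/15/2023', '%m/%d/%Y'));

-- A range test on the DATE column can use the index,
-- and the date functions work without any string slicing.
SELECT order_date, YEAR(order_date), MONTHNAME(order_date)
FROM orders_demo
WHERE order_date >= '2023-01-01'
  AND order_date <  '2024-01-01';
```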
MySQL's Prefix indexing:
In general it is a "bad idea" to use the INDEX(col(10)) construct. It rarely is of any benefit; it often fails to use the index as much as you would expect. This is especially deceptive: UNIQUE(col(10)). It declares that the first 10 chars are unique, not the entire col!
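A small demonstration of that trap, using a throwaway table (the name `prefix_demo` is made up) and values modeled on the Col2 data in the question:

```sql
-- Hypothetical demo table.
CREATE TABLE prefix_demo (
  col VARCHAR(30),
  UNIQUE (col(10))   -- uniqueness is enforced on the first 10 characters only
);

INSERT INTO prefix_demo VALUES ('R-DK_Sumat__SKU-503-20230174');

-- A different value that shares the first 10 characters ('R-DK_Sumat').
-- This statement is rejected with a duplicate-key error (error 1062),
-- even though the full strings differ.
INSERT INTO prefix_demo VALUES ('R-DK_Sumat__SKU-607-20230035');
```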
CAST:
If the data is the wrong datatype (string vs int; wrong collation; etc), then I argue that it is a bad schema design. This is a common problem with EAV (Entity-Attribute-Value) schemas. When a number is stored as a string, CAST is needed to sort (ORDER BY) it.
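To see why, here is a tiny example with an invented table (`eav_demo`) that stores numbers in a VARCHAR column:

```sql
-- Hypothetical demo table with numbers stored as strings.
CREATE TABLE eav_demo (attr_value VARCHAR(20));
INSERT INTO eav_demo VALUES ('9'), ('10'), ('100');

-- String comparison sorts character by character: '10', '100', '9'.
SELECT attr_value FROM eav_demo ORDER BY attr_value;

-- Casting to a number gives the numeric order: 9, 10, 100.
SELECT attr_value FROM eav_demo ORDER BY CAST(attr_value AS UNSIGNED);
```

Storing the value in an integer column in the first place avoids the CAST and lets an ordinary index support the sort.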
Functional indexes:
Your proposed solution is not a "prefix" index; it is something more complicated. I suspect any expression, even one on non-string columns, will work. This is when it became available:
> MySQL 8.0.13, General Availability (2018-10-22)
>
> MySQL now supports creation of functional index key parts that index
> expression values rather than column values. Functional key parts
> enable indexing of values that cannot be indexed otherwise, such as
> JSON values. For details, see CREATE INDEX Syntax.
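
On MySQL 8.0.13 or later, the index from the question could be sketched like this. The table and index names are placeholders, and whether the optimizer actually picks the index is worth confirming with EXPLAIN on your own data:

```sql
-- Hypothetical names; requires MySQL 8.0.13 or later.
CREATE TABLE t_func (
  col1 VARCHAR(40),
  col2 VARCHAR(30),
  -- The extra pair of parentheses marks this as an expression,
  -- not a column name or a prefix length.
  INDEX idx_col2_last3 ((RIGHT(col2, 3)))
);

-- The query should use the same expression as the index definition
-- for the optimizer to consider the index.
EXPLAIN
SELECT col1, col2
FROM t_func
WHERE RIGHT(col2, 3) = '174';
```

So the doubled parentheses do not refer to a column named anything; they index the value of RIGHT(col2, 3) computed for each row.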