2023年6月13日 10:20:05go评论127阅读模式

英文:

Does create index on a column always sort that column in background?

问题

我以前经常使用SQL，但从来没有深入关注它的工作原理。最近，我一直在研究键和索引。我对索引的工作原理有一些疑问。

据我理解，当我在某一列上创建索引时，索引会存储该列的副本，并附带对应表行的引用，并以排序顺序存储。

例如，如果我有如下表格：

Id Name
2  B
1  A
3  C

然后，如果我在**"Name"**列上创建索引，索引的"表"将类似于：

Name Pointer
A    Row2
B    Row1
C    Row3

通过这样的排序顺序，在搜索**"Name"**时会更快，因为可以应用二分查找等算法，其时间复杂度可以为O(logn)，而不是O(n)。

这是否意味着在列上创建索引总是在后台对该列进行排序，这是唯一的方式吗？因为如果不排序，那么搜索特定值时的时间复杂度仍然像扫描整个表一样，这没有改变。

英文:

I used SQL a lot but I never cared about how it works on a deeper level. Recently I've been doing some research about keys and indexes. I have some questions about how exactly the index work.

As I understand, when I create an index on a column, the index stores a copy of the indexed column along with a reference to the corresponding table row in sorted order.

For example, if I have a table like below

Id Name
2  B
1  A
3  C

Then if I create an index on the "Name" column the index "table" will be something like

Name Pointer
A    Row2
B    Row1
C    Row3

With sorted order like this, it will be faster when searching for "Name" since you can apply an algorithm like binary search, which time-complexity can be O(logn) instead of O(n)

Does it mean that creating an index on a column always sorts that column in the background and it's the only way? Because if it's not sorted, then the time complexity when searching for a specific value is still like scanning the whole table, which makes no change.

答案1

得分: 1

你说过：“我主要使用MySQL” -- 即使是这种情况，也取决于。

旧版本的MySQL和MariaDB会重新构建表，包括新的索引和任何先前的索引。
较新版本会在后台执行CREATE INDEX，最终允许查询使用它。

在MySQL中，PRIMARY KEY（以及许多其他品牌中的“聚集”索引）将数据按该键排序。一个“次要”索引本身是按键在“B+树”中排序的，但必须跨越到数据的B+树中获取其余列。

对于多对多映射表，建议构建两个索引：

CREATE TABLE foo_bars (
    foo_id ...,
    bar_id ...,
    PRIMARY KEY(foo_id, bar_id),
    INDEX(bar_id, foo_id) )

这样，您将获得两个完全排序的查找B树。

“二分查找” -- 了解一下B树；当数据太大以至于需要从磁盘获取时，它们比二分查找更好。

“对应表行的引用” -- 在InnoDB中，引用是PRIMARY KEY的副本，因此需要进行第二次B树查找。在MyISAM中，它是对.MYD文件的地址，一次文件查找。在其他一些供应商的系统中，它是一个“行号”。

“列存储”表是一种完全不同的数据结构。它涉及到两级查找和解压缩。

“全文检索”索引本质上是一个反向查找，将'单词'映射到'行'。

“空间”索引是一个“R-树”。普通索引是一维的；空间索引提供了二维索引。

WHERE x = 2 -- 一个“点查询” -- B树查找
WHERE x BETWEEN 4 AND 44 -- 首先是B树查找，然后使用B+树中的+执行“范围扫描”
否则，使用“表扫描”（或“索引扫描”）-- 查看每一行。

希望这有所帮助；抱歉我没有关于Postgres等的详细信息。

英文:

You said "I mainly use Mysql" -- Even for that case, it depends.

Older versions of MySQL and MariaDB would rebuild the table including the new index and any previous indexes.
Newer versions do the CREATE INDEX in the background and eventually allow queries to use it.

The PRIMARY KEY in MySQL (and a "clustered" index in many other brands) will order the data by that key. A "secondary" index, itself, is sorted by the key in a "B+Tree", but has to reach over into the data's B+Tree to get the rest of the columns.

For a many-to-many-mapping table it is a good idea to build two indexes:

CREATE TABLE foo_bars (
    foo_id ...,
    bar_id ...,
    PRIMARY KEY(foo_id, bar_id),
    INDEX(bar_id, foo_id) )

That way, you get two fully sorted lookup BTrees.

"binary search" -- Read about BTrees; they are better than binary search when the data can be so big that it needs to be fetched from disk.

"reference to the corresponding table row" -- In InnoDB, the reference is a copy of the PRIMARY KEY, hence a second BTree lookup. In MyISAM it is an address into the .MYD file, a single file seek. In some other vendors, it is a "row number".

A "Columnstore" table is a radically different animal. It involves a 2-level lookup and uncompression.

A "Fulltext" index is essentially an inverted lookup of 'word' -> 'row.

A "Spatial" index is an "R-Tree". Ordinary indexes are 1-dimensional; Spatial gives you a 2-dimensional index.

WHERE x = 2 -- A "point query" -- BTree lookup
WHERE x BETWEEN 4 AND 44 -- First a Btree lookup, then use the + in B+Tree to do a "range scan"
Otherwise, use a "table scan" (or "index scan") -- Look at every row.

Hope this helps; sorry that I don't have details for Postgres, etc.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在一个列上创建索引是否总是在后台对该列进行排序？

问题

答案1

如何基于 SQL Server 中另一列的多个字符串更新列？

生成带有序列下一个值的INSERT脚本。

如何将一个函数应用于一列的值，然后将输出应用于pandas中的多个其他列。

How to get average value per object in a list with Dart

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。