英文:
Does create index on a column always sort that column in background?
问题
我以前经常使用SQL,但从来没有深入关注它的工作原理。最近,我一直在研究键和索引。我对索引的工作原理有一些疑问。
据我理解,当我在某一列上创建索引时,索引会存储该列的副本,并附带对应表行的引用,并以排序顺序存储。
例如,如果我有如下表格:
Id Name
2 B
1 A
3 C
然后,如果我在**"Name"**列上创建索引,索引的"表"将类似于:
Name Pointer
A Row2
B Row1
C Row3
通过这样的排序顺序,在搜索**"Name"**时会更快,因为可以应用二分查找等算法,其时间复杂度可以为O(logn),而不是O(n)。
这是否意味着在列上创建索引总是在后台对该列进行排序,这是唯一的方式吗?因为如果不排序,那么搜索特定值时的时间复杂度仍然像扫描整个表一样,这没有改变。
英文:
I used SQL a lot but I never cared about how it works on a deeper level. Recently I've been doing some research about keys and indexes. I have some questions about how exactly the index work.
As I understand, when I create an index on a column, the index stores a copy of the indexed column along with a reference to the corresponding table row in sorted order.
For example, if I have a table like below
Id Name
2 B
1 A
3 C
Then if I create an index on the "Name" column the index "table" will be something like
Name Pointer
A Row2
B Row1
C Row3
With sorted order like this, it will be faster when searching for "Name" since you can apply an algorithm like binary search, which time-complexity can be O(logn) instead of O(n)
Does it mean that creating an index on a column always sorts that column in the background and it's the only way? Because if it's not sorted, then the time complexity when searching for a specific value is still like scanning the whole table, which makes no change.
答案1
得分: 1
你说过:“我主要使用MySQL” -- 即使是这种情况,也取决于。
- 旧版本的MySQL和MariaDB会重新构建表,包括新的索引和任何先前的索引。
- 较新版本会在后台执行
CREATE INDEX
,最终允许查询使用它。
在MySQL中,PRIMARY KEY
(以及许多其他品牌中的“聚集”索引)将数据按该键排序。一个“次要”索引本身是按键在“B+树”中排序的,但必须跨越到数据的B+树中获取其余列。
对于多对多映射表,建议构建两个索引:
CREATE TABLE foo_bars (
foo_id ...,
bar_id ...,
PRIMARY KEY(foo_id, bar_id),
INDEX(bar_id, foo_id) )
这样,您将获得两个完全排序的查找B树。
“二分查找” -- 了解一下B树;当数据太大以至于需要从磁盘获取时,它们比二分查找更好。
“对应表行的引用” -- 在InnoDB中,引用是PRIMARY KEY
的副本,因此需要进行第二次B树查找。在MyISAM中,它是对.MYD
文件的地址,一次文件查找。在其他一些供应商的系统中,它是一个“行号”。
“列存储”表是一种完全不同的数据结构。它涉及到两级查找和解压缩。
“全文检索”索引本质上是一个反向查找,将'单词'映射到'行'。
“空间”索引是一个“R-树”。普通索引是一维的;空间索引提供了二维索引。
WHERE x = 2
-- 一个“点查询” -- B树查找WHERE x BETWEEN 4 AND 44
-- 首先是B树查找,然后使用B+树中的+
执行“范围扫描”- 否则,使用“表扫描”(或“索引扫描”)-- 查看每一行。
希望这有所帮助;抱歉我没有关于Postgres等的详细信息。
英文:
You said "I mainly use Mysql" -- Even for that case, it depends.
- Older versions of MySQL and MariaDB would rebuild the table including the new index and any previous indexes.
- Newer versions do the
CREATE INDEX
in the background and eventually allow queries to use it.
The PRIMARY KEY
in MySQL (and a "clustered" index in many other brands) will order the data by that key. A "secondary" index, itself, is sorted by the key in a "B+Tree", but has to reach over into the data's B+Tree to get the rest of the columns.
For a many-to-many-mapping table it is a good idea to build two indexes:
CREATE TABLE foo_bars (
foo_id ...,
bar_id ...,
PRIMARY KEY(foo_id, bar_id),
INDEX(bar_id, foo_id) )
That way, you get two fully sorted lookup BTrees.
"binary search" -- Read about BTrees; they are better than binary search when the data can be so big that it needs to be fetched from disk.
"reference to the corresponding table row" -- In InnoDB, the reference is a copy of the PRIMARY KEY
, hence a second BTree lookup. In MyISAM it is an address into the .MYD
file, a single file seek. In some other vendors, it is a "row number".
A "Columnstore" table is a radically different animal. It involves a 2-level lookup and uncompression.
A "Fulltext" index is essentially an inverted lookup of 'word' -> 'row.
A "Spatial" index is an "R-Tree". Ordinary indexes are 1-dimensional; Spatial gives you a 2-dimensional index.
WHERE x = 2
-- A "point query" -- BTree lookupWHERE x BETWEEN 4 AND 44
-- First a Btree lookup, then use the+
in B+Tree to do a "range scan"- Otherwise, use a "table scan" (or "index scan") -- Look at every row.
Hope this helps; sorry that I don't have details for Postgres, etc.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论