Should we create an index on a column if the index makes write operations slower?
Question
I know that we create indexes on columns to make read queries faster.
I have learned that once we create an index on a column, write queries become slower, because every insert has to update the index as well as the main table.
I have a table `quote` with columns such as `entity_id`, `increment_id`, `grand_total`, and a few more.
My target column is `increment_id`, which holds unique values. Once the data is inserted, I also run a read query like `select * from quote where increment_id='123'`.
My question is: should I create an index on the `increment_id` column? The `quote` table contains 100K+ rows at the moment and will keep growing.
Answer 1
Score: 2
If you don't create the index, the query you show will do a table-scan every time it runs. That will make the `select` query perform very badly.
It's up to you to decide if you want to optimize for the inserts, or for the select. If you absolutely need inserts to be as quick as possible, and you don't care if the select takes many seconds, then you can skip the index.
For most applications, it's a good tradeoff to create the index that helps your specific read query. It's very expensive to do table-scans, and it gets worse as the number of rows increases. The overhead of inserting into a table with an index isn't so bad.
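To see the difference the index makes to the read query, here is a minimal sketch. It uses SQLite through Python's `sqlite3` module purely as a convenient stand-in, so the plan wording will differ from what your own database reports. The column types and the index name `idx_quote_increment_id` are assumptions made for illustration. Because `increment_id` holds unique values, the sketch declares the index as unique.

```python
import sqlite3

# In-memory SQLite database as a stand-in; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quote ("
    "entity_id INTEGER PRIMARY KEY, increment_id TEXT, grand_total REAL)"
)
conn.executemany(
    "INSERT INTO quote (increment_id, grand_total) VALUES (?, ?)",
    [(str(i), i * 1.5) for i in range(1000)],
)

def query_plan(connection):
    """Return SQLite's plan description for the read query from the question."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM quote WHERE increment_id = '123'"
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)

# Hypothetical index name; UNIQUE matches the "unique values" in the question.
conn.execute(
    "CREATE UNIQUE INDEX idx_quote_increment_id ON quote (increment_id)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan is a scan of the whole table; afterwards it becomes a search that uses the new index. In MySQL you would run `EXPLAIN` on the same `select` to check this.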
Thinking of the algorithmic complexity, a table-scan is `O(n)`, whereas inserting into a table with an index is just two writes to B-tree data structures (the table is a clustered index, which counts as one B-tree write), so you have 2x `O(log n)`.
As the table grows, table-scan performance gets worse much faster than index-write performance does.
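As a rough illustration of that growth, the sketch below compares the two cost models for a few table sizes. The step counts are abstract units rather than measured timings, and the base-2 logarithm is an assumption: real B-trees have a much higher fan-out, so their depth is smaller still. The function names are made up for this example.

```python
import math

def scan_cost(n):
    """O(n) model: a table-scan touches every row."""
    return n

def indexed_insert_cost(n):
    """2 x O(log n) model: one B-tree write for the table, one for the index."""
    return 2 * math.log2(n)

for n in (100_000, 1_000_000, 10_000_000):
    print(
        f"{n:>12,} rows: scan ~{scan_cost(n):,} steps, "
        f"indexed insert ~{indexed_insert_cost(n):.0f} steps"
    )
```

Each tenfold increase in rows multiplies the scan cost by ten, but adds only a handful of steps to the indexed insert.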