How to get the records created in the last 5 minutes from a table with over 4 million records?
Question
I have a MySQL table `activity_logs` that stores every API call made by client apps since the very beginning. The app launched a couple of years ago, and the table now holds 4.5 million records.
It is worth noting that each record has an incremental ID, and the `created_at` field is never updated after the record is created, so the values of `created_at` increase in essentially the same order as the `id` field.
- `id` is the primary key
- `created_at` is not indexed
I noticed a performance issue today when I needed to query the records created in the last 5 minutes. I did something simple like `where("created_at", ">=", now()->subMinutes(5))` in my PHP code, and soon noticed that this query had scanned all 4.5 million records and was terribly slow. That makes sense, though, because the database engine doesn't know that `created_at` is always increasing in my case.
I paused before blindly adding an index to `created_at` to fix it: the values of `created_at` are so diverse that adding an index to it didn't sound right to me. I ended up with two separate queries:
SELECT id FROM activity_logs WHERE created_at < DATE_SUB(NOW(), INTERVAL 5 MINUTE) ORDER BY id DESC LIMIT 1
SELECT * FROM activity_logs WHERE id > [Query Result]
It works well, but I am wondering what the standard way to deal with this is, as I believe it is a very common problem. I love PHP/Laravel and its ORM and haven't kept up with SQL topics much, so I would like to hear your insights.
Answer 1
Score: 1
Add `INDEX(created_at)`, then use
WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id
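Spelled out in full, this might look like the sketch below. The index name `idx_created_at` is an arbitrary choice, and `SELECT *` is assumed only because the question's own query used it. The `EXPLAIN` at the end is a way to check that the optimizer picks the new index; the exact plan output depends on the MySQL version and the data.

```sql
-- Add a secondary index on created_at.
-- The index name is an assumption; any unused name works.
ALTER TABLE activity_logs ADD INDEX idx_created_at (created_at);

-- Fetch the rows created in the last 5 minutes, oldest first.
SELECT *
FROM activity_logs
WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id;

-- Inspect the plan. With the index in place, the "key" column should
-- normally show idx_created_at instead of a full table scan (type = ALL).
EXPLAIN
SELECT *
FROM activity_logs
WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id;
```

Building the index on a few million rows is a one-time cost; after that, the query only has to visit the index entries that fall inside the 5-minute window.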
More details:
- I assume (hope) that you have `ENGINE=InnoDB` on that table.
- If you have `PRIMARY KEY(id)`, then the table is clustered on `id`, and MySQL cannot provide another clustered index.
- The secondary index suggested above has `id` silently tacked on the end. (That is also how it finds the row.)
- The code I suggest will be fast: it reads only the index entries for the last 5 minutes instead of scanning the whole table.
- MySQL has no `CLUSTERED` keyword. The PK is clustered, is unique, and is an index. (MySQL is 'lean and mean'; it does not have as many bells and whistles as other vendors because they aren't worth the effort to implement and maintain.)
- A similar trick (but not really necessary here) is

    PRIMARY KEY(created_at, id),  -- To get the clustering you desire
    INDEX(id)                     -- to keep AUTO_INCREMENT happy
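To put that last trick in context, here is a rough sketch of a full table definition using it. Only `id` and `created_at` come from the question; the table name `activity_logs_clustered`, the column types, and the `payload` column are placeholders I made up for illustration. The `SHOW CREATE TABLE` statement first is just a way to confirm the engine and the indexes the existing table already has.

```sql
-- Confirm the current engine and index definitions of the existing table.
SHOW CREATE TABLE activity_logs;

-- Hypothetical alternative layout, clustered by creation time.
-- Table name, column types, and payload are assumptions for illustration.
CREATE TABLE activity_logs_clustered (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at DATETIME        NOT NULL,
    payload    TEXT,                      -- stands in for the remaining columns
    PRIMARY KEY (created_at, id),         -- rows are stored in created_at order
    INDEX (id)                            -- AUTO_INCREMENT column must lead some index
) ENGINE=InnoDB;
```

With this layout a range condition on `created_at` reads rows straight from the clustered primary key, at the price of lookups by `id` going through the secondary index. For the situation in the question, the plain `INDEX(created_at)` is the simpler change.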