July 27, 2023, 23:07:52

How to get the records created in the last 5 minutes from a table with 4 million records?

Question

I have a MySQL table, activity_logs, that stores every API call made by client apps since launch. The app has been running for a couple of years, and the table now has 4.5 million records.

It is worth noting that each record has an auto-incrementing ID and that created_at is never updated after the record is created, so created_at values increase in essentially the same order as id.

id is the primary key
created_at is not indexed

I noticed a performance issue today when I needed to query the records created in the last 5 minutes. I did something simple in my PHP code, where("created_at", ">=", now()->subMinutes(5)), and soon noticed that this query scanned all 4.5 million records and was terribly slow. That makes sense, because the database engine doesn't know that created_at is always increasing in my case.

I paused before blindly adding an index on created_at to fix it: the values of created_at are so diverse that indexing the column didn't sound right to me. I ended up with two separate queries.
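
One way to confirm the full scan is to run EXPLAIN on the equivalent SQL. The statement below is only an assumed rendering of the query-builder call above; the SQL that Laravel actually generates may differ (for example, it would normally bind the timestamp as a parameter).

```sql
-- Assumed raw-SQL equivalent of the where(...) call; not taken from the question.
EXPLAIN
SELECT *
FROM activity_logs
WHERE created_at >= NOW() - INTERVAL 5 MINUTE;

-- With no index on created_at, the plan should show type = ALL and key = NULL,
-- with the rows estimate close to the table's total row count.
```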

SELECT id FROM activity_logs WHERE created_at < DATE_SUB(NOW(), INTERVAL 5 MINUTE) ORDER BY id DESC LIMIT 1
SELECT * FROM activity_logs WHERE id > [Query Result]
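
For reference, the two steps can also be written as a single statement with a scalar subquery. This is only a sketch of the same workaround, and it still relies on created_at increasing in step with id. The COALESCE fallback to 0 is an added assumption to cover the case where no row is older than 5 minutes.

```sql
SELECT *
FROM activity_logs
WHERE id > COALESCE(
  (
    -- Newest row that is older than 5 minutes; scanning the primary key
    -- backwards stops as soon as such a row is found.
    SELECT id
    FROM activity_logs
    WHERE created_at < DATE_SUB(NOW(), INTERVAL 5 MINUTE)
    ORDER BY id DESC
    LIMIT 1
  ),
  0  -- assumed fallback: all ids are greater than 0
);
```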

It works well, but I am wondering what the standard way to deal with this is, as I believe it is a very common problem. I love PHP/Laravel and the ORM and haven't kept up with SQL topics much, so I would like to hear your insights.

Answer 1

Score: 1

Add INDEX(created_at), then use

WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id
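
Spelled out in full, the suggestion might look like the sketch below. The index name idx_created_at is an arbitrary choice, not something from the question, and the EXPLAIN step is just one way to check that the index is picked up.

```sql
-- Add a secondary index on created_at (idx_created_at is a made-up name).
ALTER TABLE activity_logs
  ADD INDEX idx_created_at (created_at);

-- Fetch the rows from the last 5 minutes, oldest first.
SELECT *
FROM activity_logs
WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id;

-- Check the plan: key should show idx_created_at with type = range
-- instead of the type = ALL full table scan.
EXPLAIN
SELECT *
FROM activity_logs
WHERE created_at > NOW() - INTERVAL 5 MINUTE
ORDER BY created_at, id;
```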

More details:

I assume (hope) that you have ENGINE=InnoDB on that table.
If you have PRIMARY KEY(id), then the table is clustered on id, and MySQL cannot provide a second clustered index.
The secondary index suggested above has id silently tacked on the end. (That is also how it finds the row.)
The code I suggest will be fast.
MySQL has no CLUSTERED keyword. The PK is clustered, is unique, and is an index. (MySQL is 'lean and mean'; it does not have as many bells and whistles as other vendors because they aren't worth the effort to implement and maintain.)

A similar trick (but not really necessary here) is

  PRIMARY KEY(created_at, id),  -- To get the clustering you desire
  INDEX(id)    -- to keep AUTO_INCREMENT happy
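
As a complete statement, that variant could be sketched as follows. It assumes that id is the AUTO_INCREMENT column, that the current primary key is on id alone, and that created_at is declared NOT NULL; only the primary key is confirmed by the question. The index name idx_id is made up. Changing the primary key rebuilds the InnoDB table, so it is worth trying on a copy first.

```sql
-- All three changes go in one ALTER so the AUTO_INCREMENT column
-- is never left without an index.
ALTER TABLE activity_logs
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (created_at, id),  -- cluster rows by creation time
  ADD INDEX idx_id (id);             -- AUTO_INCREMENT column must lead some index
```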
