2023-05-06 20:48:57

MySql - Adding columns to select degrades performance

Question

I have a table with several million rows of data. I have a primary key on id, and a compound unique key on col2, col3, col4 and my_date (called comp_indx). Sample data is shown here...

id   col2 col3 col4 my_date             col5 col6 col7
1    1    1    1    2020-01-03 02:00:00 a    1    a
2    1    2    1    2020-01-03 01:00:00 b    2    1
3    1    3    1    2020-01-03 03:00:00 c    3    b
4    2    1    1    2020-02-03 01:00:00 d    4    2
5    2    2    1    2020-02-03 02:00:00 e    5    c
6    2    3    1    2020-02-03 03:00:00 f    6    3
7    3    1    1    2020-03-03 03:00:00 g    7    d
8    3    2    1    2020-03-03 02:00:00 h    8    4
9    3    3    1    2020-03-03 01:00:00 i    9    e

If I perform the following query...

SELECT col2, col3, max(my_date)
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

...the query is very efficient, and running the explain command shows...

select_type type  key       key_len rows Extra
----------- ----- --------- ------- ---- -------------------------------------
SIMPLE      range comp_indx 11      669  Using where; Using index for group-by

However, if I run a similar command (only requesting more columns - none of which are part of an index), e.g...

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

...then the performance drops right off, and if I run the explain command again, I get...

select_type type  key       key_len rows     Extra
----------- ----- --------- ------- -------  -----------
SIMPLE      index comp_indx 11      5004953  Using where

I can see that the type has changed from range to index, and I can see that the index is no longer being used for the group-by.

I am trying to understand why this is happening, and more importantly, how can I fix this issue?

BTW the table definition is...

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col2` smallint(6) NOT NULL,
  `col3` smallint(6) NOT NULL,
  `col4` smallint(6) NOT NULL,
  `my_date` datetime NOT NULL,
  `col5` char(1) NOT NULL,
  `col6` char(1) NOT NULL,
  `col7` char(1) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `comp_indx` (`col2`,`col3`,`col4`,`my_date`)
) ENGINE=InnoDB;

Answer 1

Score: 1

Add the following index

alter table my_table add key cl4_dt_cl2_cl3 (col4,my_date,col2,col3);

Moreover, the following query is invalid if the sql_mode ONLY_FULL_GROUP_BY is enabled, because col5 and col7 are neither aggregated nor listed in the GROUP BY clause:

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

Answer 2

Score: 1

Your original query:

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

col5 and col7 should be added to the GROUP BY clause as well, right?
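
One thing worth checking before following this suggestion: adding col5 and col7 to the GROUP BY clause also changes what counts as a group, so the query may return more rows than before. The sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL (an assumption made only so the example is runnable; it says nothing about MySQL's query plan). The second row is hypothetical and was added so that one (col2, col3) pair has two rows.

```python
import sqlite3

# SQLite is used as a stand-in for MySQL; only the result sets are illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        col2 INTEGER NOT NULL, col3 INTEGER NOT NULL, col4 INTEGER NOT NULL,
        my_date TEXT NOT NULL,
        col5 TEXT NOT NULL, col6 TEXT NOT NULL, col7 TEXT NOT NULL,
        UNIQUE (col2, col3, col4, my_date)
    )
""")
conn.executemany(
    "INSERT INTO my_table VALUES (?,?,?,?,?,?,?,?)",
    [
        (1, 1, 1, 1, "2020-01-03 02:00:00", "a", "1", "a"),
        (2, 1, 1, 1, "2020-01-05 09:00:00", "x", "9", "x"),  # hypothetical extra row
        (3, 1, 2, 1, "2020-01-03 01:00:00", "b", "2", "1"),
    ],
)

# One row per (col2, col3) pair, as in the original query.
per_pair = conn.execute("""
    SELECT col2, col3, MAX(my_date)
    FROM my_table
    WHERE col4 = 1
    GROUP BY col2, col3
    ORDER BY col2, col3
""").fetchall()

# One row per (col2, col3, col5, col7) combination: a different question.
per_combo = conn.execute("""
    SELECT col2, col3, MAX(my_date), col5, col7
    FROM my_table
    WHERE col4 = 1
    GROUP BY col2, col3, col5, col7
    ORDER BY col2, col3, col5, col7
""").fetchall()

print(len(per_pair), len(per_combo))
```

If the goal is the col5 and col7 values from the row holding the latest my_date in each (col2, col3) group, extending the GROUP BY is therefore probably not what is wanted.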

Answer 3

Score: 1

If you don't need id for anything, then this will speed up the query, regardless of the extra columns (col5/6/7) that you need to fetch.

CREATE TABLE `my_table` (
  `col2` smallint(6) NOT NULL,
  `col3` smallint(6) NOT NULL,
  `col4` smallint(6) NOT NULL,
  `my_date` datetime NOT NULL,
  `col5` char(1) NOT NULL,
  `col6` char(1) NOT NULL,
  `col7` char(1) NOT NULL,
  PRIMARY KEY (col4,my_date,col2,col3)  -- in this order
) ENGINE=InnoDB;

If you do need id because of it being referenced from other table(s), then add

  `id` int(11) NOT NULL AUTO_INCREMENT,
  INDEX(id)  -- This is sufficient to keep auto_inc happy

My suggested PK is 11 bytes (vs. the 4-byte INT). Any secondary index will include those 11 bytes. However, any columns that are common to the PK and the secondary index won't be repeated. For example, INDEX(col2, col7) will effectively be INDEX(col2, col7, col4, my_date, col3).

Keep in mind that the PK determines the "locality of reference" of the rows. Any secondary index starting with col4 will be almost useless since the PK starts with that. (This, of course, depends on cardinality, etc, etc.)
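
The arithmetic behind the 11-byte figure and the "effective" index can be sketched as follows. This is only a toy model of the behaviour described in this answer, not a MySQL API: the helper names are made up for illustration, and DATETIME is assumed to take 5 bytes (MySQL 5.6.4 or later, no fractional seconds), which is consistent with the key_len of 11 in the question's EXPLAIN output.

```python
# Assumed storage sizes in bytes (DATETIME taken as 5 bytes: MySQL 5.6.4+, no fractional seconds).
SIZES = {"smallint": 2, "int": 4, "datetime": 5}

# Column types taken from the table definition in the question.
COLUMN_TYPES = {
    "id": "int",
    "col2": "smallint",
    "col3": "smallint",
    "col4": "smallint",
    "my_date": "datetime",
}

def key_bytes(columns):
    """Total bytes of a key made of the given NOT NULL columns (hypothetical helper)."""
    return sum(SIZES[COLUMN_TYPES[c]] for c in columns)

def effective_columns(secondary, primary_key):
    """Secondary index columns followed by the PK columns not already present."""
    return list(secondary) + [c for c in primary_key if c not in secondary]

suggested_pk = ["col4", "my_date", "col2", "col3"]

print(key_bytes(suggested_pk))  # size of the suggested PK
print(key_bytes(["id"]))        # size of the original PK
print(effective_columns(["col2", "col7"], suggested_pk))
```

Under these assumptions the suggested PK comes to 2 + 5 + 2 + 2 = 11 bytes, and INDEX(col2, col7) expands to (col2, col7, col4, my_date, col3), matching the example above.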

Answer 4

Score: 0

I have now fixed my performance issue by using 2 selects on the same table and a join, e.g...

SELECT *
FROM (
	SELECT col2, col3, max(my_date) as max_date
	FROM my_table
	where col4=1 and my_date <= '2001-01-27'
	group by col2, col3
) aaa
join
(
	SELECT col2, col3, my_date, col5, col6, col7
	FROM my_table
	where col4=1
) bbb
on (aaa.col2=bbb.col2 and aaa.col3=bbb.col3 and aaa.max_date=bbb.my_date);
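
For anyone who wants to check what this join returns, here is a minimal runnable sketch using Python's built-in sqlite3 module in place of MySQL (an assumption for convenience only; it shows the result set, not MySQL's execution plan). The rows marked hypothetical and the cutoff date are invented for illustration, and the second derived table is written as a direct join on my_table, which should be equivalent.

```python
import sqlite3

# SQLite is used as a stand-in for MySQL; only the result set is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        col2 INTEGER NOT NULL, col3 INTEGER NOT NULL, col4 INTEGER NOT NULL,
        my_date TEXT NOT NULL,
        col5 TEXT NOT NULL, col6 TEXT NOT NULL, col7 TEXT NOT NULL,
        UNIQUE (col2, col3, col4, my_date)
    )
""")
conn.executemany(
    "INSERT INTO my_table VALUES (?,?,?,?,?,?,?,?)",
    [
        (1, 1, 1, 1, "2020-01-03 02:00:00", "a", "1", "a"),
        (2, 1, 2, 1, "2020-01-03 01:00:00", "b", "2", "1"),
        (3, 1, 1, 1, "2020-01-05 09:00:00", "x", "9", "x"),  # hypothetical: later row, same group
        (4, 1, 1, 1, "2020-03-01 00:00:00", "y", "9", "y"),  # hypothetical: after the cutoff
        (5, 1, 1, 2, "2020-01-06 00:00:00", "z", "9", "z"),  # hypothetical: col4 <> 1
    ],
)

cutoff = "2020-02-01"  # hypothetical cutoff chosen to fit the sample dates

query = """
    SELECT bbb.col2, bbb.col3, bbb.my_date, bbb.col5, bbb.col7
    FROM (
        -- Step 1: latest date per (col2, col3); this part touches only indexed columns.
        SELECT col2, col3, MAX(my_date) AS max_date
        FROM my_table
        WHERE col4 = 1 AND my_date <= ?
        GROUP BY col2, col3
    ) aaa
    -- Step 2: look up the remaining columns of exactly those rows.
    JOIN my_table bbb
      ON bbb.col2 = aaa.col2
     AND bbb.col3 = aaa.col3
     AND bbb.col4 = 1
     AND bbb.my_date = aaa.max_date
    ORDER BY bbb.col2, bbb.col3
"""
latest = conn.execute(query, (cutoff,)).fetchall()
print(latest)
```

Because (col2, col3, col4, my_date) is unique, each group matches exactly one row in the second step, so col5 and col7 come from the row that actually holds the maximum date.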

Answer 5

Score: 0

You probably need to add this covering index to make the second query faster:

create index comp2_index on my_table(col2, col3, col4, my_date, col5, col7);
