MySql - Adding columns to select degrades performance

Question

I have a table with several million rows of data. I have a primary key on id, and a compound unique key on col2, col3, col4 and my_date (called comp_indx). Sample data is shown here...

id   col2 col3 col4 my_date             col5 col6 col7
1    1    1    1    2020-01-03 02:00:00 a    1    a
2    1    2    1    2020-01-03 01:00:00 b    2    1
3    1    3    1    2020-01-03 03:00:00 c    3    b
4    2    1    1    2020-02-03 01:00:00 d    4    2
5    2    2    1    2020-02-03 02:00:00 e    5    c
6    2    3    1    2020-02-03 03:00:00 f    6    3
7    3    1    1    2020-03-03 03:00:00 g    7    d
8    3    2    1    2020-03-03 02:00:00 h    8    4
9    3    3    1    2020-03-03 01:00:00 i    9    e

If I perform the following query...

SELECT col2, col3, max(my_date)
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

...the query is very efficient, and running the explain command shows...

select_type type  key       key_len rows Extra
----------- ----- --------- ------- ---- -------------------------------------
SIMPLE      range comp_indx 11      669  Using where; Using index for group-by

However, if I run a similar command (only requesting more columns - none of which are part of an index), e.g...

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

...then the performance drops right off, and if I run the explain command again, I get...

select_type type  key       key_len rows     Extra
----------- ----- --------- ------- -------  -----------
SIMPLE      index comp_indx 11      5004953  Using where

I can see that the type has changed from range to index, and I can see that the index is no longer being used for the group-by.

I am trying to understand why this is happening, and more importantly, how can I fix this issue?

BTW the table definition is...

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col2` smallint(6) NOT NULL,
  `col3` smallint(6) NOT NULL,
  `col4` smallint(6) NOT NULL,
  `my_date` datetime NOT NULL,
  `col5` char(1) NOT NULL,
  `col6` char(1) NOT NULL,
  `col7` char(1) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `comp_indx` (`col2`,`col3`,`col4`,`my_date`)
) ENGINE=InnoDB;

Answer 1

Score: 1

Add the following index:

alter table my_table add key cl4_dt_cl2_cl3 (col4,my_date,col2,col3);

Moreover, the following query is invalid if the sql_mode ONLY_FULL_GROUP_BY is enabled, because col5 and col7 are neither aggregated nor listed in the group by clause:

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3
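
Whether this applies to a given server can be checked by inspecting the active SQL modes. The following is a minimal sketch, assuming MySQL 5.7 or later (where ANY_VALUE() is available). Note that ANY_VALUE() returns a value from an arbitrary row of each group, not necessarily from the row holding the latest my_date, so it only makes the statement valid and does not answer the original question by itself.

```sql
-- Show the active SQL modes; look for ONLY_FULL_GROUP_BY in the result.
SELECT @@sql_mode;

-- One way to make the query acceptable under ONLY_FULL_GROUP_BY.
-- Caveat: col5 and col7 are taken from an arbitrary row of each group.
SELECT col2, col3, MAX(my_date) AS max_date,
       ANY_VALUE(col5) AS col5, ANY_VALUE(col7) AS col7
FROM my_table
WHERE col4 = 1 AND my_date <= '2001-01-27'
GROUP BY col2, col3;
```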

Answer 2

Score: 1

Your original query:

SELECT col2, col3, max(my_date), col5, col7
FROM my_table
where col4=1 and my_date <= '2001-01-27'
group by col2, col3

col5 and col7 should be added to the group by clause as well, right?

Answer 3

Score: 1

If you don't need id for anything, then this will speed up the query, regardless of the extra columns (col5/6/7) that you need to fetch.

CREATE TABLE `my_table` (
  `col2` smallint(6) NOT NULL,
  `col3` smallint(6) NOT NULL,
  `col4` smallint(6) NOT NULL,
  `my_date` datetime NOT NULL,
  `col5` char(1) NOT NULL,
  `col6` char(1) NOT NULL,
  `col7` char(1) NOT NULL,
  PRIMARY KEY (col4,my_date,col2,col3)  -- in this order
) ENGINE=InnoDB;

If you do need id because it is referenced from other table(s), then add:

  `id` int(11) NOT NULL AUTO_INCREMENT,
  INDEX(id)  -- This is sufficient to keep auto_inc happy

My suggested PK is 11 bytes (vs a 4-byte INT). Any secondary index will include those 11 bytes. However, any columns that are common between the PK and the secondary index won't be repeated. For example, INDEX(col2, col7) will effectively be INDEX(col2, col7, col4, my_date, col3).

Keep in mind that the PK determines the "locality of reference" of the rows. Any secondary index starting with col4 will be almost useless since the PK starts with that. (This, of course, depends on cardinality, etc, etc.)
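
If the table already exists, the same layout could in principle be reached with a single ALTER instead of recreating the table. This is only a sketch, assuming id has to be kept. It leaves the existing unique key comp_indx in place, and on a table with several million rows it rebuilds the whole table, so it is worth trying on a copy first.

```sql
-- Swap the clustered key in one statement so that the AUTO_INCREMENT
-- column id is still covered by an index in the final definition.
ALTER TABLE my_table
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (col4, my_date, col2, col3),
    ADD INDEX (id);

-- Inspect the resulting definition.
SHOW CREATE TABLE my_table;
```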

Answer 4

Score: 0

I have now fixed my performance issue by using 2 selects on the same table and a join, e.g...

SELECT *
FROM (
	SELECT col2, col3, max(my_date) as max_date
	FROM my_table
	where col4=1 and my_date <= '2001-01-27'
	group by col2, col3
) aaa
join
(
	SELECT col2, col3, my_date, col5, col6, col7
	FROM my_table
	where col4=1
) bbb
on (aaa.col2=bbb.col2 and aaa.col3=bbb.col3 and aaa.max_date=bbb.my_date);
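
The same idea can also be written with the second derived table replaced by the base table, so that the join condition carries the col4 filter. This is a sketch, assuming the unique key comp_indx (col2, col3, col4, my_date) is still in place so that each row of the inner query can be matched through that key; the alias names latest and t are illustrative.

```sql
SELECT t.col2, t.col3, t.my_date, t.col5, t.col6, t.col7
FROM (
    -- Only indexed columns are referenced here, which is the shape that
    -- got "Using index for group-by" in the question's first EXPLAIN.
    SELECT col2, col3, MAX(my_date) AS max_date
    FROM my_table
    WHERE col4 = 1 AND my_date <= '2001-01-27'
    GROUP BY col2, col3
) AS latest
JOIN my_table AS t
  ON  t.col2    = latest.col2
  AND t.col3    = latest.col3
  AND t.col4    = 1
  AND t.my_date = latest.max_date;
```

Running EXPLAIN on this statement shows whether the inner query still reports Using index for group-by on a particular server.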

Answer 5

Score: 0

You probably need to add this covering index to make the second query faster:

create index comp2_index on my_table(col2, col3, col4, my_date, col5, col7);
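
Whether the optimizer actually picks the new index can be checked with EXPLAIN. This is a sketch using the table and query from the question, and it assumes ONLY_FULL_GROUP_BY is disabled, since otherwise the statement is rejected. If the index is used as a covering index, the Extra column is expected to include Using index.

```sql
-- List the indexes that now exist on the table.
SHOW INDEX FROM my_table;

-- Compare the key and Extra columns with the plans shown in the question.
EXPLAIN
SELECT col2, col3, MAX(my_date), col5, col7
FROM my_table
WHERE col4 = 1 AND my_date <= '2001-01-27'
GROUP BY col2, col3;
```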

huangapple
  • Posted on 2023-05-06 20:48:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76188981.html