How can I optimize a MySQL JOIN BETWEEN statement with indexes?

Question

I have the following two tables:

CREATE TABLE `ip_info` (
  `start_ip` int(10) unsigned NOT NULL,
  `end_ip` int(10) unsigned NOT NULL,
  `country_code` varchar(3) DEFAULT NULL,
  `country_name` varchar(255) DEFAULT NULL,
  `continent_code` varchar(3) DEFAULT NULL,
  `continent_name` varchar(255) DEFAULT NULL,
  `asn` int(10) unsigned DEFAULT NULL,
  `as_name` varchar(255) DEFAULT NULL,
  `as_domain` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`start_ip`,`end_ip`),
  KEY `country_code_idx` (`country_code`),
  KEY `asn_idx` (`asn`)
);

CREATE TABLE `servers` (
  `ipport` varchar(255) NOT NULL,
  `ip` varchar(255) NOT NULL,
  `port` int(11) NOT NULL,
  `ip_as_int` int(10) unsigned NOT NULL,
  `version` text NOT NULL,
  `protocol` int(11) NOT NULL,
  `online_count` int(11) NOT NULL,
  `max_count` int(11) NOT NULL,
  `description` text NOT NULL,
  `favicon` text DEFAULT NULL,
  `last_seen` int(11) NOT NULL,
  `cracked` tinyint(1) DEFAULT NULL,
  `joined_on` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`ip`,`port`),
  KEY `ip_as_int_idx` (`ip_as_int`)
);

I am trying to run the following query (get servers from the US with at least 5 people online), but it takes a very long time to complete (18 s):

SELECT count(*) FROM
    (SELECT * FROM servers AS d JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip WHERE i.country_code = "us") 
AS s WHERE (s.online_count > 5) ;

Because of this, I ran EXPLAIN on the query and got the following:

+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+
| id   | select_type | table | type | possible_keys            | key              | key_len | ref   | rows    | Extra                                          |
+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+
|    1 | SIMPLE      | i     | ref  | PRIMARY,country_code_idx | country_code_idx | 15      | const |  761917 | Using where; Using index                       |
|    1 | SIMPLE      | d     | ALL  | ip_as_int_idx            | NULL             | NULL    | NULL  | 1035230 | Range checked for each record (index map: 0x2) |
+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+

For some reason ip_as_int_idx isn't used, and I suppose that is why MySQL has to do a full table scan.

How should I change the query/the index so it's faster?

Answer 1

Score: 1

I would rephrase the query as:

SELECT count(*)
FROM servers d 
JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip 
WHERE i.country_code = 'us'
  AND d.online_count > 5;

The modified query may be able to make good use of the following indexes:

create index ix1 on ip_info (country_code, start_ip, end_ip);

create index ix2 on servers (ip_as_int, online_count);
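
One way to check whether the optimizer actually picks these indexes up is to re-run EXPLAIN on the rewritten query once they exist. This is only a sketch: the chosen plan depends on the server version and on table statistics, so seeing ix1 and ix2 in the key column is something to verify rather than a guaranteed outcome.

```sql
-- Refresh index statistics so the optimizer can cost the new indexes.
ANALYZE TABLE ip_info, servers;

-- Re-check the plan: look at the "key" and "Extra" columns for each table.
EXPLAIN
SELECT count(*)
FROM servers d
JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip
WHERE i.country_code = 'us'
  AND d.online_count > 5;
```

If the row for d still shows "Range checked for each record", the range join is still being re-evaluated for every row of the other table.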

Answer 2

Score: 0

Searching by IP range is not easy. No index understands that the ranges do not overlap, so an index on (start, end) will, at best, scan half the table, which makes the lookup O(N).

So, I took on the challenge and came up with https://mysql.rjweb.org/doc.php/ipranges

It does require restructuring the table so that it stores only the start of each range and not the end (or vice versa), and it requires dealing with any gaps between ranges. In return, it rules out overlaps, which allows ORDER BY ... LIMIT 1 to find the answer in O(1).
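
The ORDER BY ... LIMIT 1 idea can be sketched against the question's own schema. This is not the linked article's implementation: it keeps the existing end_ip column to detect gaps, and it assumes that the ranges in ip_info never overlap, which the question does not confirm. If they do overlap, the second query will not return the same count as the BETWEEN join.

```sql
-- Single lookup: the only candidate is the row with the largest start_ip
-- that is still <= the address, so one row is read from the primary key.
SELECT i.country_code, i.start_ip, i.end_ip
FROM ip_info i
WHERE i.start_ip <= INET_ATON('12.12.12.12')
ORDER BY i.start_ip DESC
LIMIT 1;
-- If the returned end_ip is below the address, the address fell into a gap.

-- The same idea applied to the join from the question
-- (assumption: non-overlapping ranges, so start_ip identifies one row).
SELECT count(*)
FROM servers d
JOIN ip_info i
  ON i.start_ip = (SELECT MAX(i2.start_ip)
                   FROM ip_info i2
                   WHERE i2.start_ip <= d.ip_as_int)
WHERE d.ip_as_int <= i.end_ip   -- reject addresses that fall into a gap
  AND i.country_code = 'us'
  AND d.online_count > 5;
```

Whether the second form is actually faster on a given server is worth confirming with EXPLAIN, since the correlated subquery runs once per row of servers.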

Another thing: you must use the same format for IP addresses in both tables. VARCHAR will give the wrong answer for comparisons like '12.12.12.12' BETWEEN '111.3.4.5' AND '122.12.12.12', because strings are compared character by character rather than numerically. INT UNSIGNED works for IPv4, but not for IPv6.
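
The difference between the two formats can be seen by running the same comparison both ways. A small sketch, using MySQL's built-in INET_ATON() to convert dotted IPv4 strings to integers; the column aliases are made up for illustration:

```sql
SELECT
  -- Compared as strings: evaluated character by character.
  '12.12.12.12' BETWEEN '111.3.4.5' AND '122.12.12.12' AS compared_as_strings,
  -- Compared as integers: evaluated on the numeric value of each address.
  INET_ATON('12.12.12.12')
    BETWEEN INET_ATON('111.3.4.5') AND INET_ATON('122.12.12.12') AS compared_as_integers;
```

Numerically, 12.12.12.12 lies below 111.3.4.5, so only the integer comparison reflects the real address order.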

huangapple
  • Posted on July 27, 2023, 20:00:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76779523.html