Posted 2023-07-27 20:00:06

How can I optimize a MySQL JOIN BETWEEN statement with indexes?

Question

I have the following two tables:

CREATE TABLE `ip_info` (
  `start_ip` int(10) unsigned NOT NULL,
  `end_ip` int(10) unsigned NOT NULL,
  `country_code` varchar(3) DEFAULT NULL,
  `country_name` varchar(255) DEFAULT NULL,
  `continent_code` varchar(3) DEFAULT NULL,
  `continent_name` varchar(255) DEFAULT NULL,
  `asn` int(10) unsigned DEFAULT NULL,
  `as_name` varchar(255) DEFAULT NULL,
  `as_domain` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`start_ip`,`end_ip`),
  KEY `country_code_idx` (`country_code`),
  KEY `asn_idx` (`asn`)
);

CREATE TABLE `servers` (
  `ipport` varchar(255) NOT NULL,
  `ip` varchar(255) NOT NULL,
  `port` int(11) NOT NULL,
  `ip_as_int` int(10) unsigned NOT NULL,
  `version` text NOT NULL,
  `protocol` int(11) NOT NULL,
  `online_count` int(11) NOT NULL,
  `max_count` int(11) NOT NULL,
  `description` text NOT NULL,
  `favicon` text DEFAULT NULL,
  `last_seen` int(11) NOT NULL,
  `cracked` tinyint(1) DEFAULT NULL,
  `joined_on` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`ip`,`port`),
  KEY `ip_as_int_idx` (`ip_as_int`)
);

I'm trying to run the following query (count servers in the US with more than 5 people online), but it takes a very long time to complete (18s):

SELECT count(*) FROM
    (SELECT * FROM servers AS d JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip WHERE i.country_code = "us")
AS s WHERE (s.online_count > 5);

Because of this, I ran EXPLAIN on the query and got the following:

+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+
| id   | select_type | table | type | possible_keys            | key              | key_len | ref   | rows    | Extra                                          |
+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+
|    1 | SIMPLE      | i     | ref  | PRIMARY,country_code_idx | country_code_idx | 15      | const |  761917 | Using where; Using index                       |
|    1 | SIMPLE      | d     | ALL  | ip_as_int_idx            | NULL             | NULL    | NULL  | 1035230 | Range checked for each record (index map: 0x2) |
+------+-------------+-------+------+--------------------------+------------------+---------+-------+---------+------------------------------------------------+

For some reason ip_as_int_idx isn't used; I suppose that is why MySQL has to do a full table scan.

How should I change the query or the indexes to make it faster?

Answer 1

Score: 1

I would rephrase the query as:

SELECT count(*)
FROM servers d 
JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip 
WHERE i.country_code = 'us'
  AND d.online_count > 5;

The modified query may be able to make good use of the following indexes:

create index ix1 on ip_info (country_code, start_ip, end_ip);
create index ix2 on servers (ip_as_int, online_count);
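
As a quick sanity check that flattening the subquery does not change the result, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, made only so the snippet is self-contained), a trimmed-down schema with just the columns the query touches, and invented sample rows. It says nothing about the plan MySQL will choose; run EXPLAIN on the real server to see whether ix1 and ix2 are actually picked up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
# Only the columns used by the query are kept; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ip_info (
    start_ip     INTEGER NOT NULL,
    end_ip       INTEGER NOT NULL,
    country_code TEXT,
    PRIMARY KEY (start_ip, end_ip)
);
CREATE TABLE servers (
    ip_as_int    INTEGER NOT NULL,
    online_count INTEGER NOT NULL
);
CREATE INDEX ix1 ON ip_info (country_code, start_ip, end_ip);
CREATE INDEX ix2 ON servers (ip_as_int, online_count);
""")
conn.executemany(
    "INSERT INTO ip_info VALUES (?, ?, ?)",
    [(0, 99, "us"), (100, 199, "de"), (200, 299, "us")],
)
conn.executemany(
    "INSERT INTO servers VALUES (?, ?)",
    [(10, 6), (20, 3), (150, 50), (250, 9), (260, 5)],
)

# Original form: join inside a derived table, filter outside.
nested_sql = """
SELECT count(*) FROM
    (SELECT * FROM servers AS d
     JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip
     WHERE i.country_code = 'us') AS s
WHERE s.online_count > 5
"""

# Rewritten form: a single join with both filters in the WHERE clause.
flat_sql = """
SELECT count(*)
FROM servers d
JOIN ip_info i ON d.ip_as_int BETWEEN i.start_ip AND i.end_ip
WHERE i.country_code = 'us'
  AND d.online_count > 5
"""

nested_count = conn.execute(nested_sql).fetchone()[0]
flat_count = conn.execute(flat_sql).fetchone()[0]
print(nested_count, flat_count)
```

Both counts agree on the sample data, which is what you would expect: the outer filter on online_count only moves into the same WHERE clause as the country filter.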

Answer 2

Score: 0

Searching by IP range is not easy. No index understands that the ranges do not overlap, so an index on (start, end) will, at best, scan half the table, hence O(N) complexity.

So, I took on the challenge and came up with https://mysql.rjweb.org/doc.php/ipranges

It does require structuring the table so that each row stores only the start of a range and not the end (or vice versa), and it requires dealing with any gaps between ranges. In exchange, overlapping ranges become impossible, which allows ORDER BY ... LIMIT 1 to find the answer in O(1).
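
The start-only layout can be sketched roughly as follows. This is an illustration of the idea, not the implementation from the linked article: the table name ip_ranges, the helper lookup_country, and all rows are hypothetical, gaps are represented by rows whose country_code is NULL, and SQLite (via Python's standard sqlite3 module) stands in for MySQL so the snippet is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical start-only table: each row owns every address from its
-- start_ip up to (but not including) the next row's start_ip.
CREATE TABLE ip_ranges (
    start_ip     INTEGER PRIMARY KEY,
    country_code TEXT              -- NULL marks a gap (unassigned block)
);
CREATE TABLE servers (
    ip_as_int    INTEGER NOT NULL,
    online_count INTEGER NOT NULL
);
""")
conn.executemany(
    "INSERT INTO ip_ranges VALUES (?, ?)",
    [(0, None), (100, "us"), (200, None), (300, "de")],
)
conn.executemany(
    "INSERT INTO servers VALUES (?, ?)",
    [(150, 6), (160, 2), (250, 10), (310, 8)],
)


def lookup_country(ip_as_int):
    """Return the country of the range containing ip_as_int, or None."""
    row = conn.execute(
        """
        SELECT country_code
        FROM ip_ranges
        WHERE start_ip <= ?
        ORDER BY start_ip DESC
        LIMIT 1
        """,
        (ip_as_int,),
    ).fetchone()
    return row[0] if row else None


# The same lookup as a correlated subquery, applied to the question's
# "US servers with more than 5 players online" count.
count_sql = """
SELECT count(*)
FROM servers d
WHERE d.online_count > 5
  AND (SELECT r.country_code
       FROM ip_ranges r
       WHERE r.start_ip <= d.ip_as_int
       ORDER BY r.start_ip DESC
       LIMIT 1) = 'us'
"""
us_busy_servers = conn.execute(count_sql).fetchone()[0]

print(lookup_country(150), lookup_country(250), us_busy_servers)
```

Because the lookup walks the index on start_ip backwards from the target address and stops at the first entry, each probe touches a single row no matter how large the table is.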

Another thing: you must use the same format for IP addresses in both tables. VARCHAR will give the wrong answer for comparisons like '12.12.12.12' BETWEEN '111.3.4.5' AND '122.12.12.12'. INT UNSIGNED works for IPv4, but not IPv6.
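
To see why the string comparison misleads, here is a small Python sketch. Python compares strings by code point, which for digits and dots should give the same ordering as a typical MySQL collation (an assumption about the collation in use). The helper ip_to_int is a hypothetical name; it plays the role that INET_ATON() plays in MySQL.

```python
import ipaddress


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(dotted))


ip, low, high = "12.12.12.12", "111.3.4.5", "122.12.12.12"

# Character-by-character comparison: "12." sorts after "111" and before
# "122", so the string test claims the address is inside the range.
string_result = low <= ip <= high

# Numeric comparison: 12.x.x.x is far below 111.x.x.x, so it is outside.
int_result = ip_to_int(low) <= ip_to_int(ip) <= ip_to_int(high)

# An IPv6 address needs 128 bits, so it cannot fit in INT UNSIGNED (32 bits).
ipv6_value = int(ipaddress.IPv6Address("2001:db8::1"))
fits_in_int_unsigned = ipv6_value <= 2**32 - 1

print(string_result, int_result, fits_in_int_unsigned)
```

The string test reports a match while the integer test does not, which is the wrong answer the paragraph above warns about.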
