英文:
mariadb create function returning random id
问题
我想创建一个从数据库表中返回随机行的数据库函数。
我有以下表格
create table category
(
id bigint not null primary key,
color varchar(255) null
);
然后我创建了一个带有以下代码的函数:
DELIMITER //
create or replace function random_category() returns int
begin
return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;
当调用 select random_category();
时,我总是得到一个单一的结果。
但是当调用以下查询时
select * from category c where id = random_category();
我会得到空结果、多行结果和单行结果。
我正在使用 10.11.3-MariaDB-1:10.11.3+maria~ubu2204
版本。
英文:
I want to create a database function returning a random row from a database table.
I have following table
create table category
(
id bigint not null primary key,
color varchar(255) null
);
and I create a function with:
DELIMITER //
create or replace function random_category() returns int
begin
return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;
When calling select random_category();
I always get a single result.
But when calling
select * from category c where id = random_category();
I receive empty result, multiple line results and single line results.
I am using 10.11.3-MariaDB-1:10.11.3+maria~ubu2204
答案1
得分: 3
这不是一个错误,这是预期行为。
让我们将您的函数移到一个子查询中,并使用一个包含几个值的序列:
# 尝试 1
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
空集 (0.001 秒)
# 尝试 2
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+-----+
| seq |
+-----+
| 3 |
+-----+
1 行在集合中 (0.002 秒)
由于我们使用子查询而不是函数,EXPLAIN 将会稍微详细一些:
explain select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | seq_1_to_3 | const | PRIMARY | PRIMARY | 8 | const | 1 | Using where; Using index |
| 2 | UNCACHEABLE SUBQUERY | seq_1_to_3 | index | NULL | PRIMARY | 8 | NULL | 3 | Using index; Using temporary; Using filesort |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
UNCACHEABLE SUBQUERY
表示子查询的结果不能存储在子查询缓存中,必须为每个比较执行。
假设在第一次尝试中,子查询返回了 3、1 和 1,在第二次尝试中,它返回了 2、1 和 3。在第一次尝试中没有匹配项(1 != 3, 2 != 1 和 3 != 1),而在第二次尝试中,3 匹配了 3。
另请参阅相关子查询(Wikipedia)。
为了避免这种情况,您可以将您的 SQL 语句更改为:
SELECT * from category ORDER BY RAND() LIMIT 1
然而,ORDER BY RAND()
非常慢,我建议您阅读 Rick James 的优秀文章 "从表中获取随机行"。
英文:
It is not a bug, it is expected behavior.
Let's move your function into a subquery and let's use a sequence with a few values:
# Attempt 1
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
Empty set (0,001 sec)
# Attempt 2
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+-----+
| seq |
+-----+
| 3 |
+-----+
1 row in set (0,002 sec)
Since we use a subquery instead of a function, EXPLAIN will be a little bit more verbose:
explain select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | seq_1_to_3 | const | PRIMARY | PRIMARY | 8 | const | 1 | Using where; Using index |
| 2 | UNCACHEABLE SUBQUERY | seq_1_to_3 | index | NULL | PRIMARY | 8 | NULL | 3 | Using index; Using temporary; Using filesort |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
UNCACHEABLE SUBQUERY
means, that the result of the subquery cannot be stored in subquery cache and must be executed for each comparison.
Let's assume in first attempt the subquery returned 3,1 and 1, in second it returned 2,1 and 3. In first attempt there was no matching ( 1 != 3, 2 != 1 and 3 != 1), while in 2nd attempt 3 matched 3.
See also Correlated Subqueries (Wikipedia).
To avoid this, you could just change your SQL statement to
SELECT * from category ORDER BY RAND() LIMIT 1
However ORDER BY RAND()
is very slow, I would suggest you to read Rick James' excellent article "Fetching random rows from a table".
答案2
得分: 1
Your function is called for each separate row independently. Each call generates a new single id
value. So the amount of output rows vary.
You must call the function once. For example, with
select category.*
from category
JOIN (SELECT random_category() AS id) AS criteria USING (id);
Also, you may try to define your function as DETERMINISTIC:
create or replace function random_category() returns int DETERMINISTIC
begin
return (select cc.id from category cc order by rand() limit 1);
end
AFAIR in this case the function output is treated as a constant (it has no arguments), and it should be called once.. but I'm not sure.
英文:
Your function is called for each separate row independently. Each call generates new single id
value. So the amount of output rows vary.
You must call the function once. For example, with
select category.*
from category
JOIN (SELECT random_category() AS id) AS criteria USING (id);
Also you may try to define your function as DETERMINISTIC:
create or replace function random_category() returns int DETERMINISTIC
begin
return (select cc.id from category cc order by rand() limit 1);
end
AFAIR in this case the function output is treated as a constant (it have no arguments), and it should be called once.. but I'm not sure.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论