创建返回随机 ID 的 MariaDB 函数。

huangapple go评论62阅读模式
英文:

mariadb create function returning random id

问题

我想创建一个从数据库表中返回随机行的数据库函数。

我有以下表格

create table category
(
    id    bigint       not null primary key,
    color varchar(255) null
);

然后我创建了一个带有以下代码的函数:

DELIMITER //
create or replace function random_category() returns int
begin
    return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;

当调用 select random_category(); 时,我总是得到一个单一的结果。

但是当调用以下查询时

select * from category c where id = random_category();

我会得到空结果、多行结果和单行结果。

我正在使用 10.11.3-MariaDB-1:10.11.3+maria~ubu2204 版本。

英文:

I want to create a database function returning a random row from a database table.

I have following table

create table category
(
    id    bigint       not null primary key,
    color varchar(255) null
);

and I create a function with:

DELIMITER //
create or replace function random_category() returns int
begin
    return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;

When calling select random_category(); I always get a single result.

But when calling

select * from category c where id = random_category();

I receive empty result, multiple line results and single line results.

I am using 10.11.3-MariaDB-1:10.11.3+maria~ubu2204

答案1

得分: 3

这不是一个错误,这是预期行为。

让我们将您的函数移到一个子查询中,并使用一个包含几个值的序列:

# 尝试 1
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
空集 (0.001 秒)

# 尝试 2
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+-----+
| seq |
+-----+
|   3 |
+-----+
1 行在集合中 (0.002 秒)

由于我们使用子查询而不是函数,EXPLAIN 将会稍微详细一些:

explain select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| id   | select_type          | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra                                        |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
|    1 | PRIMARY              | seq_1_to_3 | const | PRIMARY       | PRIMARY | 8       | const | 1    | Using where; Using index                     |
|    2 | UNCACHEABLE SUBQUERY | seq_1_to_3 | index | NULL          | PRIMARY | 8       | NULL  | 3    | Using index; Using temporary; Using filesort |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+

UNCACHEABLE SUBQUERY 表示子查询的结果不能存储在子查询缓存中,必须为每个比较执行。

假设在第一次尝试中,子查询返回了 3、1 和 1,在第二次尝试中,它返回了 2、1 和 3。在第一次尝试中没有匹配项(1 != 3, 2 != 1 和 3 != 1),而在第二次尝试中,3 匹配了 3。

另请参阅相关子查询(Wikipedia)。

为了避免这种情况,您可以将您的 SQL 语句更改为:

SELECT * from category ORDER BY RAND() LIMIT 1

然而,ORDER BY RAND() 非常慢,我建议您阅读 Rick James 的优秀文章 "从表中获取随机行"。

英文:

It is not a bug, it is expected behavior.

Let's move your function into a subquery and let's use a sequence with a few values:

# Attempt 1
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
Empty set (0,001 sec)

# Attempt 2
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+-----+
| seq |
+-----+
|   3 |
+-----+
1 row in set (0,002 sec)

Since we use a subquery instead of a function, EXPLAIN will be a little bit more verbose:

explain select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| id   | select_type          | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra                                        |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
|    1 | PRIMARY              | seq_1_to_3 | const | PRIMARY       | PRIMARY | 8       | const | 1    | Using where; Using index                     |
|    2 | UNCACHEABLE SUBQUERY | seq_1_to_3 | index | NULL          | PRIMARY | 8       | NULL  | 3    | Using index; Using temporary; Using filesort |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+

UNCACHEABLE SUBQUERY means, that the result of the subquery cannot be stored in subquery cache and must be executed for each comparison.

Let's assume in first attempt the subquery returned 3,1 and 1, in second it returned 2,1 and 3. In first attempt there was no matching ( 1 != 3, 2 != 1 and 3 != 1), while in 2nd attempt 3 matched 3.

See also Correlated Subqueries (Wikipedia).

To avoid this, you could just change your SQL statement to

SELECT * from category ORDER BY RAND() LIMIT 1

However ORDER BY RAND() is very slow, I would suggest you to read Rick James' excellent article "Fetching random rows from a table".

答案2

得分: 1

Your function is called for each separate row independently. Each call generates a new single id value. So the amount of output rows vary.

You must call the function once. For example, with

select category.* 
from category 
JOIN (SELECT random_category() AS id) AS criteria USING (id);

Also, you may try to define your function as DETERMINISTIC:

create or replace function random_category() returns int DETERMINISTIC
begin
    return (select cc.id from category cc order by rand() limit 1);
end

AFAIR in this case the function output is treated as a constant (it has no arguments), and it should be called once.. but I'm not sure.

英文:

Your function is called for each separate row independently. Each call generates new single id value. So the amount of output rows vary.

You must call the function once. For example, with

select category.* 
from category 
JOIN (SELECT random_category() AS id) AS criteria USING (id);

Also you may try to define your function as DETERMINISTIC:

create or replace function random_category() returns int DETERMINISTIC
begin
    return (select cc.id from category cc order by rand() limit 1);
end

AFAIR in this case the function output is treated as a constant (it have no arguments), and it should be called once.. but I'm not sure.

huangapple
  • 本文由 发表于 2023年6月8日 19:39:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/76431483.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定