June 8, 2023, 19:39:13

MariaDB: create a function returning a random id

Question

I want to create a database function returning a random row from a database table.

I have the following table:

create table category
(
    id    bigint       not null primary key,
    color varchar(255) null
);

and I created a function with:

DELIMITER //
create or replace function random_category() returns int
begin
    return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;

When calling select random_category(); I always get a single result.

But when calling

select * from category c where id = random_category();

I sometimes get an empty result, sometimes several rows, and sometimes a single row.

I am using 10.11.3-MariaDB-1:10.11.3+maria~ubu2204

Answer 1

Score: 3

This is not a bug; it is expected behavior.

Let's move the body of your function into a subquery and use a sequence with a few values:

# Attempt 1
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
Empty set (0,001 sec)

# Attempt 2
select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+-----+
| seq |
+-----+
|   3 |
+-----+
1 row in set (0,002 sec)

Since we use a subquery instead of a function, EXPLAIN will be a little bit more verbose:

explain select seq from seq_1_to_3 where seq=(select seq from seq_1_to_3 order by rand() limit 1);
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
| id   | select_type          | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra                                        |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+
|    1 | PRIMARY              | seq_1_to_3 | const | PRIMARY       | PRIMARY | 8       | const | 1    | Using where; Using index                     |
|    2 | UNCACHEABLE SUBQUERY | seq_1_to_3 | index | NULL          | PRIMARY | 8       | NULL  | 3    | Using index; Using temporary; Using filesort |
+------+----------------------+------------+-------+---------------+---------+---------+-------+------+----------------------------------------------+

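UNCACHEABLE SUBQUERY means that the result of the subquery cannot be stored in the subquery cache, so the subquery has to be executed again for each comparison.

Let's assume that in the first attempt the subquery returned 3, 1 and 1 for the rows 1, 2 and 3, and that in the second attempt it returned 2, 1 and 3. In the first attempt no row matched (1 != 3, 2 != 1 and 3 != 1), while in the second attempt row 3 matched 3. By the same logic several rows can match, for example if the subquery returned 1, 2 and 1.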
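
The per-row evaluation described above can be made visible with a small sketch. It assumes the Sequence engine is available (as in the examples above); the column aliases outer_row and picked are made up for illustration. Moving the random subquery into the select list should show one independently drawn value next to each outer row:

```sql
-- Illustration only: outer_row and picked are hypothetical aliases.
-- The scalar subquery uses RAND(), so its result is not cached and it is
-- expected to be evaluated again for every row of the outer table.
SELECT s.seq AS outer_row,
       (SELECT seq FROM seq_1_to_3 ORDER BY RAND() LIMIT 1) AS picked
FROM seq_1_to_3 AS s;
```

The WHERE seq = (...) form of the query keeps only those rows where the outer value and the freshly drawn value happen to coincide, which is why the row count varies between runs.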

See also Correlated Subqueries (Wikipedia).

To avoid this, you could simply change your SQL statement to:

SELECT * FROM category ORDER BY RAND() LIMIT 1;

However, ORDER BY RAND() is slow on large tables, because it has to assign a random value to every row and sort all of them. I suggest reading Rick James' excellent article "Fetching random rows from a table".
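
As a rough sketch of one commonly used alternative (not quoted from the article), you can draw a random point in the id range and take the first row at or after it. This assumes that id is numeric and reasonably dense; rows that follow a gap in the ids are picked more often than others. The names r and start_id are made up for this example.

```sql
-- Sketch under assumptions: category.id is numeric with few gaps.
-- r and start_id are hypothetical names used only in this example.
SELECT c.*
FROM category AS c
CROSS JOIN (
    -- Single-row derived table: one random point within [MIN(id), MAX(id)].
    SELECT FLOOR(MIN(id) + (MAX(id) - MIN(id) + 1) * RAND()) AS start_id
    FROM category
) AS r
WHERE c.id >= r.start_id
ORDER BY c.id
LIMIT 1;
```

Because the derived table aggregates over the whole table, it yields exactly one row, so the random point is drawn once rather than once per row of category. On an empty table start_id is NULL and the query returns no rows.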

Answer 2

Score: 1

Your function is called for each row independently, and each call generates a new id value. That is why the number of output rows varies.

You must call the function once. For example, with

select category.* 
from category 
JOIN (SELECT random_category() AS id) AS criteria USING (id);
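
Another way to make sure the function is called exactly once, sketched here as an alternative to the join, is to store its result in a user variable first and then filter on that variable. The variable name @picked_id is made up for this example.

```sql
-- @picked_id is a hypothetical variable name.
-- The function runs once here, when the variable is assigned.
SET @picked_id = random_category();

-- The variable holds a fixed value for the whole statement,
-- so at most one row can match the primary key.
SELECT * FROM category WHERE id = @picked_id;
```

This takes two statements instead of one, but it does not depend on how the optimizer decides to evaluate the function.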

Also, you may try to define your function as DETERMINISTIC:

DELIMITER //
create or replace function random_category() returns int DETERMINISTIC
begin
    return (select cc.id from category cc order by rand() limit 1);
end //
DELIMITER ;

As far as I remember, in this case the function output is treated as a constant (it has no arguments), so it should be called only once, but I'm not sure. Note that the function is not actually deterministic, so this relies on optimizer behavior rather than on a guarantee.
