2023-02-27 19:31:30

SQL query to compare two rows within the same table

Question

I have a pizza table:

pizza_id	toppings
1	1,2,3,4,5,6,8,10
2	4,6,7,9,11,12

I would like to find the toppings that the pizza_ids have in common, that is, the toppings used by both pizzas. The expected result is shown in the table below:

pizza_id	toppings
1	4,6
2	4,6

I have tried using JOINs but couldn't get a query that satisfies this condition.
Could anyone please give me a hint?
Thank you.

Answer 1

Score: 1

Your database does not respect first normal form (1NF): each table cell must contain a single value. The best way to fix this is to have a pizza table and a topping table with an N-to-N relationship between them. That relationship is stored in a third table that holds one row for every topping a pizza has.

The pizza table looks like this:

pizza_id	pizza_name
1	Margherita
2	Capricciosa

The topping table looks like this:

topping_id	topping_name
1	Pomodoro
2	Mozzarella

And the N-to-N table will be:

pizza_id	topping_id
1	1
1	2
...	...

Against this N-to-N table you can run every query you need to get your data.
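As a rough illustration of that last point, the sketch below loads the question's two rows into an N-to-N table and asks for the toppings shared by every pizza. It is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of the asker's actual database, and the table name pizzas_toppings and the variable names are hypothetical.

```python
import sqlite3

# Sample data from the question: pizza_id -> comma-separated topping ids.
csv_rows = {1: "1,2,3,4,5,6,8,10", 2: "4,6,7,9,11,12"}

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pizzas_toppings ("
    " pizza_id INTEGER NOT NULL,"
    " topping_id INTEGER NOT NULL,"
    " PRIMARY KEY (pizza_id, topping_id))"
)

# Split each CSV string into one (pizza_id, topping_id) row per topping.
pairs = [
    (pizza_id, int(topping))
    for pizza_id, toppings in csv_rows.items()
    for topping in toppings.split(",")
]
conn.executemany("INSERT INTO pizzas_toppings VALUES (?, ?)", pairs)

# A topping is common to all pizzas when it has one row per pizza.
common = [
    row[0]
    for row in conn.execute(
        "SELECT topping_id FROM pizzas_toppings"
        " GROUP BY topping_id"
        " HAVING COUNT(*) ="
        " (SELECT COUNT(DISTINCT pizza_id) FROM pizzas_toppings)"
        " ORDER BY topping_id"
    )
]
print(common)  # [4, 6]
```

The primary key on (pizza_id, topping_id) is what makes COUNT(*) safe here, since a pizza cannot list the same topping twice.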

Answer 2

Score: 1

As everyone has already pointed out, you should not store comma-separated values in a single cell like that.

But, to answer your question, assuming you are looking for the intersection of the toppings CSV for each pair of pizzas, and you have a toppings table with (topping_id, name), you could do something like:

SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    GROUP_CONCAT(t.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas p1
JOIN toppings t
    ON FIND_IN_SET(t.topping_id, REPLACE(p1.toppings, ', ', ','))
JOIN pizzas p2
    ON p1.pizza_id < p2.pizza_id
    AND FIND_IN_SET(t.topping_id, REPLACE(p2.toppings, ', ', ','))
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;

Given these pizzas:

pizza_id	toppings
1	1, 2, 3, 4, 5, 6, 8, 10
2	4, 6, 7, 9, 11, 12
3	1, 6

The above query will return:

p1_id	p2_id	toppings	num
1	2	4,6	2
1	3	1,6	2
2	3	6	1

The FIND_IN_SET query is very inefficient, because the join conditions are computed from the CSV strings and cannot use an index. The junction table suggested by ElNicho is a much better fit.
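For readers who want to check what the CSV query computes, the same pairwise intersection can be sketched outside the database. This is not part of the answer's approach; it is a plain Python illustration over the three sample pizzas, and the names rows, parse_toppings and shared are hypothetical.

```python
from itertools import combinations

# The three sample rows, with the ", " separators used in the answer.
rows = {
    1: "1, 2, 3, 4, 5, 6, 8, 10",
    2: "4, 6, 7, 9, 11, 12",
    3: "1, 6",
}


def parse_toppings(csv):
    """Turn a CSV string into a set of topping ids (int() ignores spaces)."""
    return {int(part) for part in csv.split(",")}


parsed = {pizza_id: parse_toppings(csv) for pizza_id, csv in rows.items()}

# Compare every pair once, like the p1.pizza_id < p2.pizza_id condition.
shared = []
for (id1, set1), (id2, set2) in combinations(sorted(parsed.items()), 2):
    common = sorted(set1 & set2)
    if common:  # an inner join drops pairs with nothing in common
        shared.append((id1, id2, common, len(common)))

# Largest overlap first, like ORDER BY num DESC (the sort is stable).
shared.sort(key=lambda row: -row[3])
for row in shared:
    print(row)
```

Each printed tuple corresponds to one row of the result table: the two pizza ids, their shared toppings, and the count.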

If you switch to using a junction (N-to-N) table like pizzas_toppings (pizza_id, topping_id), the query becomes:

SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    GROUP_CONCAT(p1.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas_toppings p1
JOIN pizzas_toppings p2
    ON p1.pizza_id < p2.pizza_id
    AND p1.topping_id = p2.topping_id
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;
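The junction-table query can be tried out with a small script. The sketch below is an assumption-laden stand-in: it runs the query in an in-memory SQLite database through Python's sqlite3 module rather than in MySQL, loads the three sample pizzas, and adds tie-breakers to ORDER BY so the row order is deterministic. The Python variable names are hypothetical.

```python
import sqlite3

# The three sample pizzas from above, already split into topping ids.
pizzas = {
    1: [1, 2, 3, 4, 5, 6, 8, 10],
    2: [4, 6, 7, 9, 11, 12],
    3: [1, 6],
}

# Assumption: SQLite stands in for MySQL; both provide GROUP_CONCAT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pizzas_toppings ("
    " pizza_id INTEGER NOT NULL,"
    " topping_id INTEGER NOT NULL,"
    " PRIMARY KEY (pizza_id, topping_id))"
)
conn.executemany(
    "INSERT INTO pizzas_toppings VALUES (?, ?)",
    [(pid, tid) for pid, tids in pizzas.items() for tid in tids],
)

rows = conn.execute(
    """
    SELECT
        p1.pizza_id AS p1_id,
        p2.pizza_id AS p2_id,
        GROUP_CONCAT(p1.topping_id) AS toppings,
        COUNT(*) AS num
    FROM pizzas_toppings p1
    JOIN pizzas_toppings p2
        ON p1.pizza_id < p2.pizza_id
        AND p1.topping_id = p2.topping_id
    GROUP BY p1.pizza_id, p2.pizza_id
    ORDER BY num DESC, p1_id, p2_id
    """
).fetchall()

# GROUP_CONCAT does not guarantee an order, so sort the ids in Python.
result = [
    (p1_id, p2_id, sorted(int(t) for t in toppings.split(",")), num)
    for p1_id, p2_id, toppings, num in rows
]
for row in result:
    print(row)
```

The pairs and counts match the result table of the CSV-based query, without any string parsing inside the database.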

Make sure your junction table is indexed in both directions:

CREATE TABLE `pizzas_toppings` (
    pizza_id INT UNSIGNED NOT NULL,
    topping_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (pizza_id, topping_id),
    INDEX (topping_id, pizza_id),
    FOREIGN KEY (pizza_id) REFERENCES pizzas (pizza_id),
    FOREIGN KEY (topping_id) REFERENCES toppings (topping_id)
);
