SQL query to compare two rows within the same table
Question
Here I have a pizza table:
pizza_id | toppings |
---|---|
1 | 1,2,3,4,5,6,8,10 |
2 | 4,6,7,9,11,12 |
I would like to find the toppings that the pizzas have in common, i.e. the toppings used by both pizza_ids. The expected result is:
pizza_id | toppings |
---|---|
1 | 4,6 |
2 | 4,6 |
I have tried using JOINs but couldn't get the condition right.
Could anyone please give me a hint?
Thank you
Answer 1
Score: 1
Your database does not respect first normal form (1NF): each table cell must contain a single value. The best way to fix this is to have a pizza_table and a topping_table with an N-to-N (many-to-many) relationship. That way there is a table that links each pizza_id to EVERY topping it has.
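As a rough sketch of that migration, assuming the topping ids are integers, the question's comma-separated `toppings` column can be split into one (pizza_id, topping_id) pair per row, which is the shape the N-to-N table stores. `to_junction_rows` is a hypothetical helper for illustration, not part of the answer.

```python
# Hypothetical helper: split a comma-separated `toppings` column into one
# (pizza_id, topping_id) pair per row, i.e. the N-to-N junction-table shape.
# Assumes every topping id is an integer.
def to_junction_rows(pizzas: list[tuple[int, str]]) -> list[tuple[int, int]]:
    rows = []
    for pizza_id, toppings_csv in pizzas:
        for token in toppings_csv.split(","):
            token = token.strip()  # accept "1, 2" as well as "1,2"
            if token:              # skip empty entries such as a trailing comma
                rows.append((pizza_id, int(token)))
    return rows


# Data from the question.
pizzas = [
    (1, "1,2,3,4,5,6,8,10"),
    (2, "4,6,7,9,11,12"),
]

junction = to_junction_rows(pizzas)
print(junction[:3])   # [(1, 1), (1, 2), (1, 3)]
print(len(junction))  # 14
```

Each resulting pair would become one row of the N-to-N table.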
The pizza table looks like this:
pizza_id | pizza_name |
---|---|
1 | Margherita |
2 | Capricciosa |
The topping table looks like this:
topping_id | topping_name |
---|---|
1 | Pomodoro |
2 | Mozzarella |
And the N-to-N table will be:
pizza_id | topping_id |
---|---|
1 | 1 |
1 | 2 |
... | ... |
With this table you can run whatever queries you need to get your data.
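For example, the question's expected output (each pizza with the toppings it shares) becomes a simple EXISTS query on the junction table. Below is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the answer does not name a database engine, the junction table name `pizza_topping` and the placeholder names are assumptions, and "shared" is interpreted as "also appears on at least one other pizza".

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stand-in for the three tables described above.
# The junction table name `pizza_topping` is hypothetical.
SCHEMA = """
CREATE TABLE pizza (pizza_id INTEGER PRIMARY KEY, pizza_name TEXT);
CREATE TABLE topping (topping_id INTEGER PRIMARY KEY, topping_name TEXT);
CREATE TABLE pizza_topping (
    pizza_id   INTEGER NOT NULL REFERENCES pizza (pizza_id),
    topping_id INTEGER NOT NULL REFERENCES topping (topping_id),
    PRIMARY KEY (pizza_id, topping_id)
);
"""

# For every pizza, keep only the toppings that also appear on another pizza.
SHARED_QUERY = """
SELECT pt.pizza_id, pt.topping_id
FROM pizza_topping AS pt
WHERE EXISTS (
    SELECT 1 FROM pizza_topping AS other_pt
    WHERE other_pt.topping_id = pt.topping_id
      AND other_pt.pizza_id <> pt.pizza_id
)
ORDER BY pt.pizza_id, pt.topping_id
"""


def shared_toppings(rows: list[tuple[int, int]]) -> dict[int, list[int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder names; only the ids matter for this query.
    conn.executemany("INSERT INTO pizza VALUES (?, ?)",
                     [(p, f"pizza {p}") for p in sorted({p for p, _ in rows})])
    conn.executemany("INSERT INTO topping VALUES (?, ?)",
                     [(t, f"topping {t}") for t in sorted({t for _, t in rows})])
    conn.executemany("INSERT INTO pizza_topping VALUES (?, ?)", rows)
    result: dict[int, list[int]] = defaultdict(list)
    for pizza_id, topping_id in conn.execute(SHARED_QUERY):
        result[pizza_id].append(topping_id)
    conn.close()
    return dict(result)


# The question's data as (pizza_id, topping_id) pairs.
QUESTION_ROWS = [(1, t) for t in (1, 2, 3, 4, 5, 6, 8, 10)] + \
                [(2, t) for t in (4, 6, 7, 9, 11, 12)]

print(shared_toppings(QUESTION_ROWS))  # {1: [4, 6], 2: [4, 6]}
```

Note that the query never parses strings: the composite primary key lets the engine look up each (pizza, topping) pair directly.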
Answer 2
Score: 1
As everyone has already pointed out, you should not store comma-separated values in a single cell like that.
But to answer your question: assuming you are looking for the intersection of the toppings CSVs for each pair of pizzas, and you have a toppings table with (topping_id, name), you could do something like this:
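```sql
SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    -- shared topping ids, sorted so the list comes out as e.g. "4,6"
    GROUP_CONCAT(t.topping_id ORDER BY t.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas p1
-- every topping listed in p1's CSV; REPLACE strips the space after each
-- comma because FIND_IN_SET does not trim whitespace
JOIN toppings t
    ON FIND_IN_SET(t.topping_id, REPLACE(p1.toppings, ', ', ','))
-- every later pizza whose CSV also lists that topping
-- (p1 < p2 lists each pair once and skips self-pairs)
JOIN pizzas p2
    ON p1.pizza_id < p2.pizza_id
    AND FIND_IN_SET(t.topping_id, REPLACE(p2.toppings, ', ', ','))
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;
```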
Given these pizzas:
pizza_id | toppings |
---|---|
1 | 1, 2, 3, 4, 5, 6, 8, 10 |
2 | 4, 6, 7, 9, 11, 12 |
3 | 1, 6 |
The above query will return:
p1_id | p2_id | toppings | num |
---|---|---|---|
1 | 2 | 4,6 | 2 |
1 | 3 | 1,6 | 2 |
2 | 3 | 6 | 1 |
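To sanity-check this result table, the query can be replayed with Python's built-in sqlite3 module. SQLite has no FIND_IN_SET, so this sketch registers a rough Python stand-in (`find_in_set`, hypothetical, not MySQL's implementation) through `Connection.create_function`. The placeholder topping names are assumptions, and a secondary sort on p1_id, p2_id is added so the order of tied rows is deterministic.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position of `needle`
    in a comma-separated list, 0 if absent, None (SQL NULL) if either
    argument is NULL."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


def common_toppings_csv() -> list[tuple[int, int, list[int], int]]:
    conn = sqlite3.connect(":memory:")
    conn.create_function("FIND_IN_SET", 2, find_in_set)
    conn.executescript("""
        CREATE TABLE pizzas (pizza_id INTEGER PRIMARY KEY, toppings TEXT);
        CREATE TABLE toppings (topping_id INTEGER PRIMARY KEY, name TEXT);
    """)
    conn.executemany("INSERT INTO pizzas VALUES (?, ?)", [
        (1, "1, 2, 3, 4, 5, 6, 8, 10"),
        (2, "4, 6, 7, 9, 11, 12"),
        (3, "1, 6"),
    ])
    # Placeholder topping names; ids 1..12 cover every id used above.
    conn.executemany("INSERT INTO toppings VALUES (?, ?)",
                     [(i, f"topping {i}") for i in range(1, 13)])
    rows = conn.execute("""
        SELECT p1.pizza_id AS p1_id, p2.pizza_id AS p2_id,
               GROUP_CONCAT(t.topping_id) AS toppings, COUNT(*) AS num
        FROM pizzas p1
        JOIN toppings t
            ON FIND_IN_SET(t.topping_id, REPLACE(p1.toppings, ', ', ','))
        JOIN pizzas p2
            ON p1.pizza_id < p2.pizza_id
            AND FIND_IN_SET(t.topping_id, REPLACE(p2.toppings, ', ', ','))
        GROUP BY p1.pizza_id, p2.pizza_id
        ORDER BY num DESC, p1_id, p2_id
    """).fetchall()
    conn.close()
    # SQLite's GROUP_CONCAT order is unspecified, so sort the ids in Python.
    return [(a, b, sorted(int(x) for x in csv.split(",")), num)
            for a, b, csv, num in rows]


print(common_toppings_csv())
# [(1, 2, [4, 6], 2), (1, 3, [1, 6], 2), (2, 3, [6], 1)]
```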
This is insanely inefficient, because FIND_IN_SET has to parse the CSV string for every candidate row and cannot use an index. It would be much better served by the junction table suggested by ElNicho.
If you switch to a junction (N-to-N) table such as pizzas_toppings (pizza_id, topping_id), the query becomes:
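```sql
SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    -- shared topping ids, sorted for a stable result
    GROUP_CONCAT(p1.topping_id ORDER BY p1.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas_toppings p1
-- match each topping row with the same topping on a later pizza
JOIN pizzas_toppings p2
    ON p1.pizza_id < p2.pizza_id
    AND p1.topping_id = p2.topping_id
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;
```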
Make sure your junction table is indexed in both directions:
```sql
CREATE TABLE `pizzas_toppings` (
    pizza_id INT UNSIGNED NOT NULL,
    topping_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (pizza_id, topping_id),  -- pizza -> toppings lookups
    INDEX (topping_id, pizza_id),        -- topping -> pizzas lookups
    FOREIGN KEY (pizza_id) REFERENCES pizzas (pizza_id),
    FOREIGN KEY (topping_id) REFERENCES toppings (topping_id)
);
```
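As a sketch of the junction-table version, the same three pizzas used in the CSV example can be loaded one row per (pizza, topping) into SQLite via Python's built-in sqlite3 module; the pair query then returns the same pairs and counts without any string parsing. This is an adaptation, not the MySQL DDL itself: `INT UNSIGNED` becomes `INTEGER` (SQLite has no unsigned types), the foreign keys are left out so the sketch does not need the other two tables, and the index name `idx_topping_pizza` is an assumption.

```python
import sqlite3

# SQLite adaptation of the pizzas_toppings table: composite primary key for
# pizza -> toppings lookups plus a reverse index for topping -> pizzas.
SCHEMA = """
CREATE TABLE pizzas_toppings (
    pizza_id   INTEGER NOT NULL,
    topping_id INTEGER NOT NULL,
    PRIMARY KEY (pizza_id, topping_id)
);
CREATE INDEX idx_topping_pizza ON pizzas_toppings (topping_id, pizza_id);
"""

# Self-join on topping_id; p1 < p2 lists each pair of pizzas once.
PAIR_QUERY = """
SELECT p1.pizza_id AS p1_id, p2.pizza_id AS p2_id,
       GROUP_CONCAT(p1.topping_id) AS toppings, COUNT(*) AS num
FROM pizzas_toppings p1
JOIN pizzas_toppings p2
    ON p1.pizza_id < p2.pizza_id
    AND p1.topping_id = p2.topping_id
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC, p1_id, p2_id
"""

# Same three pizzas as the CSV example, one row per (pizza, topping).
PIZZAS = {1: [1, 2, 3, 4, 5, 6, 8, 10], 2: [4, 6, 7, 9, 11, 12], 3: [1, 6]}


def common_toppings_junction() -> list[tuple[int, int, list[int], int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO pizzas_toppings VALUES (?, ?)",
                     [(p, t) for p, ts in PIZZAS.items() for t in ts])
    rows = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    # GROUP_CONCAT order is unspecified in SQLite, so sort the ids here.
    return [(a, b, sorted(int(x) for x in csv.split(",")), num)
            for a, b, csv, num in rows]


print(common_toppings_junction())
# [(1, 2, [4, 6], 2), (1, 3, [1, 6], 2), (2, 3, [6], 1)]
```

The join condition is now a plain equality on indexed integer columns, which is what makes this version scale where the FIND_IN_SET one does not.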