SQL query to compare two rows within the same table
Question
I have a pizza table:
| pizza_id | toppings | 
|---|---|
| 1 | 1,2,3,4,5,6,8,10 | 
| 2 | 4,6,7,9,11,12 | 
I would like to find the toppings that the pizzas have in common, that is, the toppings used by both pizza_id values. The expected result is:
| pizza_id | toppings |
|---|---|
| 1 | 4,6 |
| 2 | 4,6 |
I have tried using JOINs but could not get the condition right.
Could anyone please give me a hint?
Thank you.
Answer 1
Score: 1
Your database does not respect first normal form (1NF): each table cell must contain a single value. The usual fix is to have a pizza table and a topping table linked by an N-to-N relationship, stored in a third table that holds one row for every (pizza_id, topping_id) pair, so each pizza is related to EVERY topping it has.
The pizza table looks like this:
| pizza_id | pizza_name | 
|---|---|
| 1 | Margherita | 
| 2 | Capricciosa | 
The topping table looks like this:
| topping_id | topping_name | 
|---|---|
| 1 | Pomodoro | 
| 2 | Mozzarella | 
And the N-to-N table will be:
| pizza_id | topping_id | 
|---|---|
| 1 | 1 | 
| 1 | 2 | 
| ... | ... | 
On this table you can run all the operations you need to get your data.
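As a rough illustration of how the existing comma-separated column could be turned into rows for the N-to-N table, here is a minimal Python sketch. The function name `explode_toppings` and the variable names are hypothetical, and it assumes every entry in the CSV is an integer topping id, optionally surrounded by spaces.

```python
from typing import Iterable, List, Tuple


def explode_toppings(rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Turn (pizza_id, 'csv of topping ids') rows into (pizza_id, topping_id) pairs."""
    pairs: List[Tuple[int, int]] = []
    for pizza_id, csv in rows:
        for token in csv.split(","):
            token = token.strip()  # tolerate "4, 6" as well as "4,6"
            if token:  # skip empty pieces such as a trailing comma
                pairs.append((pizza_id, int(token)))
    return pairs


# The two rows from the question's pizza table.
pizza_rows = [(1, "1,2,3,4,5,6,8,10"), (2, "4,6,7,9,11,12")]
junction_rows = explode_toppings(pizza_rows)
print(junction_rows[:3])  # [(1, 1), (1, 2), (1, 3)]
```

Each resulting pair would become one row of the N-to-N table, so the database can compare toppings with ordinary equality joins instead of string functions.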
Answer 2
Score: 1
As everyone has already pointed out, you should not store comma-separated values in a single cell like that.
But, to answer your question: assuming you are looking for the intersection of the toppings CSV for each pair of pizzas, and you have a toppings table with (topping_id, name), you could do something like:
SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    GROUP_CONCAT(t.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas p1
JOIN toppings t
    ON FIND_IN_SET(t.topping_id, REPLACE(p1.toppings, ', ', ','))
JOIN pizzas p2
    ON p1.pizza_id < p2.pizza_id
    AND FIND_IN_SET(t.topping_id, REPLACE(p2.toppings, ', ', ','))
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;
Given these pizzas:
| pizza_id | toppings | 
|---|---|
| 1 | 1, 2, 3, 4, 5, 6, 8, 10 | 
| 2 | 4, 6, 7, 9, 11, 12 | 
| 3 | 1, 6 | 
The above query will return:
| p1_id | p2_id | toppings | num | 
|---|---|---|---|
| 1 | 2 | 4,6 | 2 | 
| 1 | 3 | 1,6 | 2 | 
| 2 | 3 | 6 | 1 | 
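The FIND_IN_SET query is extremely inefficient, because FIND_IN_SET in a join condition cannot use an index, so every pizza row is compared against every topping row. The junction table suggested by ElNicho would serve much better.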
If you switch to using a junction (N-to-N) table like pizzas_toppings (pizza_id, topping_id), the query becomes:
SELECT
    p1.pizza_id AS p1_id,
    p2.pizza_id AS p2_id,
    GROUP_CONCAT(p1.topping_id) AS toppings,
    COUNT(*) AS num
FROM pizzas_toppings p1
JOIN pizzas_toppings p2
    ON p1.pizza_id < p2.pizza_id
    AND p1.topping_id = p2.topping_id
GROUP BY p1.pizza_id, p2.pizza_id
ORDER BY num DESC;
Make sure your junction table is indexed in both directions:
CREATE TABLE `pizzas_toppings` (
    pizza_id INT UNSIGNED NOT NULL,
    topping_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (pizza_id, topping_id),
    INDEX (topping_id, pizza_id),
    FOREIGN KEY (pizza_id) REFERENCES pizzas (pizza_id),
    FOREIGN KEY (topping_id) REFERENCES toppings (topping_id)
);
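To try the junction-table self-join without a MySQL server, the sketch below runs the same join in an in-memory SQLite database from Python. This is an assumption-laden stand-in, not the answer's exact setup: SQLite replaces MySQL, the foreign keys are omitted because the pizzas and toppings tables are not created here, the index name `idx_topping_pizza` is made up, and the grouping is done in Python rather than with GROUP_CONCAT so the topping order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Junction table indexed in both directions, mirroring the DDL above
# (SQLite syntax; foreign keys left out for brevity).
conn.execute(
    """
    CREATE TABLE pizzas_toppings (
        pizza_id   INTEGER NOT NULL,
        topping_id INTEGER NOT NULL,
        PRIMARY KEY (pizza_id, topping_id)
    )
    """
)
conn.execute(
    "CREATE INDEX idx_topping_pizza ON pizzas_toppings (topping_id, pizza_id)"
)

# The three sample pizzas used in this answer.
sample = {1: [1, 2, 3, 4, 5, 6, 8, 10], 2: [4, 6, 7, 9, 11, 12], 3: [1, 6]}
conn.executemany(
    "INSERT INTO pizzas_toppings (pizza_id, topping_id) VALUES (?, ?)",
    [(pizza, topping) for pizza, toppings in sample.items() for topping in toppings],
)

# Self-join: one row per topping shared by a pair of distinct pizzas.
rows = conn.execute(
    """
    SELECT p1.pizza_id AS p1_id, p2.pizza_id AS p2_id, p1.topping_id
    FROM pizzas_toppings p1
    JOIN pizzas_toppings p2
        ON p1.pizza_id < p2.pizza_id
        AND p1.topping_id = p2.topping_id
    ORDER BY p1_id, p2_id, p1.topping_id
    """
).fetchall()

# Group the shared toppings per pizza pair.
common = {}
for p1_id, p2_id, topping_id in rows:
    common.setdefault((p1_id, p2_id), []).append(topping_id)

for (p1_id, p2_id), toppings in common.items():
    print(p1_id, p2_id, toppings)
# 1 2 [4, 6]
# 1 3 [1, 6]
# 2 3 [6]
```

The `p1.pizza_id < p2.pizza_id` condition keeps each pair once and stops a pizza from being matched with itself. To get the per-pizza layout the question asks for, the same list of shared toppings can be reported once for each pizza in the pair.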