英文:
How to select most frequent value after group by
问题
我尝试对某个实体进行分类,比如说一个用户,将表中与该实体最频繁关联的值分类为该实体的情感。因此,当用户主要记录"Sad"值时,该用户将被分类为"Sad",或者当用户主要记录"Happy"值时,该用户将被分类为"Happy"。
原始表的示例:
user_id | emotion | registered_on |
---|---|---|
"PAM" | "SAD" | 2021-04-05 |
"PAM" | "HAPPY" | 2021-04-06 |
"PAM" | "HAPPY" | 2021-04-07 |
"TIM" | "SAD" | 2021-04-06 |
"TIM" | "SAD" | 2021-05-01 |
"TIM" | "HAPPY" | 2021-06-05 |
我正在寻找的结果:
user_id | year | emotion |
---|---|---|
"TIM" | 2021 | "SAD" |
"PAM" | 2021 | "HAPPY" |
我被限制在使用Mysql 5.7,因此无法使用窗口函数。
以下是我提出的选择,计划将其用作子查询来选择,但我现在不知道如何只选择每个用户的how_many_times值较高的行。
select
user_id,
YEAR(MAX(registered_on)) as year,
emotion,
COUNT(user_id) as how_many_times
from users_emotions
group by user_id, emotion
order by user_id, how_many_times desc
有人可以帮忙吗?
这是fiddle链接:https://www.db-fiddle.com/f/bQnSQZpAorT48Rv2DRMjxS/2
编辑:
我几乎已经解决了问题,被标记为重复的答案对我有帮助,而被标记为正确答案的不行,但Vishal Kumar的答案可以解决问题:
(为了清晰起见,删除了年份)
select ordered.user_id, SUBSTRING_INDEX(GROUP_CONCAT(ordered.emotion order by ordered.how_many_times desc), ',',1) as emotions from
(select
user_id,
COUNT(user_id) as how_many_times,
emotion
from users_emotions
group by user_id, emotion) as ordered
group by ordered.user_id
;
我理解它不是完美的,因为在平局的情况下,它只会选择一个值,但它确实满足我的需求!
https://www.db-fiddle.com/f/qyj341zapSob6pvAkcw7Fd/0
英文:
I'm trying to categorize a certain entity, let's say, a User, with the most frequent value associated in a table to this entity, let's say, an Emotion.
Thus,a User will be categorized as "Sad" when he logs mostly the value "Sad", or "Happy" when he logs mostly the value "Happy".
An example of the origin table:
user_id | emotion | registered_on |
---|---|---|
"PAM" | "SAD" | 2021-04-05 |
"PAM" | "HAPPY" | 2021-04-06 |
"PAM" | "HAPPY" | 2021-04-07 |
"TIM" | "SAD" | 2021-04-06 |
"TIM" | "SAD" | 2021-05-01 |
"TIM" | "HAPPY" | 2021-06-05 |
the result I'm looking for:
user_id | year | emotion |
---|---|---|
"TIM" | 2021 | "SAD" |
"PAM" | 2021 | "HAPPY" |
I'm stuck with Mysql 5.7 so I cannot use window functions.
this is the select I came up with, was planning on using it as a subquery to choose from,
but now I'm stuck as to how to select only the rows with the higher value in how_many_times for each user.
select
user_id,
YEAR(MAX(registered_on)) as year,
emotion,
COUNT(user_id) as how_many_times
from users_emotions
group by user_id, emotion
order by user_id, how_many_times desc
can anyone help?
This is the fiddle: https://www.db-fiddle.com/f/bQnSQZpAorT48Rv2DRMjxS/2
Edit:
So I was almost there, the answer marked as duplicate helped me, not the one marked as the right one, but the one by Vishal Kumar:
(eliminated year for clarity)
select ordered.user_id, SUBSTRING_INDEX(GROUP_CONCAT(ordered.emotion order by ordered.how_many_times desc), ',',1) as emotions from
(select
user_id,
COUNT(user_id) as how_many_times,
emotion
from users_emotions
group by user_id, emotion) as ordered
group by ordered.user_id
;
I understand it's not perfect, because in case of a tie, it just picks one, but it does what I need!
答案1
得分: 1
以下是您要翻译的代码部分:
Query #1
SELECT
user_id, year_emotion,
CASE WHEN how_many_happy = how_many_unhappy THEN 'middle'
WHEN how_many_happy > how_many_unhappy THEN 'HAPPY'
ELSE 'SAD' END emotions
FROM
(select
user_id,
YEAR(registered_on) year_emotion,
SUM( emotion = "HAPPY") as how_many_happy,
SUM( emotion = "SAD") as how_many_unhappy
from users_emotions
group by user_id, YEAR(registered_on)) t1
ORDER BY user_id
;
Query #1 (含有"with 3 it gets more complicated" 部分)
SELECT
user_id, year_emotion,
CASE WHEN how_many_happy = how_many_unhappy
AND how_many_happy = how_many_confused THEN 'middle'
WHEN how_many_happy >= how_many_unhappy THEN
CASE WHEN how_many_happy > how_many_confused THEN 'HAPPY'
ELSE 'COnfused' END
ELSE
CASE WHEN how_many_unhappy > how_many_confused THEN 'SAD'
ELSE 'COMNFUSED'
END
END emotions
FROM
(select
user_id,
YEAR(registered_on) year_emotion,
SUM( emotion = "HAPPY") as how_many_happy,
SUM( emotion = "SAD") as how_many_unhappy
,SUM( emotion = "CONFUSED") as how_many_confused
from users_emotions
group by user_id, YEAR(registered_on)) t1
ORDER BY user_id
;
英文:
The query would differ much from MySQL 5.x to MySQL 8, you onoly would use CTE's
With only two emotions this is quite easy
Query #1
SELECT
user_id,year_emotion,
CASE WHEN how_many_happy = how_many_unhappy THEN 'middle'
WHEN how_many_happy > how_many_unhappy THEN 'HAPPY'
ELSE 'SAD' END emotions
FROM
(select
user_id,
YEAR(registered_on) year_emotion,
SUM( emotion = "HAPPY") as how_many_happy,
SUM( emotion = "SAD") as how_many_unhappy
from users_emotions
group by user_id, YEAR(registered_on)) t1
ORDER BY user_id
;
user_id | year_emotion | emotions |
---|---|---|
1 | 2021 | SAD |
2 | 2021 | middle |
3 | 2021 | SAD |
with 3 it gets more complicated
Query #1
SELECT
user_id,year_emotion,
CASE WHEN how_many_happy = how_many_unhappy
AND how_many_happy = how_many_confused THEN 'middle'
WHEN how_many_happy >= how_many_unhappy THEN
CASE WHEN how_many_happy > how_many_confused then 'HAPPY'
ELSE 'COnfused' END
ELSE
CASE WHEN how_many_unhappy > how_many_confused then 'SAD'
ELSE 'COMNFUSED'
END
END emotions
FROM
(select
user_id,
YEAR(registered_on) year_emotion,
SUM( emotion = "HAPPY") as how_many_happy,
SUM( emotion = "SAD") as how_many_unhappy
,SUM( emotion = "CONFUSED") as how_many_confused
from users_emotions
group by user_id, YEAR(registered_on)) t1
ORDER BY user_id
;
user_id | year_emotion | emotions |
---|---|---|
1 | 2021 | SAD |
2 | 2021 | HAPPY |
3 | 2021 | COMNFUSED |
答案2
得分: 0
你可以按照列 user_id、Year 和 emotion
对数据进行分组,然后使用 ROW_NUMBER()
窗口函数选择每个 user_id、Year 和 emotion
组合中具有最高 count
的行。生成的输出显示了具有最高计数的每行的 user_id、Year 和 emotion
:
查询:
SELECT
user_id,
Year,
emotion
FROM
(
SELECT
user_id,
DATE_FORMAT(registered_on, '%Y') AS Year,
emotion,
COUNT(*) AS count,
ROW_NUMBER() OVER(
PARTITION BY user_id
ORDER BY
COUNT(*) DESC
) rn
FROM
users_emotions
GROUP BY
user_id,
Year,
emotion
) t
WHERE
rn = 1
英文:
You could groupe the data by columns user_id, Year, and emotion
and then selecting the row with the highest count
for each user_id, Year, and emotion
combination using the ROW_NUMBER()
window function. The resulting output shows the user_id, Year, and emotion
for each row with the highest count:
Query:
SELECT
user_id,
Year,
emotion
FROM
(
SELECT
user_id,
DATE_FORMAT(registered_on, '%Y') AS Year,
emotion,
COUNT(*) AS count,
ROW_NUMBER() OVER(
PARTITION BY user_id
ORDER BY
COUNT(*) DESC
) rn
FROM
users_emotions
GROUP BY
user_id,
Year,
emotion
) t
WHERE
rn = 1
答案3
得分: 0
如果是MysSql 5.7,那么您不能使用窗口函数和公共表达式。因此,您必须多次使用派生表(子查询):
select t1.*
from (
select
user_id,
emotion,
YEAR(registered_on) as regyear,
COUNT(*) as cnt
from users_emotions
group by 1, 2, 3
) as t1
where t1.cnt = (
select
MAX(cnt) as mcnt
from (
select
user_id,
emotion,
YEAR(registered_on) as regyear,
COUNT(*) as cnt
from users_emotions
group by 1, 2, 3) as sq
);
<details>
<summary>英文:</summary>
Well, if it's MysSql 5.7, then you can not use window functions and common table expressions. So, you have to use derived tables (subqueries) few times:
select t1.*
from (
select
user_id,
emotion,
YEAR(registered_on) as regyear,
COUNT(*) as cnt
from users_emotions
group by 1, 2, 3
) as t1
where t1.cnt = (
select
MAX(cnt) as mcnt
from (
select
user_id,
emotion,
YEAR(registered_on) as regyear,
COUNT(*) as cnt
from users_emotions
group by 1, 2, 3) as sq
);
[Updated Sql fiddle](https://www.db-fiddle.com/f/bQnSQZpAorT48Rv2DRMjxS/7)
</details>
# 答案4
**得分**: 0
if you have a small number of pre-defined `emotions` (categories), then nbk's method will perform much better than this.
This will work for any number of possible `emotions` but uses a slightly arbitrary `e1.emotion > e2.emotion` to break any ties:
```sql
SELECT e1.user_id, e1.reg_year AS year, e1.emotion
FROM (
SELECT
user_id,
emotion,
YEAR(registered_on) as reg_year,
COUNT(*) as cnt
FROM users_emotions
GROUP BY 1, 2, 3
) e1
LEFT JOIN (
SELECT
user_id,
emotion,
YEAR(registered_on) as reg_year,
COUNT(*) as cnt
FROM users_emotions
GROUP BY 1, 2, 3
) e2
ON e1.user_id = e2.user_id
AND e1.reg_year = e2.reg_year
AND (e1.cnt < e2.cnt OR
e1.cnt = e2.cnt AND e1.emotion > e2.emotion)
WHERE e2.user_id IS NULL
Here's a db<>fiddle
英文:
if you have a small number of pre-defined emotions
(categories), then nbk's method will perform much better than this.
This will work for any number of possible emotions
but uses a slightly arbitrary e1.emotion > e2.emotion
to break any ties:
SELECT e1.user_id, e1.reg_year AS year, e1.emotion
FROM (
SELECT
user_id,
emotion,
YEAR(registered_on) as reg_year,
COUNT(*) as cnt
FROM users_emotions
GROUP BY 1, 2, 3
) e1
LEFT JOIN (
SELECT
user_id,
emotion,
YEAR(registered_on) as reg_year,
COUNT(*) as cnt
FROM users_emotions
GROUP BY 1, 2, 3
) e2
ON e1.user_id = e2.user_id
AND e1.reg_year = e2.reg_year
AND (e1.cnt < e2.cnt OR
e1.cnt = e2.cnt AND e1.emotion > e2.emotion)
WHERE e2.user_id IS NULL
Here's a db<>fiddle
答案5
得分: 0
尝试在HAVING子句中使用相关子查询的以下聚合查询:
select user_id,
emotion,
year(max(registered_on)) year_
from users_emotions ue
group by user_id, emotion
having count(*) =
(
select max(cnt)
from
(
select user_id, emotion, count(*) cnt
from users_emotions
group by user_id, emotion
) mx
where mx.user_id = ue.user_id
)
order by user_id
英文:
Try the following aggregation with a correlated subquery in the having clause:
select user_id,
emotion,
year(max(registered_on)) year_
from users_emotions ue
group by user_id, emotion
having count(*) =
(
select max(cnt)
from
(
select user_id, emotion, count(*) cnt
from users_emotions
group by user_id, emotion
) mx
where mx.user_id = ue.user_id
)
order by user_id
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论