在SQL中按列进行聚合,以获得样本大小和百分比,使用CASE WHEN。

huangapple go评论151阅读模式
英文:

Aggregate By Column in SQL That Results in Sample Size and % with Case When

问题

这是你想要的翻译结果:

  1. SELECT
  2. device AS attribute,
  3. COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
  4. COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
  5. COUNT(CASE WHEN group_name = 'control' THEN 1 END) / SUM(CASE WHEN group_name = 'control' THEN 1 ELSE 0 END) OVER () AS score_control,
  6. COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / SUM(CASE WHEN group_name = 'treatment' THEN 1 ELSE 0 END) OVER () AS score_treatment
  7. FROM your_table
  8. GROUP BY device;

这是你的查询,已根据你的要求进行了修改。如果每个设备有100个样本,这个查询将计算每个组内的样本总数以及每个设备在组内的百分比。

英文:

Hi I have a table in SQL (presto) that has the following format:

  1. uuid group_name transaction_id sales device is_local
  2. xyz control 123x 11 ios false
  3. abc treatment 124x 12 android true
  4. def control 125x 13 android false
  5. ghi treatment 126x 14 ios false
  6. jkl control 127x 15 ios true
  7. mno treatment 128x 16 android false

how do I write a query where if there were 100 devices I can get the total sample within the group name and the percentage of each device within the group name?

I want my final table to look like this based on the device column:

  1. attribute control_sample treatment_sample score_control score_treatment
  2. ios 100 50 0.5 0.33
  3. android 100 50 0.7 0.40
  4. tablet 100 50 0.3 0.80
  5. TV 100 50 0.6 0.20

this is the attempt originally but it did not give me the output I wanted:

  1. device AS (
  2. SELECT
  3. device AS attribute,
  4. COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
  5. COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
  6. COUNT(CASE WHEN group_name = 'control' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'control') AS score_control,
  7. COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'treatment') AS score_treatment
  8. FROM your_table
  9. GROUP BY device

What is the best way to do this?

答案1

得分: 1

如果我理解你的问题,我认为你已经非常接近解决方案了。请注意,如果你将一个整数除以一个整数,你将得到一个整数结果,而你需要小数。我建议如下操作:

  1. SELECT
  2. t.device AS 属性
  3. , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS 控制组样本数
  4. , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS 治疗组样本数
  5. , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS 控制组得分
  6. , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS 治疗组得分
  7. FROM your_table AS t
  8. INNER JOIN (
  9. SELECT
  10. group_name
  11. , count(*) * 1.0 group_count -- 通过 * 1.0 转换为小数
  12. FROM your_table
  13. GROUP BY
  14. group_name
  15. ) AS g ON t.group_name = g.group_name
  16. GROUP BY
  17. t.device
属性 控制组样本数 治疗组样本数 控制组得分 治疗组得分
android 1 2 0.33333333333333333333 0.66666666666666666667
ios 2 1 0.66666666666666666667 0.33333333333333333333

fiddle

英文:

If I understand your question I think you were quite close to the solution. Please note that if you divide an integer by an integer you will get an integer result, and you need decimals. I suggest the following:

  1. SELECT
  2. t.device AS attribute
  3. , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS control_sample
  4. , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS treatment_sample
  5. , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS score_control
  6. , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS score_treatment
  7. FROM your_table AS t
  8. INNER JOIN (
  9. SELECT
  10. group_name
  11. , count(*) * 1.0 group_count -- "convert" to decimal by the * 1.0
  12. FROM your_table
  13. GROUP BY
  14. group_name
  15. ) AS g ON t.group_name = g.group_name
  16. GROUP BY
  17. t.device
attribute control_sample treatment_sample score_control score_treatment
android 1 2 0.33333333333333333333 0.66666666666666666667
ios 2 1 0.66666666666666666667 0.33333333333333333333

fiddle

答案2

得分: 1

我建议在分组结果上使用 sum 窗口函数

  1. -- 样本数据
  2. with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
  3. values ('xyz', 'control', '123x', 11, 'ios', false),
  4. ('abc', 'treatment', '124x', 12, 'android', true),
  5. ('def', 'control', '125x', 13, 'android', false),
  6. ('ghi', 'treatment', '126x', 14, 'ios', false),
  7. ('jkl', 'control', '127x', 15, 'ios', true),
  8. ('mno', 'treatment', '128x', 16, 'android', false)
  9. )
  10. -- 查询
  11. select attribute,
  12. control_sample,
  13. treatment_sample,
  14. control_sample * 1.0 / sum(control_sample) over () score_control,
  15. treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
  16. from (
  17. select device attribute,
  18. count_if(group_name = 'control') control_sample,
  19. count_if(group_name = 'treatment') treatment_sample
  20. from dataset
  21. group by device);

输出:

属性 控制组样本数 治疗组样本数 控制组得分 治疗组得分
android 1 2 0.3 0.7
ios 2 1 0.7 0.3

但是,根据我看到的,你的查询存在一个问题,即整数除法(例如,当两个操作数都是整数类型时,999/1000 是0),所以你可以通过将其中一个操作数乘以1.0来修复它 - 例如 COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control

英文:

I would suggest to use sum window function over the grouping results:

  1. -- sample data
  2. with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
  3. values ('xyz', 'control', '123x', 11, 'ios', false),
  4. ('abc', 'treatment', '124x', 12, 'android', true),
  5. ('def', 'control', '125x', 13, 'android', false),
  6. ('ghi', 'treatment', '126x', 14, 'ios', false),
  7. ('jkl', 'control', '127x', 15, 'ios', true),
  8. ('mno', 'treatment', '128x', 16, 'android', false)
  9. )
  10. -- query
  11. select attribute,
  12. control_sample,
  13. treatment_sample,
  14. control_sample * 1.0 / sum(control_sample) over () score_control,
  15. treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
  16. from (
  17. select device attribute,
  18. count_if(group_name = 'control') control_sample,
  19. count_if(group_name = 'treatment') treatment_sample
  20. from dataset
  21. group by device);

Output:

attribute control_sample treatment_sample score_control score_treatment
android 1 2 0.3 0.7
ios 2 1 0.7 0.3

But the problem with your query was (as far as I can see) is that you got struck by another case of integer division (i.e. 999/1000 is 0 when both operands are of integer type), so you can fix it by multiplying one of operands by 1.0 - for example COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control

huangapple
  • 本文由 发表于 2023年7月11日 07:51:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/76657960.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定