在SQL中按列进行聚合,以获得样本大小和百分比,使用CASE WHEN。

huangapple go评论89阅读模式
英文:

Aggregate By Column in SQL That Results in Sample Size and % with Case When

问题

这是你想要的翻译结果:

SELECT
    device AS attribute,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) / SUM(CASE WHEN group_name = 'control' THEN 1 ELSE 0 END) OVER () AS score_control,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / SUM(CASE WHEN group_name = 'treatment' THEN 1 ELSE 0 END) OVER () AS score_treatment
FROM your_table
GROUP BY device;

这是你的查询,已根据你的要求进行了修改。如果每个设备有100个样本,这个查询将计算每个组内的样本总数以及每个设备在组内的百分比。

英文:

Hi I have a table in SQL (presto) that has the following format:

uuid    group_name    transaction_id      sales       device      is_local
 xyz       control              123x         11          ios         false
 abc     treatment              124x         12      android          true
 def       control              125x         13      android         false
 ghi     treatment              126x         14          ios         false
 jkl       control              127x         15          ios          true
 mno     treatment              128x         16      android         false

how do I write a query where if there were 100 devices I can get the total sample within the group name and the percentage of each device within the group name?

I want my final table to look like this based on the device column:

attribute    control_sample    treatment_sample    score_control    score_treatment
      ios               100                  50              0.5               0.33
  android               100                  50              0.7               0.40
   tablet               100                  50              0.3               0.80
       TV               100                  50              0.6               0.20

this is the attempt originally but it did not give me the output I wanted:

device AS (
  SELECT
    device AS attribute,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'control') AS score_control,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'treatment') AS score_treatment
  FROM your_table
  GROUP BY device

What is the best way to do this?

答案1

得分: 1

如果我理解你的问题,我认为你已经非常接近解决方案了。请注意,如果你将一个整数除以一个整数,你将得到一个整数结果,而你需要小数。我建议如下操作:

SELECT
      t.device AS 属性
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS 控制组样本数
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS 治疗组样本数
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS 控制组得分
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS 治疗组得分
FROM your_table AS t
INNER JOIN (
    SELECT
          group_name
        , count(*) * 1.0 group_count -- 通过 * 1.0 转换为小数
    FROM your_table
    GROUP BY
          group_name
    ) AS g ON t.group_name = g.group_name
GROUP BY
        t.device
属性 控制组样本数 治疗组样本数 控制组得分 治疗组得分
android 1 2 0.33333333333333333333 0.66666666666666666667
ios 2 1 0.66666666666666666667 0.33333333333333333333

fiddle

英文:

If I understand your question I think you were quite close to the solution. Please note that if you divide an integer by an integer you will get an integer result, and you need decimals. I suggest the following:

SELECT
      t.device AS attribute
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS control_sample
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS treatment_sample
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS score_control
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS score_treatment
FROM your_table AS t
INNER JOIN (
    SELECT
          group_name
        , count(*) * 1.0 group_count -- "convert" to decimal by the * 1.0
    FROM your_table
    GROUP BY
          group_name
    ) AS g ON t.group_name = g.group_name
GROUP BY
        t.device
attribute control_sample treatment_sample score_control score_treatment
android 1 2 0.33333333333333333333 0.66666666666666666667
ios 2 1 0.66666666666666666667 0.33333333333333333333

fiddle

答案2

得分: 1

我建议在分组结果上使用 sum 窗口函数

-- 样本数据
with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
    values ('xyz', 'control', '123x', 11, 'ios', false),
        ('abc', 'treatment', '124x', 12, 'android', true),
        ('def', 'control', '125x', 13, 'android', false),
        ('ghi', 'treatment', '126x', 14, 'ios', false),
        ('jkl', 'control', '127x', 15, 'ios', true),
        ('mno', 'treatment', '128x', 16, 'android', false)
)

-- 查询
select attribute,
    control_sample,
    treatment_sample,
    control_sample * 1.0 / sum(control_sample) over () score_control,
    treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
from (
    select device attribute,
        count_if(group_name = 'control') control_sample,
        count_if(group_name = 'treatment') treatment_sample
    from dataset
    group by device);

输出:

属性 控制组样本数 治疗组样本数 控制组得分 治疗组得分
android 1 2 0.3 0.7
ios 2 1 0.7 0.3

但是,根据我看到的,你的查询存在一个问题,即整数除法(例如,当两个操作数都是整数类型时,999/1000 是0),所以你可以通过将其中一个操作数乘以1.0来修复它 - 例如 COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control

英文:

I would suggest to use sum window function over the grouping results:

-- sample data
with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
    values ('xyz',       'control', '123x', 11,          'ios',         false),
        ('abc',     'treatment', '124x', 12,      'android',          true),
        ('def',       'control', '125x', 13,      'android',         false),
        ('ghi',     'treatment', '126x', 14,          'ios',         false),
        ('jkl',       'control', '127x', 15,          'ios',          true),
        ('mno',     'treatment', '128x', 16,      'android',         false)
)

-- query
select attribute,
    control_sample,
    treatment_sample,
    control_sample * 1.0 / sum(control_sample) over () score_control,
    treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
from (
    select device attribute,
        count_if(group_name = 'control') control_sample,
        count_if(group_name = 'treatment') treatment_sample
    from dataset
    group by device);

Output:

attribute control_sample treatment_sample score_control score_treatment
android 1 2 0.3 0.7
ios 2 1 0.7 0.3

But the problem with your query was (as far as I can see) is that you got struck by another case of integer division (i.e. 999/1000 is 0 when both operands are of integer type), so you can fix it by multiplying one of operands by 1.0 - for example COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control

huangapple
  • 本文由 发表于 2023年7月11日 07:51:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/76657960.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定