英文:
Aggregate By Column in SQL That Results in Sample Size and % with Case When
问题
这是你想要的翻译结果:
SELECT
device AS attribute,
COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
COUNT(CASE WHEN group_name = 'control' THEN 1 END) / SUM(CASE WHEN group_name = 'control' THEN 1 ELSE 0 END) OVER () AS score_control,
COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / SUM(CASE WHEN group_name = 'treatment' THEN 1 ELSE 0 END) OVER () AS score_treatment
FROM your_table
GROUP BY device;
这是你的查询,已根据你的要求进行了修改。如果每个设备有100个样本,这个查询将计算每个组内的样本总数以及每个设备在组内的百分比。
英文:
Hi I have a table in SQL (presto) that has the following format:
uuid group_name transaction_id sales device is_local
xyz control 123x 11 ios false
abc treatment 124x 12 android true
def control 125x 13 android false
ghi treatment 126x 14 ios false
jkl control 127x 15 ios true
mno treatment 128x 16 android false
how do I write a query where if there were 100 devices I can get the total sample within the group name and the percentage of each device within the group name?
I want my final table to look like this based on the device column:
attribute control_sample treatment_sample score_control score_treatment
ios 100 50 0.5 0.33
android 100 50 0.7 0.40
tablet 100 50 0.3 0.80
TV 100 50 0.6 0.20
this is the attempt originally but it did not give me the output I wanted:
device AS (
SELECT
device AS attribute,
COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
COUNT(CASE WHEN group_name = 'control' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'control') AS score_control,
COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'treatment') AS score_treatment
FROM your_table
GROUP BY device
What is the best way to do this?
答案1
得分: 1
如果我理解你的问题,我认为你已经非常接近解决方案了。请注意,如果你将一个整数除以一个整数,你将得到一个整数结果,而你需要小数。我建议如下操作:
SELECT
t.device AS 属性
, COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS 控制组样本数
, COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS 治疗组样本数
, COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS 控制组得分
, COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS 治疗组得分
FROM your_table AS t
INNER JOIN (
SELECT
group_name
, count(*) * 1.0 group_count -- 通过 * 1.0 转换为小数
FROM your_table
GROUP BY
group_name
) AS g ON t.group_name = g.group_name
GROUP BY
t.device
属性 | 控制组样本数 | 治疗组样本数 | 控制组得分 | 治疗组得分 |
---|---|---|---|---|
android | 1 | 2 | 0.33333333333333333333 | 0.66666666666666666667 |
ios | 2 | 1 | 0.66666666666666666667 | 0.33333333333333333333 |
英文:
If I understand your question I think you were quite close to the solution. Please note that if you divide an integer by an integer you will get an integer result, and you need decimals. I suggest the following:
SELECT
t.device AS attribute
, COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS control_sample
, COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS treatment_sample
, COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS score_control
, COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS score_treatment
FROM your_table AS t
INNER JOIN (
SELECT
group_name
, count(*) * 1.0 group_count -- "convert" to decimal by the * 1.0
FROM your_table
GROUP BY
group_name
) AS g ON t.group_name = g.group_name
GROUP BY
t.device
attribute | control_sample | treatment_sample | score_control | score_treatment |
---|---|---|---|---|
android | 1 | 2 | 0.33333333333333333333 | 0.66666666666666666667 |
ios | 2 | 1 | 0.66666666666666666667 | 0.33333333333333333333 |
答案2
得分: 1
我建议在分组结果上使用 sum
窗口函数:
-- 样本数据
with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
values ('xyz', 'control', '123x', 11, 'ios', false),
('abc', 'treatment', '124x', 12, 'android', true),
('def', 'control', '125x', 13, 'android', false),
('ghi', 'treatment', '126x', 14, 'ios', false),
('jkl', 'control', '127x', 15, 'ios', true),
('mno', 'treatment', '128x', 16, 'android', false)
)
-- 查询
select attribute,
control_sample,
treatment_sample,
control_sample * 1.0 / sum(control_sample) over () score_control,
treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
from (
select device attribute,
count_if(group_name = 'control') control_sample,
count_if(group_name = 'treatment') treatment_sample
from dataset
group by device);
输出:
属性 | 控制组样本数 | 治疗组样本数 | 控制组得分 | 治疗组得分 |
---|---|---|---|---|
android | 1 | 2 | 0.3 | 0.7 |
ios | 2 | 1 | 0.7 | 0.3 |
但是,根据我看到的,你的查询存在一个问题,即整数除法(例如,当两个操作数都是整数类型时,999/1000 是0),所以你可以通过将其中一个操作数乘以1.0来修复它 - 例如 COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control
英文:
I would suggest to use sum
window function over the grouping results:
-- sample data
with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
values ('xyz', 'control', '123x', 11, 'ios', false),
('abc', 'treatment', '124x', 12, 'android', true),
('def', 'control', '125x', 13, 'android', false),
('ghi', 'treatment', '126x', 14, 'ios', false),
('jkl', 'control', '127x', 15, 'ios', true),
('mno', 'treatment', '128x', 16, 'android', false)
)
-- query
select attribute,
control_sample,
treatment_sample,
control_sample * 1.0 / sum(control_sample) over () score_control,
treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
from (
select device attribute,
count_if(group_name = 'control') control_sample,
count_if(group_name = 'treatment') treatment_sample
from dataset
group by device);
Output:
attribute | control_sample | treatment_sample | score_control | score_treatment |
---|---|---|---|---|
android | 1 | 2 | 0.3 | 0.7 |
ios | 2 | 1 | 0.7 | 0.3 |
But the problem with your query was (as far as I can see) is that you got struck by another case of integer division (i.e. 999/1000 is 0 when both operands are of integer type), so you can fix it by multiplying one of operands by 1.0 - for example COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0/ (SELECT COUNT(*) ...) AS score_control
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论