July 11, 2023, 07:51:41

Aggregate By Column in SQL That Results in Sample Size and % with Case When

Question


Hi, I have a table in SQL (Presto) with the following format:

uuid    group_name    transaction_id      sales       device      is_local
 xyz       control              123x         11          ios         false
 abc     treatment              124x         12      android          true
 def       control              125x         13      android         false
 ghi     treatment              126x         14          ios         false
 jkl       control              127x         15          ios          true
 mno     treatment              128x         16      android         false

How do I write a query so that, if there were 100 devices, I get the total sample within each group name and the percentage of each device within that group name?

I want my final table to look like this based on the device column:

attribute    control_sample    treatment_sample    score_control    score_treatment
      ios               100                  50              0.5               0.33
  android               100                  50              0.7               0.40
   tablet               100                  50              0.3               0.80
       TV               100                  50              0.6               0.20

This was my original attempt, but it did not give me the output I wanted:

device AS (
  SELECT
    device AS attribute,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'control') AS score_control,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) / (SELECT COUNT(*) FROM your_table WHERE group_name = 'treatment') AS score_treatment
  FROM your_table
  GROUP BY device

What is the best way to do this?

Answer 1

Score: 1

If I understand your question, I think you were quite close to the solution. Note that dividing an integer by an integer gives an integer result, and you need decimals. I suggest the following:

SELECT
      t.device AS attribute
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS control_sample
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS treatment_sample
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) / MAX(g.group_count) AS score_control
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) / MAX(g.group_count) AS score_treatment
FROM your_table AS t
INNER JOIN (
    SELECT
          group_name
        , count(*) * 1.0 group_count -- "convert" to decimal by the * 1.0
    FROM your_table
    GROUP BY
          group_name
    ) AS g ON t.group_name = g.group_name
GROUP BY
        t.device

attribute	control_sample	treatment_sample	score_control	score_treatment
android	1	2	0.33333333333333333333	0.66666666666666666667
ios	2	1	0.66666666666666666667	0.33333333333333333333

fiddle
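
One caveat with the join above, offered as a reading of the query rather than something the answer states: MAX(g.group_count) picks the largest group total among the rows of each device, so it is the right denominator for both scores only when control and treatment have the same size, as they do in the six sample rows. A minimal sketch of a variant that uses each group's own total, assuming Presto/Trino syntax and made-up rows with 4 control and 2 treatment samples:

```sql
-- Hypothetical sample data: 4 control rows, 2 treatment rows.
WITH your_table (uuid, group_name, device) AS (
    VALUES ('a', 'control',   'ios'),
           ('b', 'control',   'ios'),
           ('c', 'control',   'ios'),
           ('d', 'control',   'android'),
           ('e', 'treatment', 'ios'),
           ('f', 'treatment', 'android')
)
SELECT
      t.device AS attribute
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END) AS control_sample
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END) AS treatment_sample
    -- Denominator: the control total only (carried on control rows by the join).
    , COUNT(CASE WHEN t.group_name = 'control' THEN 1 END)
        / MAX(CASE WHEN t.group_name = 'control' THEN g.group_count END) AS score_control
    -- Denominator: the treatment total only.
    , COUNT(CASE WHEN t.group_name = 'treatment' THEN 1 END)
        / MAX(CASE WHEN t.group_name = 'treatment' THEN g.group_count END) AS score_treatment
FROM your_table AS t
INNER JOIN (
    SELECT
          group_name
        , CAST(COUNT(*) AS DOUBLE) AS group_count -- DOUBLE avoids integer division
    FROM your_table
    GROUP BY
          group_name
    ) AS g ON t.group_name = g.group_name
GROUP BY
        t.device
```

With these rows the treatment scores are 1/2 for each device, whereas dividing by MAX(g.group_count) would divide both by the control total of 4. A device that has no rows in one group gets NULL rather than 0 for that group's score, because the CASE expression yields no non-null denominator.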

Answer 2

Score: 1

I would suggest using the sum window function over the grouped results:

-- sample data
with dataset(uuid, group_name, transaction_id, sales, device, is_local) as (
    values ('xyz', 'control', '123x', 11, 'ios', false),
        ('abc', 'treatment', '124x', 12, 'android', true),
        ('def', 'control', '125x', 13, 'android', false),
        ('ghi', 'treatment', '126x', 14, 'ios', false),
        ('jkl', 'control', '127x', 15, 'ios', true),
        ('mno', 'treatment', '128x', 16, 'android', false)
)
-- query
select attribute,
    control_sample,
    treatment_sample,
    control_sample * 1.0 / sum(control_sample) over () score_control,
    treatment_sample * 1.0 / sum(treatment_sample) over () score_treatment
from (
    select device attribute,
        count_if(group_name = 'control') control_sample,
        count_if(group_name = 'treatment') treatment_sample
    from dataset
    group by device);

Output:

attribute	control_sample	treatment_sample	score_control	score_treatment
android	1	2	0.3	0.7
ios	2	1	0.7	0.3

As far as I can see, though, the problem with your query is that you were struck by another case of integer division (i.e. 999/1000 is 0 when both operands are of integer type). You can fix it by multiplying one of the operands by 1.0, for example: COUNT(CASE WHEN group_name = 'control' THEN 1 END) * 1.0 / (SELECT COUNT(*) ...) AS score_control
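
The same one-operand fix can be applied to the query from the question without restructuring it. Here is a sketch on the question's six sample rows, assuming Presto/Trino syntax; the CTE only stands in for your_table. It casts to DOUBLE instead of multiplying by 1.0. That is a substitution, not what the answer proposes: the 0.3 and 0.7 in the output above suggest that the decimal result of * 1.0 keeps a single fractional digit, and a DOUBLE keeps more.

```sql
-- The question's sample rows, inlined so the query is self-contained.
WITH your_table (uuid, group_name, transaction_id, sales, device, is_local) AS (
    VALUES ('xyz', 'control',   '123x', 11, 'ios',     false),
           ('abc', 'treatment', '124x', 12, 'android', true),
           ('def', 'control',   '125x', 13, 'android', false),
           ('ghi', 'treatment', '126x', 14, 'ios',     false),
           ('jkl', 'control',   '127x', 15, 'ios',     true),
           ('mno', 'treatment', '128x', 16, 'android', false)
)
SELECT
    device AS attribute,
    COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS control_sample,
    COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS treatment_sample,
    -- Casting the numerator makes the division floating-point instead of integer.
    CAST(COUNT(CASE WHEN group_name = 'control' THEN 1 END) AS DOUBLE)
        / (SELECT COUNT(*) FROM your_table WHERE group_name = 'control') AS score_control,
    CAST(COUNT(CASE WHEN group_name = 'treatment' THEN 1 END) AS DOUBLE)
        / (SELECT COUNT(*) FROM your_table WHERE group_name = 'treatment') AS score_treatment
FROM your_table
GROUP BY device
```

Each denominator here is the total for its own group, so the scores within a group add up to 1 even when control and treatment differ in size.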
