July 18, 2023 16:52:55

How to calculate percentages for each group depending on a different variable?

Question

This is the R code for the dummy dataset:

c <- c(10, 20, 30, 40, 50, 40, 2, 40, 10, 50)
b <- c(40, 2, 40, 10, 50, 10, 20, 30, 40, 50)
a <- c(10, 50, 3, 60, 100,40, 2, 40, 10, 50)
id <- c("a", "b", "b", "a", "c", "a", "b", "b", "a", "c")
variation <- c("a3", "a3", "b1", "a2", "b1","a3", "a1", "b1", "a1", "b1")
data <- data.frame(id, a, b, c, variation)
data  # print all 10 rows; head(data) would show only the first 6
#    id   a  b  c variation
# 1   a  10 40 10        a3
# 2   b  50  2 20        a3
# 3   b   3 40 30        b1
# 4   a  60 10 40        a2
# 5   c 100 50 50        b1
# 6   a  40 10 40        a3
# 7   b   2 20  2        a1
# 8   b  40 30 40        b1
# 9   a  10 40 10        a1
# 10  c  50 50 50        b1

I can calculate the percentages for an individual id after filtering:

library(dplyr)

data_filter <- data %>% filter(id == "a")
data_filter
#   id  a  b  c variation
# 1  a 10 40 10        a3
# 2  a 60 10 40        a2
# 3  a 40 10 40        a3
# 4  a 10 40 10        a1
# Data transformation
data_filter_percentage <- data_filter %>%
  group_by(variation) %>% # Variable to be transformed
  count() %>%
  ungroup() %>%
  mutate(perc = `n` / sum(`n`)) %>%
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))
head(data_filter_percentage)
# A tibble: 3 x 4
#   variation     n  perc labels
#   <chr>     <int> <dbl> <chr> 
# 1 a1            1  0.25 25%   
# 2 a2            1  0.25 25%   
# 3 a3            2  0.5  50%

However, my question is: is it possible to run the above pipeline for all ids without filtering each one individually?

Answer 1

Score: 1

You can try the following workflow:

library(dplyr)
data %>%
  group_by(id) %>%
  count(variation) %>%
  mutate(perc = n / sum(n), labels = scales::percent(perc)) %>%
  ungroup()
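
Because `count(variation)` is called on data that is already grouped by `id`, its output keeps that grouping, so `sum(n)` inside `mutate()` is the row count of each id rather than of the whole table. A minimal sanity check, rebuilding only the `id` and `variation` columns from the question (the names `result` and `totals` are just illustrative), is that the proportions add up to 1 within every id:

```r
library(dplyr)

# Only the two columns used here are rebuilt from the question's dummy data
data <- data.frame(
  id        = c("a", "b", "b", "a", "c", "a", "b", "b", "a", "c"),
  variation = c("a3", "a3", "b1", "a2", "b1", "a3", "a1", "b1", "a1", "b1"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(id) %>%
  count(variation) %>%            # output is still grouped by id
  mutate(perc = n / sum(n)) %>%   # so sum(n) is the row count of each id
  ungroup()

# Sanity check: within every id the proportions should add up to 1
totals <- result %>%
  group_by(id) %>%
  summarise(total = sum(perc))
print(totals)
```

If `ungroup()` were called before `mutate()`, `sum(n)` would be the grand total of 10 rows and the percentages would no longer be per id.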

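More concisely, with dplyr 1.1.0 or later (which added the `.by` argument), you can count both columns at once and compute the per-id percentages without an explicit `group_by()`: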

data %>%
  count(id, variation) %>%
  mutate(perc = n / sum(n), labels = scales::percent(perc), .by = id)
# # A tibble: 7 × 5
#   id    variation     n  perc labels
#   <chr> <chr>     <int> <dbl> <chr> 
# 1 a     a1            1  0.25 25%   
# 2 a     a2            1  0.25 25%   
# 3 a     a3            2  0.5  50%   
# 4 b     a1            1  0.25 25%   
# 5 b     a3            1  0.25 25%   
# 6 b     b1            2  0.5  50%   
# 7 c     b1            2  1    100%
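
If you would rather avoid the dplyr dependency, the same per-id proportions can be sketched in base R. This swaps in a different technique from the answer above: `table()` builds an id-by-variation contingency table and `prop.table(..., margin = 1)` divides each row by its total. Only the `id` and `variation` vectors from the question are reused; the names `tab`, `props` and `long` are illustrative. Unlike `count()`, `table()` also keeps combinations that never occur (with a count of 0), so those rows are filtered out to match the dplyr output.

```r
# Base R sketch: counts per id/variation pair, then each row (id) scaled to sum to 1
id <- c("a", "b", "b", "a", "c", "a", "b", "b", "a", "c")
variation <- c("a3", "a3", "b1", "a2", "b1", "a3", "a1", "b1", "a1", "b1")

tab <- table(id, variation)           # 3 x 4 table of counts, zeros included
props <- prop.table(tab, margin = 1)  # divide each count by its row (id) total
print(round(props, 2))

# Long format like the dplyr output; zero-count combinations are dropped
long <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
long$perc <- as.vector(props)         # same column-major order as `tab`
long <- long[long$n > 0, ]
long <- long[order(long$id, long$variation), ]
long$labels <- paste0(round(100 * long$perc), "%")  # whole-percent labels
print(long, row.names = FALSE)
```

The labels here come from `paste0()` rounded to whole percentages, so for proportions that are not whole percentages they can differ from what `scales::percent()` prints.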
