March 31, 2023, 23:31:33

Calculate percentage of same specific values per group

Question

I have the following data frame `df` (`dput` below):

```
> df
   group class value
1      A FALSE     2
2      A  TRUE     1
3      A FALSE     1
4      A FALSE     2
5      A FALSE     3
6      B FALSE     4
7      B FALSE     2
8      B  TRUE     2
9      B FALSE     2
10     B FALSE     6
11     C  TRUE     5
12     C FALSE     5
13     C FALSE     3
14     C FALSE     5
15     C FALSE     5
```

I would like to calculate, per group, the percentage (expressed as a proportion) of values that are equal to one specific value. Each group always contains exactly one row with `class == TRUE`, and I want the share of values in that group that equal the value of that row. For example, in group A the row with `class == TRUE` has value 1, and two of the five values in group A are 1, so 2/5 = 0.4 of the values are 1. Here is the desired output:

```
  group value pct
1     A     1 0.4
2     B     2 0.6
3     C     5 0.8
```

So I was wondering if anyone knows how to calculate the percentage of specific values per group in R?

`dput` of `df`:

```R
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C"), class = c(FALSE, TRUE, FALSE, 
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
FALSE, FALSE, FALSE), value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 
5, 5, 3, 5, 5)), class = "data.frame", row.names = c(NA, -15L
))
```
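Every answer below indexes `value[class]` within each group, so it relies on the question's guarantee that each group has exactly one row with `class == TRUE`; with zero or several such rows the comparison can come back empty or recycle, and the result would be misleading. A minimal base R sketch for checking that assumption first (the name `true_per_group` is illustrative, not from the question):

```R
# The question's example data (same content as the dput() output above)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# sum() on a logical vector counts its TRUEs: number of flagged rows per group
true_per_group <- tapply(df$class, df$group, sum)
print(true_per_group)

# Fail loudly if any group has zero or several rows with class == TRUE
stopifnot(all(true_per_group == 1))
```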

Answer 1

Score: 3

You could do:

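```R
library(dplyr)

df %>%
  group_by(group) %>%
  # pct is computed first, while value still holds every row of the group;
  # the next line then collapses value to the single class == TRUE value
  summarize(pct = sum(value == value[class == TRUE]) / n(),
            value = value[class == TRUE])
```

```
# A tibble: 3 x 3
  group   pct value
  <chr> <dbl> <dbl>
1 A       0.4     1
2 B       0.6     2
3 C       0.8     5
```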
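The `pct` expression works because `==` yields a logical vector and R treats `TRUE` as 1 in arithmetic, so `sum(...) / n()` is the number of matches divided by the group size, which equals `mean(...)`. A small sketch on group A's values alone (the variable names mirror the data; `target` and `matches` are illustrative):

```R
# Values and class flags of group A, taken from the question's data
value <- c(2, 1, 1, 2, 3)
class <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# Logical indexing picks the value of the single TRUE row (here 1)
target <- value[class]

# Compare every value with it: FALSE TRUE TRUE FALSE FALSE
matches <- value == target

# Count of matches over group size, as in the answer (n() is the group size there)
sum(matches) / length(matches)

# The same proportion written with mean()
mean(matches)
```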

Answer 2

Score: 3

Try

```R
library(dplyr) # reframe() and its .by argument need dplyr >= 1.1.0
df %>%
   reframe(pct = mean(value == value[class]), value = value[class], .by = group)
```

Output:

```
   group pct value
1     A 0.4     1
2     B 0.6     2
3     C 0.8     5
```
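If dplyr 1.1.0 or later is not available, the same per-group logic as the `reframe()` call can be sketched in base R with `split()` and `lapply()`. This swaps in a different technique rather than reproducing the answer, and the names `per_group`, `target` and `result` are illustrative:

```R
# The question's example data
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# One data frame per group, each reduced to a single summary row
per_group <- lapply(split(df, df$group), function(d) {
  target <- d$value[d$class]                   # value of the group's single TRUE row
  data.frame(group = d$group[1],
             pct   = mean(d$value == target),  # share of rows equal to that value
             value = target)
})

# Stack the one-row results and drop the group-name row names
result <- do.call(rbind, per_group)
rownames(result) <- NULL
print(result)
```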

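Or with a data.table option:

```R
library(data.table)
# setDT() converts df to a data.table in place; df[(class)] selects the TRUE row of
# each group, and the join on group compares it with every row of that group
setDT(df)[df[(class)], .(value = i.value,
  pct = mean(value == i.value)), on = .(group), by = .EACHI]
```

```
   group value pct
1:     A     1 0.4
2:     B     2 0.6
3:     C     5 0.8
```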
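The data.table version is a self-join: `df[(class)]` keeps the one `TRUE` row per group, `on = .(group)` matches it against every row of the same group, and `by = .EACHI` evaluates `j` once per row of `i`. A rough base R analogue of that join idea using `merge()`, offered as an illustration rather than a description of what data.table does internally (`targets`, `joined` and `result` are illustrative names):

```R
# The question's example data
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# The TRUE row of each group, i.e. what df[(class)] selects in the data.table call
targets <- df[df$class, c("group", "value")]
names(targets)[2] <- "target"

# Attach each group's target value to every row of that group (join on group)
joined <- merge(df, targets, by = "group")

# Proportion of rows per group whose value equals the target
pct <- tapply(joined$value == joined$target, joined$group, mean)

# Assemble one row per group, looking pct up by group name
result <- data.frame(group = targets$group, value = targets$target)
result$pct <- as.vector(pct[result$group])
print(result)
```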
Answer 3

Score: 3
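A base R option with `ave()` + `subset()`:

```R
subset(
  transform(
    df,
    # inner ave(): mean of class per (group, value) cell, > 0 where value matches the TRUE row;
    # outer ave(): per-group mean of that flag, i.e. the share of matching rows
    pct = ave(ave(class, group, value) > 0, group)
  ), 
  class  # keep only the rows where class is TRUE
)
```

gives

```
   group class value pct
2      A  TRUE     1 0.4
8      B  TRUE     2 0.6
11     C  TRUE     5 0.8
```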
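The nested `ave()` calls are compact but hard to read. `ave()` applies `FUN = mean` by default, so the inner call gives the share of `TRUE` flags in each (group, value) cell, which is above 0 exactly for rows whose value matches the group's `TRUE` row; the outer call then averages that logical flag per group. A step-by-step sketch of the same computation, with intermediate names (`inner`, `same_as_true`) added for illustration:

```R
# The question's example data
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Step 1: mean of class within each (group, value) cell,
# e.g. 0.5 for group A's two rows with value 1
inner <- ave(df$class, df$group, df$value)

# Step 2: TRUE for rows that share their value with the group's TRUE row
same_as_true <- inner > 0

# Step 3: mean of that logical flag per group, i.e. the share of matching rows
df$pct <- ave(same_as_true, df$group)

# Step 4: keep only the flagged rows, as subset(..., class) does
print(df[df$class, ])
```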
