Calculate percentage of same specific values per group
Question
I have the following data frame `df` (`dput` below):
    > df
       group class value
    1      A FALSE     2
    2      A  TRUE     1
    3      A FALSE     1
    4      A FALSE     2
    5      A FALSE     3
    6      B FALSE     4
    7      B FALSE     2
    8      B  TRUE     2
    9      B FALSE     2
    10     B FALSE     6
    11     C  TRUE     5
    12     C FALSE     5
    13     C FALSE     3
    14     C FALSE     5
    15     C FALSE     5
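For each group, I would like to calculate the proportion of values that equal the value in the row where `class == TRUE`. Every group always has exactly one row with `class == TRUE`. In the data frame above, for example, group A has the value 1 in its `class == TRUE` row, and two of the five values in group A are 1, so the proportion is 2/5 = 0.4. Here is the desired output: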
      group value pct
    1     A     1 0.4
    2     B     2 0.6
    3     C     5 0.8
Does anyone know how to calculate this per-group proportion of a specific value in R?
***
`dput` of `df`:
    df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
    "B", "B", "C", "C", "C", "C", "C"), class = c(FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE), value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 
    5, 5, 3, 5, 5)), class = "data.frame", row.names = c(NA, -15L
    ))
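Answer 1
Score: 3
You could do:

    library(dplyr)

    df %>%
      group_by(group) %>%
      # pct is computed first, while `value` still refers to the full column;
      # the second argument then reduces `value` to the class == TRUE row
      summarize(pct = sum(value == value[class == TRUE]) / n(),
                value = value[class == TRUE])
    # A tibble: 3 x 3
    #   group   pct value
    #   <chr> <dbl> <dbl>
    # 1 A       0.4     1
    # 2 B       0.6     2
    # 3 C       0.8     5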
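
The indexing `value[class == TRUE]` relies on the question's guarantee that each group has exactly one `class == TRUE` row. With zero or several such rows, the result would no longer describe a single target value. Below is a minimal base R sketch for checking that assumption before summarising. The name `true_per_group` is only illustrative, and the data frame is rebuilt from the question's values so the snippet runs on its own:

```R
# Rebuild the question's data so the snippet is self-contained.
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Count the class == TRUE rows in each group (TRUE counts as 1).
true_per_group <- tapply(df$class, df$group, sum)

# Stop early if any group violates the one-TRUE-row assumption.
stopifnot(all(true_per_group == 1))
```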
Answer 2
Score: 3
Try

    library(dplyr) # version >= 1.1.0, needed for reframe() and .by

    df %>%
      reframe(pct = mean(value == value[class]), value = value[class], .by = group)

Output:

      group pct value
    1     A 0.4     1
    2     B 0.6     2
    3     C 0.8     5

Or with a data.table option:

    library(data.table)

    setDT(df)[df[(class)], .(value = i.value,
      pct = mean(value == i.value)), on = .(group), by = .EACHI]
       group value pct
    1:     A     1 0.4
    2:     B     2 0.6
    3:     C     5 0.8
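
`sum(cond) / n()` in Answer 1 and `mean(cond)` here compute the same proportion, because R treats `TRUE` as 1 and `FALSE` as 0 when averaging a logical vector. A small base R sketch using group A's values from the question (the variable names are illustrative):

```R
# Values of group A from the question; its class == TRUE row has value 1.
value_a  <- c(2, 1, 1, 2, 3)
target_a <- 1

# Logical vector marking the rows equal to the target value.
is_match <- value_a == target_a

# Count of matches divided by group size ...
prop_by_count <- sum(is_match) / length(is_match)

# ... equals the mean of the logical vector.
prop_by_mean <- mean(is_match)
```

Both `prop_by_count` and `prop_by_mean` equal the 2/5 = 0.4 worked out in the question.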
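# Answer 3
**Score**: 3
A base R option with `ave` + `subset`:
```R
subset(
  transform(
    df,
    # inner ave(): flags rows that share (group, value) with the class == TRUE row;
    # outer ave(): per-group mean of that flag, i.e. the proportion
    pct = ave(ave(class, group, value) > 0, group)
  ),
  class
)
```
gives

       group class value pct
    2      A  TRUE     1 0.4
    8      B  TRUE     2 0.6
    11     C  TRUE     5 0.8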
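
The nested `ave()` call is compact but dense, so it may help to see it unrolled. The following is a sketch of the same idea in separate steps, not the answer's own code. The intermediate names `matches_target`, `pct` and `result` are illustrative, and the data frame is rebuilt from the question's values so the snippet is self-contained:

```R
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Step 1: within each (group, value) cell, the mean of `class` is positive
# only if that cell contains the class == TRUE row.
matches_target <- ave(df$class, df$group, df$value) > 0

# Step 2: the per-group mean of that flag is the share of matching rows,
# repeated on every row of the group.
pct <- ave(matches_target, df$group)

# Step 3: keep one row per group, namely the class == TRUE row.
result <- df[df$class, c("group", "value")]
result$pct <- pct[df$class]
result
```

With the question's data, `result$pct` comes out as 0.4, 0.6 and 0.8, matching the output above.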