Calculate percentage of same specific values per group

Question

I have the following data frame `df` (`dput` below):

    > df
       group class value
    1      A FALSE     2
    2      A  TRUE     1
    3      A FALSE     1
    4      A FALSE     2
    5      A FALSE     3
    6      B FALSE     4
    7      B FALSE     2
    8      B  TRUE     2
    9      B FALSE     2
    10     B FALSE     6
    11     C  TRUE     5
    12     C FALSE     5
    13     C FALSE     3
    14     C FALSE     5
    15     C FALSE     5

I would like to calculate, for each group, the percentage of values that equal one specific value. Each group always has one row with `class == TRUE`, and I want the percentage of values in the group that equal the value in that row. For example, group A has the value 1 with `class == TRUE`, and two of the five values in group A are 1, so 2/5 = 0.4 of the values are 1. Here is the desired output:

      group value pct
    1     A     1 0.4
    2     B     2 0.6
    3     C     5 0.8

Does anyone know how to calculate the percentage of a specific value per group in R?

`dput` of `df`:

    df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
    "B", "B", "C", "C", "C", "C", "C"), class = c(FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE), value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 
    5, 5, 3, 5, 5)), class = "data.frame", row.names = c(NA, -15L
    ))

Answer 1

Score: 3

You could do:

    library(dplyr)
    df %>%
      group_by(group) %>%
      summarize(pct = sum(value == value[class == TRUE]) / n(),
                value = value[class == TRUE])

    # A tibble: 3 x 3
      group   pct value
      <chr> <dbl> <dbl>
    1 A       0.4     1
    2 B       0.6     2
    3 C       0.8     5
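
To see why that `summarize()` expression works, it may help to walk through a single group by hand. The sketch below uses group A from the question as plain base R vectors; the names `ref` and `matches` are only illustrative and do not appear in the answer.

```r
# Group A from the question, as plain vectors
value <- c(2, 1, 1, 2, 3)
class <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# The reference value is the one on the row where class is TRUE
ref <- value[class == TRUE]          # 1

# Compare every value in the group with the reference
matches <- value == ref              # FALSE TRUE TRUE FALSE FALSE

# Count the matches and divide by the group size (n() in dplyr)
pct <- sum(matches) / length(value)  # 2 / 5 = 0.4
pct
```

Inside `group_by(group)`, dplyr evaluates the same expression once per group, so each group is compared against its own reference value.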

Answer 2

Score: 3

Try

    library(dplyr) # requires dplyr >= 1.1.0
    df %>%
      reframe(pct = mean(value == value[class]), value = value[class], .by = group)

Output:

      group pct value
    1     A 0.4     1
    2     B 0.6     2
    3     C 0.8     5

Or with a `data.table` option:

    library(data.table)
    setDT(df)[df[(class)], .(value = i.value,
      pct = mean(value == i.value)), on = .(group), by = .EACHI]

       group value pct
    1:     A     1 0.4
    2:     B     2 0.6
    3:     C     5 0.8
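
Both options here rely on the question's guarantee that every group has one row with `class == TRUE`. If that might not hold in other data, a defensive variant could check it explicitly. The following is a minimal base R sketch, not taken from the answer: `pct_of_reference` is a hypothetical helper name, and the data frame is rebuilt from the question's `dput` so the block runs on its own.

```r
# Data from the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Hypothetical helper (not from any package): summarise one group and
# stop with an error if it does not contain exactly one TRUE row.
pct_of_reference <- function(d) {
  ref <- d$value[d$class]
  stopifnot(length(ref) == 1)
  # mean() of a logical vector is the share of TRUE values,
  # i.e. the same quantity as sum(...) / n()
  data.frame(group = d$group[1], value = ref, pct = mean(d$value == ref))
}

# Apply the helper to each group and stack the one-row results
result <- do.call(rbind, lapply(split(df, df$group), pct_of_reference))
rownames(result) <- NULL
result
```

With the question's data this should reproduce the desired output (0.4, 0.6 and 0.8 for groups A, B and C).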


Answer 3

Score: 3

A base R option with `ave` + `subset`:

    subset(
      transform(
        df,
        pct = ave(ave(class, group, value) > 0, group)
      ),
      class
    )

gives

       group class value pct
    2      A  TRUE     1 0.4
    8      B  TRUE     2 0.6
    11     C  TRUE     5 0.8
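
The nested `ave()` call is compact, so here is one reading of it unpacked into separate steps. This is only a sketch of how the expression appears to work; the intermediate names `inner` and `matches` are illustrative, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Data from the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Step 1: within each (group, value) combination, take the mean of `class`.
# The mean is above 0 only if that combination contains the TRUE row.
inner <- ave(df$class, df$group, df$value)

# Step 2: flag the rows whose value equals the group's reference value
matches <- inner > 0

# Step 3: the group-wise mean of the flags is the share of matching rows
pct <- ave(matches, df$group)

# Keep only the reference rows, as subset(..., class) does
cbind(df, pct)[df$class, ]
```

Unlike the other answers, this approach keeps the original row names (2, 8, 11) and the `class` column in the result.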

huangapple
  • Posted on March 31, 2023 at 23:31:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75900281.html