Calculate percentage of same specific values per group

Question

I have the following dataframe df (dput below):

    > df
       group class value
    1      A FALSE     2
    2      A  TRUE     1
    3      A FALSE     1
    4      A FALSE     2
    5      A FALSE     3
    6      B FALSE     4
    7      B FALSE     2
    8      B  TRUE     2
    9      B FALSE     2
    10     B FALSE     6
    11     C  TRUE     5
    12     C FALSE     5
    13     C FALSE     3
    14     C FALSE     5
    15     C FALSE     5

I would like to calculate, for each group, the share of rows whose `value` equals a reference value. Each group always has one row with `class == TRUE`, and the `value` in that row is the reference. For example, in group A the row with `class == TRUE` has value 1, and two of the five rows in group A have value 1, so the result is 2/5 = 0.4. Here is the desired output:

      group value pct
    1     A     1 0.4
    2     B     2 0.6
    3     C     5 0.8

So I was wondering if anyone knows how to calculate the percentage of specific values per group in R?


dput of df:

    df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
    "B", "B", "C", "C", "C", "C", "C"), class = c(FALSE, TRUE, FALSE,
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
    FALSE, FALSE, FALSE), value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6,
    5, 5, 3, 5, 5)), class = "data.frame", row.names = c(NA, -15L
    ))

Answer 1

Score: 3

You could do:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarize(pct = sum(value == value[class == TRUE]) / n(),
                value = value[class == TRUE])

    # A tibble: 3 x 3
    #   group   pct value
    #   <chr> <dbl> <dbl>
    # 1 A       0.4     1
    # 2 B       0.6     2
    # 3 C       0.8     5
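
The `summarize()` call above depends on the question's premise that every group has one row with `class == TRUE`; otherwise `value[class == TRUE]` would not be a single value. A minimal base R sketch for checking that premise first, with `df` rebuilt from the question's data and `true_per_group` a name chosen here purely for illustration:

```r
# Rebuild the question's data frame (same values as the dput output).
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Count the TRUE rows in each group (summing a logical counts its TRUEs).
true_per_group <- tapply(df$class, df$group, sum)

# Stop early if any group has zero or several reference rows.
stopifnot(all(true_per_group == 1))
```

If the check fails, the comparison `value == value[class == TRUE]` would compare against an empty or recycled vector, so the reference value would need to be picked some other way.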

Answer 2

Score: 3

Try

    library(dplyr)  # version >= 1.1.0 (needed for reframe() and .by)

    df %>%
      reframe(pct = mean(value == value[class]),
              value = value[class],
              .by = group)

Output:

      group pct value
    1     A 0.4     1
    2     B 0.6     2
    3     C 0.8     5
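
The two dplyr answers differ only in how they turn the comparison into a proportion: `sum(x) / n()` in one, `mean(x)` in the other. For a logical vector these give the same number, because `TRUE` counts as 1 and `FALSE` as 0. A small base R sketch of the same per-group computation, assuming the question's data; `pct_mean` and `pct_sum` are illustrative names, not part of either answer:

```r
# Rebuild the question's data frame (same values as the dput output).
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

groups <- split(df, df$group)

# Proportion via mean() of the logical comparison.
pct_mean <- sapply(groups, function(g) mean(g$value == g$value[g$class]))

# Proportion via an explicit count divided by the group size.
pct_sum <- sapply(groups, function(g) sum(g$value == g$value[g$class]) / nrow(g))

# Both routes agree for every group.
stopifnot(isTRUE(all.equal(pct_mean, pct_sum)))
```

Both vectors are named by group, so `pct_mean[["A"]]` gives the proportion for group A.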

Or with a data.table option:

    library(data.table)

    setDT(df)[df[(class)],
              .(value = i.value, pct = mean(value == i.value)),
              on = .(group), by = .EACHI]

    #    group value pct
    # 1:     A     1 0.4
    # 2:     B     2 0.6
    # 3:     C     5 0.8

Answer 3

Score: 3

A base R option with `ave` + `subset`:

    subset(
      transform(
        df,
        pct = ave(ave(class, group, value) > 0, group)
      ),
      class
    )

gives

       group class value pct
    2      A  TRUE     1 0.4
    8      B  TRUE     2 0.6
    11     C  TRUE     5 0.8

huangapple
  • Posted on March 31, 2023, 23:31:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75900281.html