Calculate percentage of same specific values per group
Question
I have the following data frame `df` (`dput` below):
> df
group class value
1 A FALSE 2
2 A TRUE 1
3 A FALSE 1
4 A FALSE 2
5 A FALSE 3
6 B FALSE 4
7 B FALSE 2
8 B TRUE 2
9 B FALSE 2
10 B FALSE 6
11 C TRUE 5
12 C FALSE 5
13 C FALSE 3
14 C FALSE 5
15 C FALSE 5
I would like to calculate, per group, the percentage of values that match one specific value. Each group always has exactly one row with `class == TRUE`, and I want the share of rows in that group whose value equals the value on that row. In the data frame above, group A has the value 1 with `class == TRUE`, and two of its five rows have the value 1, so 2/5 = 0.4 of the values are 1. Here is the desired output:
group value pct
1 A 1 0.4
2 B 2 0.6
3 C 5 0.8
So I was wondering if anyone knows how to calculate the percentage of specific values per group in R?
`dput` of `df`:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "C", "C", "C", "C", "C"), class = c(FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE), value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6,
5, 5, 3, 5, 5)), class = "data.frame", row.names = c(NA, -15L
))
Answer 1
Score: 3
You could do:
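library(dplyr)

df %>%
  group_by(group) %>%
  summarize(
    # Share of rows whose value equals the value on the class == TRUE row
    pct = sum(value == value[class == TRUE]) / n(),
    # Reassign value last, so pct above still sees the full column
    value = value[class == TRUE]
  )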
# A tibble: 3 x 3
group pct value
<chr> <dbl> <dbl>
1 A 0.4 1
2 B 0.6 2
3 C 0.8 5
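For readers who want the per-group logic spelled out, here is a minimal base R sketch of the same calculation. The helper name `pct_matching` is made up for illustration, and the `stopifnot()` guard simply encodes the question's assumption that each group has exactly one `class == TRUE` row.

```r
# Example data from the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Hypothetical helper: for one group's rows, find the reference value
# (the value on the class == TRUE row) and the share of rows equal to it
pct_matching <- function(d) {
  ref <- d$value[d$class]
  # Assumption from the question: exactly one TRUE row per group
  stopifnot(length(ref) == 1)
  data.frame(group = d$group[1], value = ref, pct = mean(d$value == ref))
}

# Apply the helper to each group and stack the one-row results
res <- do.call(rbind, lapply(split(df, df$group), pct_matching))
rownames(res) <- NULL
res
```

Taking the `mean()` of a logical vector is the same as `sum(...) / n()` in the dplyr version, since `TRUE` counts as 1 and `FALSE` as 0. If a group had zero or several `TRUE` rows, the guard would stop with an error instead of silently returning a wrong share.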
Answer 2
Score: 3
Try
library(dplyr) # version >= 1.1.0, which introduced reframe() and .by
df %>%
  reframe(pct = mean(value == value[class]), value = value[class], .by = group)
Output:
group pct value
1 A 0.4 1
2 B 0.6 2
3 C 0.8 5
Or with a `data.table` option:
library(data.table)
setDT(df)[df[(class)], .(value = i.value,
pct = mean(value == i.value)), on = .(group), by = .EACHI]
group value pct
1: A 1 0.4
2: B 2 0.6
3: C 5 0.8
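The join idea can also be sketched in base R, which may make it easier to follow: attach each group's reference value to every row of that group, flag the matching rows, then average the flag per group. This is only a rough analogue of the `data.table` call above, not a description of how it works internally; the names `ref`, `ref_value`, `joined` and `match` are illustrative.

```r
# Example data from the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# One reference row per group: the value on the class == TRUE row
ref <- df[df$class, c("group", "value")]
names(ref)[2] <- "ref_value"

# Join the reference value onto every row of its group
joined <- merge(df, ref, by = "group")

# Flag rows that equal the reference value, then average per group
joined$match <- joined$value == joined$ref_value
res <- aggregate(match ~ group + ref_value, data = joined, FUN = mean)
names(res) <- c("group", "value", "pct")
res <- res[order(res$group), ]
res
```

Note that `setDT(df)` in the `data.table` version converts `df` by reference, whereas this sketch leaves `df` as a plain data frame.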
# Answer 3
**Score**: 3
A base R option with `ave` + `subset`:
subset(
transform(
df,
pct = ave(ave(class, group, value) > 0, group)
),
class
)
gives
group class value pct
2 A TRUE 1 0.4
8 B TRUE 2 0.6
11 C TRUE 5 0.8
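The nested `ave()` call is compact, so here is a sketch that unpacks it into two steps under the same assumptions. The intermediate names `is_match` and `pct` are illustrative, and `as.numeric()` is added only to make the logical-to-numeric conversion explicit.

```r
# Example data from the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  class = c(FALSE, TRUE, FALSE, FALSE, FALSE,
            FALSE, FALSE, TRUE, FALSE, FALSE,
            TRUE, FALSE, FALSE, FALSE, FALSE),
  value = c(2, 1, 1, 2, 3, 4, 2, 2, 2, 6, 5, 5, 3, 5, 5)
)

# Step 1: average class within each (group, value) combination.
# The average is > 0 only where that combination contains the TRUE row,
# so this flags every row sharing its value with the group's TRUE row.
is_match <- ave(as.numeric(df$class), df$group, df$value) > 0

# Step 2: average the flag within each group to get the share of matches
pct <- ave(as.numeric(is_match), df$group)

# Keep only the class == TRUE rows, one per group
res <- df[df$class, ]
res$pct <- pct[df$class]
res
```

For group A, step 1 flags rows 2 and 3 (both have the value 1), and step 2 turns that into 2/5 = 0.4 on every row of the group before the final subset keeps one row.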