Calculate deciles based on subgroup and apply to entire dataset
Question
I have a dataset with the following columns:
subgroup: one of group1 or group2
distribution: a continuous variable
I want to calculate deciles based on one subgroup of the dataset:
df <- df %>%
  filter(subgroup == "group1") %>%
  mutate(decile = ntile(distribution, 10))
I would then like to apply the resulting deciles to the entire dataset, not just group1.
Is there a way to do this?
Here is an example dataset:
df <- matrix(0, ncol = 3, nrow = 10000)
df[, 1] <- 1:10000
# assigning strings here coerces the whole matrix to character
df[, 2] <- sample(c("group1", "group2"), 10000, replace = TRUE)
df[, 3] <- rnorm(10000)
df <- as.data.frame(df)
colnames(df) <- c("id", "subgroup", "value")
I select the subgroup group1 and calculate deciles based on the column value:
df %>%
  filter(subgroup == 'group1') %>%
  mutate(decile = ntile(value, 10))
I would then like to classify the rows with subgroup == 'group2' using the deciles obtained from group1.
The desired output is a fourth column in df holding a single value between 1 and 10 for each observation, i.e. the decile classification of that observation.
Answer 1
Score: 0
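We could use cut to divide the values into decile groups, taking the break points from the quantiles of "group1" only.
library(dplyr)
df |>
  mutate(decile = cut(value,
                      quantile(value[subgroup == "group1"], seq(0, 1, 0.1)),
                      labels = FALSE,
                      # without this, the group1 minimum sits on the lowest break and gets NA
                      include.lowest = TRUE))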
Output:
   id subgroup        value decile
1   1   group1  0.674098613      8
2   2   group1 -2.881811886      1
3   3   group1 -0.377427063      4
4   4   group1  0.461585185      7
5   5   group1  0.460216469      7
6   6   group1 -1.374041767      1
7   7   group1 -0.945986918      2
8   8   group2  0.472525168      7
9   9   group2  0.418391193      7
10 10   group2  0.746413150      8
11 11   group2  0.175323464      6
12 12   group1  0.879160602      9
13 13   group1  0.469811384      7
14 14   group2  0.639019379      8
15 15   group1 -0.328276877      4
16 16   group1 -0.099512041      5
17 17   group1 -0.714642875      3
18 18   group1 -0.404702209      4
19 19   group1 -2.181077079      1
20 20   group2 -2.298182006      1
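Data (the example dataset stores value as character, so convert it to numeric before running the code above):
df$value <- as.numeric(df$value)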
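One caveat when the breaks come from group1 alone: any group2 value below the group1 minimum or above the group1 maximum falls outside every interval, and cut returns NA for it. The following is a minimal sketch of one way to handle that, assuming such out-of-range values should simply be assigned to decile 1 or decile 10. The helper name assign_decile and the smaller 1000-row dataset are made up for illustration and are not part of the original answer. Building the data with data.frame() also keeps value numeric, so no conversion is needed.

```r
# Hypothetical helper: classify x into deciles defined by a reference sample.
assign_decile <- function(x, reference) {
  # Nine interior cut points (10%, ..., 90%) of the reference group
  inner <- quantile(reference, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
  # Open-ended outer breaks, so out-of-range values land in decile 1 or 10
  cut(x, breaks = c(-Inf, inner, Inf), labels = FALSE)
}

set.seed(1)
# Illustrative data: data.frame() keeps `value` numeric
df <- data.frame(
  id = 1:1000,
  subgroup = sample(c("group1", "group2"), 1000, replace = TRUE),
  value = rnorm(1000)
)

# Deciles for every row, with break points taken from group1 only
df$decile <- assign_decile(df$value, df$value[df$subgroup == "group1"])
```

This assumes the reference values are continuous enough that the nine cut points are distinct; with heavily tied data, cut stops with an error because the breaks are not unique.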