Calculate deciles based on subgroup and apply to entire dataset
Question
I have a dataset with columns:
subgroup: [group1, group2]
distribution: continuous variable
I want to calculate deciles based on a subgroup of the dataset:
df <- df %>%
  filter(subgroup == "group1") %>%
  mutate(decile = ntile(distribution, 10))
Then I would like to take the decile cut-points obtained from that subgroup and apply them to the entire dataset (i.e., not just group1).
Is there a way to do this?
Here's an example dataset:
# build the columns directly so value stays numeric
# (a matrix holds a single type, so mixing in strings turns every column into character)
df <- data.frame(
  id = 1:10000,
  subgroup = sample(c("group1", "group2"), 10000, replace = TRUE),
  value = rnorm(10000)
)
I select subgroup group1 and calculate deciles based on the value column:
library(dplyr)
df %>%
  filter(subgroup == "group1") %>%
  mutate(decile = ntile(value, 10))
Then I would like to use those deciles to classify the subgroup == "group2" observations as well, based on the cut-points obtained from group1.
The desired output is a fourth column in df holding a single value between 1 and 10 for each observation (i.e., each observation's decile classification).
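A minimal sketch that stays close to the ntile() step above, assuming dplyr >= 1.0 and R >= 4.1 (for the native |> pipe): take the largest group1 value in each ntile() decile as that decile's upper edge, then place every row, group1 or group2, with base R's findInterval(). The seed and the names g1 and edges are illustrative choices, not part of the question.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, for reproducibility only
df <- data.frame(
  id = 1:10000,
  subgroup = sample(c("group1", "group2"), 10000, replace = TRUE),
  value = rnorm(10000)
)

# ntile() on group1 only, as in the question
g1 <- df |>
  filter(subgroup == "group1") |>
  mutate(decile = ntile(value, 10))

# upper edge of each group1 decile = largest value that landed in it
edges <- g1 |>
  group_by(decile) |>
  summarise(upper = max(value), .groups = "drop") |>
  arrange(decile)

# with left.open = TRUE, findInterval() returns i for edges[i] < x <= edges[i + 1],
# 0 for x <= edges[1] and 9 for x > edges[9]; adding 1 maps that to deciles 1..10
df <- df |>
  mutate(decile = findInterval(value, edges$upper[1:9], left.open = TRUE) + 1)
```

For group1 rows this reproduces ntile() exactly as long as the values have no ties; group2 values at or below group1's first edge land in decile 1, and those above the ninth edge land in decile 10.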
Answer 1
Score: 0
We could use cut() to divide all values into decile groups, with the breakpoints computed from "group1" only.
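library(dplyr)
df |>
  mutate(decile = cut(value,
                      # decile cut-points come from group1 only; the outer breaks are
                      # widened to -Inf/Inf so every value (group1's minimum and any
                      # group2 value outside group1's range) still gets a decile 1-10
                      breaks = c(-Inf,
                                 quantile(value[subgroup == "group1"], seq(0.1, 0.9, 0.1)),
                                 Inf),
                      labels = FALSE))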
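A small base-R check of how cut() treats the boundaries, on simulated data (ref stands in for the group1 values; the seed and sample size are arbitrary). With breaks from quantile(..., seq(0, 1, 0.1)), the lowest break equals group1's minimum, and cut()'s default right-closed intervals (right = TRUE, include.lowest = FALSE) leave that value, and anything outside group1's range, as NA. Keeping only the nine inner quantiles and using -Inf and Inf as the outer breaks avoids this.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
ref <- rnorm(100)  # plays the role of the group1 values

# group1's own min and max, plus one value below and one above its range
x <- c(min(ref), max(ref), min(ref) - 1, max(ref) + 1)

# full 0%..100% breaks: the first interval is (min, q10], so min(ref) itself is excluded
r1 <- cut(x, quantile(ref, seq(0, 1, 0.1)), labels = FALSE)
print(r1)  # -> NA 10 NA NA

# same nine inner cut-points, outer breaks widened
r2 <- cut(x, c(-Inf, quantile(ref, seq(0.1, 0.9, 0.1)), Inf), labels = FALSE)
print(r2)  # -> 1 10 1 10
```

Setting include.lowest = TRUE instead would rescue group1's minimum only; group2 values beyond group1's range would still come back as NA.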
Output (first 20 rows):
     id subgroup        value decile
1     1   group1  0.674098613      8
2     2   group1 -2.881811886      1
3     3   group1 -0.377427063      4
4     4   group1  0.461585185      7
5     5   group1  0.460216469      7
6     6   group1 -1.374041767      1
7     7   group1 -0.945986918      2
8     8   group2  0.472525168      7
9     9   group2  0.418391193      7
10   10   group2  0.746413150      8
11   11   group2  0.175323464      6
12   12   group1  0.879160602      9
13   13   group1  0.469811384      7
14   14   group2  0.639019379      8
15   15   group1 -0.328276877      4
16   16   group1 -0.099512041      5
17   17   group1 -0.714642875      3
18   18   group1 -0.404702209      4
19   19   group1 -2.181077079      1
20   20   group2 -2.298182006      1
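Data: the example df from the question. If value is stored as character (which happens when the data frame is built from a matrix that also holds strings), convert it first:
df$value <- as.numeric(df$value)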
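A quick sanity check of the whole pipeline, again on simulated data shaped like the question's (arbitrary seed; built with data.frame() so value is numeric from the start). With widened outer breaks there should be no NA, every decile should be between 1 and 10, group1 should split into ten near-equal groups, and group2's counts simply follow wherever group1's cut-points fall.

```r
library(dplyr)

set.seed(123)  # arbitrary seed, for reproducibility only
df <- data.frame(
  id = 1:10000,
  subgroup = sample(c("group1", "group2"), 10000, replace = TRUE),
  value = rnorm(10000)
)

# nine inner cut-points from group1, outer edges widened to -Inf/Inf
breaks <- c(-Inf,
            quantile(df$value[df$subgroup == "group1"], seq(0.1, 0.9, 0.1)),
            Inf)

df <- df |>
  mutate(decile = cut(value, breaks, labels = FALSE))

# rows: subgroup, columns: decile 1..10
counts <- table(df$subgroup, df$decile)
print(counts)
```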