Calculate deciles based on subgroup and apply to entire dataset

Question

I have a dataset with columns:

  1. subgroup: [group1, group2]
  2. distribution: continuous variable

I want to calculate deciles based on a subgroup of the dataset:

    df <- df %>%
      filter(subgroup == "group1") %>%
      mutate(decile = ntile(distribution, 10))

Then I would like to apply the deciles obtained from group1 to the entire dataset (i.e. not just group1).

Is there a way to do this?

Here's an example dataset:

    df <- matrix(0, ncol = 3, nrow = 10000)
    df[, 1] <- 1:10000
    # assigning strings coerces the whole matrix to character, so value is not numeric
    df[, 2] <- sample(c("group1", "group2"), 10000, replace = TRUE)
    df[, 3] <- rnorm(10000)
    df <- as.data.frame(df)
    colnames(df) <- c("id", "subgroup", "value")

I select the subgroup group1 and calculate deciles based on the column value:

    df %>%
      filter(subgroup == 'group1') %>%
      mutate(decile = ntile(value, 10))

Then I would like to classify the subgroup == 'group2' rows using the decile boundaries obtained from 'group1'.

The desired output is a 4th column in df with a single value between 1 and 10 for each observation (i.e. the decile classification for each observation).

Answer 1

Score: 0

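We could use cut() to divide the values into decile groups, with the breakpoints taken from the quantiles of "group1".

    library(dplyr)

    df |>
      mutate(decile = cut(value,
                          # decile breakpoints computed from group1 only
                          breaks = quantile(value[subgroup == "group1"], seq(0, 1, 0.1)),
                          labels = FALSE,
                          # keep the smallest group1 value in decile 1 instead of NA
                          include.lowest = TRUE))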

Output:

       id subgroup        value decile
    1   1   group1  0.674098613      8
    2   2   group1 -2.881811886      1
    3   3   group1 -0.377427063      4
    4   4   group1  0.461585185      7
    5   5   group1  0.460216469      7
    6   6   group1 -1.374041767      1
    7   7   group1 -0.945986918      2
    8   8   group2  0.472525168      7
    9   9   group2  0.418391193      7
    10 10   group2  0.746413150      8
    11 11   group2  0.175323464      6
    12 12   group1  0.879160602      9
    13 13   group1  0.469811384      7
    14 14   group2  0.639019379      8
    15 15   group1 -0.328276877      4
    16 16   group1 -0.099512041      5
    17 17   group1 -0.714642875      3
    18 18   group1 -0.404702209      4
    19 19   group1 -2.181077079      1
    20 20   group2 -2.298182006      1
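
One caveat with breakpoints taken from group1: a group2 value below the group1 minimum or above the group1 maximum falls outside every interval, so cut() returns NA for it. A minimal sketch of one way around this, assuming it is acceptable to place such values in decile 1 or decile 10, is to widen the two outer breakpoints to -Inf and Inf. The seed, the `breaks` variable, and the data.frame() construction are illustrative choices and not part of the original answer.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Illustrative data with the same columns as the question
df <- data.frame(
  id       = 1:10000,
  subgroup = sample(c("group1", "group2"), 10000, replace = TRUE),
  value    = rnorm(10000)
)

# Decile breakpoints estimated from group1 only (11 cut points)
breaks <- quantile(df$value[df$subgroup == "group1"], probs = seq(0, 1, 0.1))

# Open the outer intervals so values beyond group1's range still get a decile
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

# labels = FALSE returns the interval index (1 to 10) instead of a factor
df$decile <- cut(df$value, breaks = breaks, labels = FALSE)

# Counts per decile and subgroup; the group1 counts should be nearly equal
table(df$decile, df$subgroup)
```

Whether out-of-range values should land in the outer deciles or stay NA depends on the analysis, so treat the widening step as optional.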

Data:

    # convert value from character to numeric before calling cut()
    df$value <- as.numeric(df$value)
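
The conversion is needed because the question builds df from a matrix, and a matrix holds a single type, so adding the group labels turns every column into text. As a sketch of an alternative, assuming you are free to change how the example data is created, constructing the data frame directly keeps value numeric from the start. The seed is an assumption added for reproducibility.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Each column keeps its own type when the data frame is built directly
df <- data.frame(
  id       = 1:10000,
  subgroup = sample(c("group1", "group2"), 10000, replace = TRUE),
  value    = rnorm(10000)
)

# id is integer and value is numeric; subgroup is character in R >= 4.0
str(df)
```

With data built this way, the as.numeric() step is unnecessary, although running it does no harm.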


huangapple
  • Posted on May 29, 2023 at 18:39:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76356628.html