How to select the row with the max value from a group of groups in R?

Question

I have a data frame with several grouping columns. I want to build a new data frame that contains, for each combination of groups, the row with the maximum value. Given the data frame:

    group unit treatment value etc
    1     A    w         8     apple
    1     A    x         9     pear
    1     A    y         7     orange
    1     A    z         2     pear
    1     B    w         4     strawberry
    1     B    x         3     dragonfruit
    1     B    y         6     raspberry
    1     B    z         5     apple
    1     C    w         32    banana
    1     C    x         27    peach
    1     C    y         15    plum
    1     C    z         28    orange
    2     A    w         12    apricot
    2     A    x         11    blackberry
    2     A    y         10    banana
    2     A    z         9     raspberry
    2     B    w         1     plum
    2     B    x         2     lemon
    2     B    y         3     grapefruit
    2     B    z         4     apple
    2     C    w         51    fig
    2     C    x         47    avocado
    2     C    y         68    blackberry
    2     C    z         53    dragonfruit

For each group, and for each unit within it, I would like to select the row with the highest value, so that I end up with:

    group unit treatment value etc
    1     A    x         9     pear
    1     B    y         6     raspberry
    1     C    w         32    banana
    2     A    w         12    apricot
    2     B    z         4     apple
    2     C    y         68    blackberry

The etc column is just there to highlight that I'd like to select the whole row.

I could write a series of nested loops, but it feels like there has to be something more elegant. Base R or tidyverse suggestions are both welcome.

Answer 1

Score: 0

You can do as follows:

```R
library(dplyr)

# Keep the rows whose value equals the maximum within each group/unit pair.
# group:unit selects the adjacent columns from group through unit.
filter(dt, value == max(value), .by = group:unit)
```

or (as @Limey suggests)

```R
library(dplyr)

slice_max(dt, order_by = value, by = group:unit)
```

or

```R
library(data.table)

# .SD is the subset of rows for each (group, unit) pair
setDT(dt)[, .SD[value == max(value)], .(group, unit)]
```

Output:

```
   group   unit treatment value        etc
   <int> <char>    <char> <int>     <char>
1:     1      A         x     9       pear
2:     1      B         y     6  raspberry
3:     1      C         w    32     banana
4:     2      A         w    12    apricot
5:     2      B         z     4      apple
6:     2      C         y    68 blackberry
```
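Since the question also welcomes base R, here is a minimal sketch of one base-only alternative. It is not taken from the answers: it swaps in `ave()` to compute the per-group maximum. The data frame name `dt` follows the answer above, and the `read.table()` call is just one assumed way of rebuilding the example data from the question.

```R
# Rebuild the example data from the question (the name `dt` is an assumption).
dt <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group unit treatment value etc
1 A w 8 apple
1 A x 9 pear
1 A y 7 orange
1 A z 2 pear
1 B w 4 strawberry
1 B x 3 dragonfruit
1 B y 6 raspberry
1 B z 5 apple
1 C w 32 banana
1 C x 27 peach
1 C y 15 plum
1 C z 28 orange
2 A w 12 apricot
2 A x 11 blackberry
2 A y 10 banana
2 A z 9 raspberry
2 B w 1 plum
2 B x 2 lemon
2 B y 3 grapefruit
2 B z 4 apple
2 C w 51 fig
2 C x 47 avocado
2 C y 68 blackberry
2 C z 53 dragonfruit
")

# ave() returns, for every row, the maximum `value` within its group/unit pair.
group_max <- ave(dt$value, dt$group, dt$unit, FUN = max)

# Keep the rows whose value equals that maximum; tied rows, if any, are all kept.
result <- dt[dt$value == group_max, ]
print(result)
```

The rows come back in their original order with their original row names, so call `rownames(result) <- NULL` if a clean index is wanted.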
Answer 2

Score: 0

You can use `dplyr::slice_max()`. In dplyr 1.1.0 or later:

```R
library(dplyr)

df %>%
  slice_max(value, by = c(group, unit))
```

In older versions:

```R
df %>%
  group_by(group, unit) %>%
  slice_max(value) %>%
  ungroup()
```
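One thing to keep in mind is how ties are handled. By default `slice_max()` uses `with_ties = TRUE`, so if several rows share the maximum within a group/unit pair, all of them are returned; passing `with_ties = FALSE` limits the result to one row per pair. The sketch below illustrates this on a small hypothetical data frame (`toy` is not from the question) and assumes dplyr 1.1.0 or later for the `by` argument.

```R
library(dplyr)

# Hypothetical toy data: treatments x and y tie for the maximum value.
toy <- data.frame(
  group     = c(1, 1, 1),
  unit      = c("A", "A", "A"),
  treatment = c("x", "y", "z"),
  value     = c(9, 9, 2)
)

# Default behaviour (with_ties = TRUE): both tied rows are kept.
tied <- slice_max(toy, value, by = c(group, unit))

# with_ties = FALSE: a single row per group/unit pair.
single <- slice_max(toy, value, by = c(group, unit), with_ties = FALSE)

nrow(tied)    # 2
nrow(single)  # 1
```

The `filter(value == max(value))` and `data.table` approaches in the first answer keep tied rows as well, so `with_ties = FALSE` is the option to reach for when exactly one row per group is required.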

huangapple
  • Posted on May 20, 2023, 22:18:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76295680.html