How to select row with max value from a group of groups in R?
# Question
I have a data frame with several grouping columns. I want to build a new data frame that contains, for each combination of groups, the row with the maximum value. Given the data frame:
group unit treatment value etc
1 A w 8 apple
1 A x 9 pear
1 A y 7 orange
1 A z 2 pear
1 B w 4 strawberry
1 B x 3 dragonfruit
1 B y 6 raspberry
1 B z 5 apple
1 C w 32 banana
1 C x 27 peach
1 C y 15 plum
1 C z 28 orange
2 A w 12 apricot
2 A x 11 blackberry
2 A y 10 banana
2 A z 9 raspeberry
2 B w 1 plum
2 B x 2 lemon
2 B y 3 grapefruit
2 B z 4 apple
2 C w 51 fig
2 C x 47 avocado
2 C y 68 blackberry
2 C z 53 dragonfruit
For each group, and for each unit within it, I would like to select the row with the highest value, so that I end up with:
group unit treatment value etc
1 A x 9 pear
1 B y 6 raspberry
1 C w 32 banana
2 A w 12 apricot
2 B z 4 apple
2 C y 68 blackberry
The `etc` column is there only to show that I want to select the whole row.
I could write a series of nested loops, but it feels like there has to be something more elegant. Happy for `base` or `tidyverse` suggestions.
# Answer 1
**Score**: 0
You can do it as follows:

```R
library(dplyr)
# Keep the rows whose value equals the maximum within each group/unit combination
filter(dt, value == max(value), .by = group:unit)
```

or (as @Limey suggests)

```R
library(dplyr)
slice_max(dt, order_by = value, by = group:unit)
```

or

```R
library(data.table)
setDT(dt)[, .SD[value == max(value)], .(group, unit)]
```

Output:

```
   group   unit treatment value        etc
   <int> <char>    <char> <int>     <char>
1:     1      A         x     9       pear
2:     1      B         y     6  raspberry
3:     1      C         w    32     banana
4:     2      A         w    12    apricot
5:     2      B         z     4      apple
6:     2      C         y    68 blackberry
```
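Since the question also asks for base R suggestions, here is a minimal sketch that needs no packages. It is not part of the answer above: it swaps in `ave()` to compute the per-group maximum, and the data frame is a subset of the question's rows (groups 1 and 2, units A and B) rebuilt by hand for illustration.

```R
# Example data: a hand-copied subset of the rows from the question.
dt <- data.frame(
  group     = c(1, 1, 1, 1, 2, 2, 2, 2),
  unit      = c("A", "A", "B", "B", "A", "A", "B", "B"),
  treatment = c("w", "x", "y", "z", "w", "x", "y", "z"),
  value     = c(8, 9, 6, 5, 12, 11, 3, 4),
  etc       = c("apple", "pear", "raspberry", "apple",
                "apricot", "blackberry", "grapefruit", "apple"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum of `value` within that row's
# group/unit combination; comparing against it flags the maximum rows.
is_max <- dt$value == ave(dt$value, dt$group, dt$unit, FUN = max)

# Keep the whole row for each flagged position and reset the row names.
result <- dt[is_max, ]
rownames(result) <- NULL
print(result)
```

Like the `filter()` and `data.table` versions, this keeps every row that ties for the maximum within a group/unit combination.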
# Answer 2
**Score**: 0
You can use `dplyr::slice_max()`. In dplyr 1.1.0 or later:

```R
library(dplyr)

df %>%
  slice_max(value, by = c(group, unit))
```

In older versions:

```R
library(dplyr)

df %>%
  group_by(group, unit) %>%
  slice_max(value) %>%
  ungroup()
```
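One detail worth checking is how ties are handled. By default `slice_max()` keeps all rows that tie for the maximum, so a group can contribute more than one row. The sketch below illustrates this with a small made-up data frame (the tied values are hypothetical and not taken from the question) and assumes dplyr 1.1.0 or later for the `by` argument.

```R
library(dplyr)

# Hypothetical data with a tie: both rows in group 1 / unit A have value 9.
df <- data.frame(
  group     = c(1, 1, 1),
  unit      = c("A", "A", "B"),
  treatment = c("w", "x", "y"),
  value     = c(9, 9, 6)
)

# Default (with_ties = TRUE): both tied rows are returned.
with_ties <- slice_max(df, value, by = c(group, unit))

# with_ties = FALSE: exactly one row per group/unit combination.
one_each <- slice_max(df, value, by = c(group, unit), with_ties = FALSE)

print(with_ties)
print(one_each)
```

If exactly one row per group/unit combination is required, `with_ties = FALSE` is probably the simplest way to guarantee it.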