Efficient code to remove rows containing non-unique max?
Question
Here's a simple example of an array for which I want to extract only those rows whose max value is unique (in that row).
foo <- expand.grid(1:3,1:3,1:3)
Var1 Var2 Var3
1 1 1 1
2 2 1 1
3 3 1 1
4 1 2 1
5 2 2 1
6 3 2 1
7 1 3 1
8 2 3 1
9 3 3 1
10 1 1 2
11 2 1 2
12 3 1 2
13 1 2 2
14 2 2 2
15 3 2 2
16 1 3 2
17 2 3 2
18 3 3 2
19 1 1 3
20 2 1 3
21 3 1 3
22 1 2 3
23 2 2 3
24 3 2 3
25 1 3 3
26 2 3 3
27 3 3 3
I've got working code:
winners <- max.col(foo)
finddupe <- rep(0,length=length(winners))
for (jf in 1:length(winners)) finddupe[jf] <- sum(foo[jf,] == foo[jf, winners[jf] ] )
winners <- winners[finddupe == 1]
foo <- foo[finddupe == 1, ]
That just looks inefficient to me.
I'd prefer a solution that uses only base R calls, but I'm open to using tools from other libraries.
Answer 1
Score: 4
Another base R solution:
subset(foo, max.col(foo, 'first') == max.col(foo, 'last'))
Var1 Var2 Var3
2 2 1 1
3 3 1 1
4 1 2 1
6 3 2 1
7 1 3 1
8 2 3 1
10 1 1 2
12 3 1 2
15 3 2 2
16 1 3 2
17 2 3 2
19 1 1 3
20 2 1 3
22 1 2 3
23 2 2 3
The same logic as above, the dplyr way:
library(dplyr)
foo %>%
filter(max.col(., 'first') == max.col(., 'last'))
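To see why comparing the two calls works, here is a minimal sketch on a small made-up matrix (the matrix m and the variable names are illustrative, not part of the answer). With ties.method = "first", max.col returns the leftmost position of each row's maximum, and with "last" the rightmost, so the two agree only when the maximum occurs once in that row.

```r
# Illustrative data: one row with a tied maximum, two rows without.
m <- rbind(c(1, 3, 3),   # maximum 3 appears twice (tie)
           c(2, 1, 1),   # unique maximum in column 1
           c(1, 2, 3))   # unique maximum in column 3

first_idx <- max.col(m, ties.method = "first")  # leftmost position of the row maximum
last_idx  <- max.col(m, ties.method = "last")   # rightmost position of the row maximum

# The two positions coincide exactly when the row maximum occurs once.
unique_max <- first_idx == last_idx

# Keep only the rows with a unique maximum.
m[unique_max, , drop = FALSE]
```

Passing a data frame such as foo works the same way, since max.col converts its argument to a matrix first.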
Answer 2
Score: 3
Create a column holding the row-wise maximum with pmax across all the columns, then filter the rows that have only a single maximum by counting the matches in the resulting logical dataset with rowSums:
library(dplyr)
foo %>%
mutate(mx = do.call(pmax, c(across(everything()), na.rm = TRUE))) %>%
filter(rowSums(across(Var1:Var3, ~ .x == mx), na.rm = TRUE) == 1)
Output:
Var1 Var2 Var3 mx
1 2 1 1 2
2 3 1 1 3
3 1 2 1 2
4 3 2 1 3
5 1 3 1 3
6 2 3 1 3
7 1 1 2 2
8 3 1 2 3
9 3 2 2 3
10 1 3 2 3
11 2 3 2 3
12 1 1 3 3
13 2 1 3 3
14 1 2 3 3
15 2 2 3 3
Or with base R:
subset(foo, rowSums(foo == do.call(pmax, c(foo, na.rm = TRUE)),
na.rm = TRUE) == 1)
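The base R one-liner can be unpacked into separate steps, which may make it easier to follow. This is only a sketch of the same idea; the intermediate names row_max, is_max, n_max and res are my own.

```r
foo <- expand.grid(1:3, 1:3, 1:3)

# Row-wise maximum: pmax receives each column as a separate argument.
row_max <- do.call(pmax, c(foo, na.rm = TRUE))

# Logical matrix: TRUE where a cell equals the maximum of its row.
is_max <- foo == row_max

# Number of cells per row that hit the maximum.
n_max <- rowSums(is_max, na.rm = TRUE)

# Keep the rows where the maximum occurs exactly once.
res <- foo[n_max == 1, ]
res
```

The comparison foo == row_max lines up correctly because row_max has one element per row, so it is recycled down each column.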
Answer 3
Score: 0
A base R approach using apply:
foo[apply(foo, 1, function(x) sum(x[which.max(x)] == x) <= 1), ]
Var1 Var2 Var3
2 2 1 1
3 3 1 1
4 1 2 1
6 3 2 1
7 1 3 1
8 2 3 1
10 1 1 2
12 3 1 2
15 3 2 2
16 1 3 2
17 2 3 2
19 1 1 3
20 2 1 3
22 1 2 3
23 2 2 3
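For readability, the row test can be pulled out into a named function. This is a sketch of the same row-wise idea rather than the exact expression above; has_unique_max is a hypothetical helper name, and it compares against max(x) directly instead of going through which.max.

```r
foo <- expand.grid(1:3, 1:3, 1:3)

# Hypothetical helper: TRUE when the maximum of x occurs exactly once.
has_unique_max <- function(x) sum(x == max(x)) == 1

has_unique_max(c(1, 3, 2))  # maximum 3 occurs once
has_unique_max(c(3, 3, 1))  # maximum 3 occurs twice

# apply() hands each row to the helper as a plain vector.
keep <- apply(foo, 1, has_unique_max)
result <- foo[keep, ]
result
```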
Answer 4
Score: 0
After verifying the answers so far (18:00 EST Weds 15 Feb), I ran a benchmark comparison. @onyambu wins the race. (cgw is me; ak** are akrun's solutions)
bar5 = 1:5
foo55 <- expand.grid(bar5,bar5,bar5,bar5,bar5)
microbenchmark(ony(foo55), cgw(foo55), akply(foo55), akbase(foo55), andre(foo55))
Unit: microseconds
          expr        min          lq        mean      median          uq         max neval cld
    ony(foo55)    455.117    495.2335    589.6801    517.3755    634.9795    3107.222   100  a
    cgw(foo55) 314076.038 317184.4050 348711.9522 319784.5870 324921.0335 2691161.873   100   b
  akply(foo55)  14156.653  14835.2230  16194.3699  15160.0270  16441.3550   74019.622   100  a
 akbase(foo55)    858.969    896.8310   1055.4277    970.6395   1117.2420    4098.860   100  a
  andre(foo55)   8161.406   8531.1700   9188.4801   8872.0325   9284.0995   14548.383   100  a
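The wrapper functions used in the benchmark are not shown above. A plausible reconstruction, purely as an assumption, is sketched below: the function names come from the microbenchmark call, but the bodies are my guess at which answer each one wraps (ony as the max.col comparison, akbase as the pmax/rowSums version, andre as the apply version, cgw as the loop from the question). The dplyr variant akply is left out so the sketch needs only base R, and the timing step runs only if the microbenchmark package is installed.

```r
# Assumed wrappers: names from the benchmark call, bodies guessed from the answers.
ony <- function(d) subset(d, max.col(d, "first") == max.col(d, "last"))
akbase <- function(d) {
  subset(d, rowSums(d == do.call(pmax, c(d, na.rm = TRUE)), na.rm = TRUE) == 1)
}
andre <- function(d) d[apply(d, 1, function(x) sum(x[which.max(x)] == x) <= 1), ]
cgw <- function(d) {
  winners <- max.col(d)
  finddupe <- rep(0, length = length(winners))
  for (jf in seq_along(winners)) {
    finddupe[jf] <- sum(d[jf, ] == d[jf, winners[jf]])
  }
  d[finddupe == 1, ]
}

bar5 <- 1:5
foo55 <- expand.grid(bar5, bar5, bar5, bar5, bar5)

# Sanity check before timing: every approach should keep the same rows.
kept <- rownames(ony(foo55))
stopifnot(identical(kept, rownames(akbase(foo55))),
          identical(kept, rownames(andre(foo55))),
          identical(kept, rownames(cgw(foo55))))

# Timing is optional and only runs when microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(ony(foo55), akbase(foo55), andre(foo55),
                                       times = 20))
}
```

Checking that the approaches agree before timing them is worth the extra line, since a faster function that returns different rows would not be a fair comparison.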