有效的代码来删除包含非唯一最大值的行?

huangapple go评论69阅读模式
英文:

Efficient code to remove rows containing non-unique max?

问题

这是一个关于提取具有唯一最大值的行的简单数组的示例。以下是有效的代码:

winners <- max.col(foo)
unique_max <- apply(foo, 1, function(row) length(which(row == max(row))) == 1)
foo <- foo[unique_max, ]

这段代码将提取具有唯一最大值的行并将它们存储在foo中。这段代码更有效率并只使用了基本的R调用。

英文:

Here's a simple example of an array for which I want to extract only those rows whose max value is unique (in that row).

 foo &lt;-  expand.grid(1:3,1:3,1:3)
   Var1 Var2 Var3
1     1    1    1
2     2    1    1
3     3    1    1
4     1    2    1
5     2    2    1
6     3    2    1
7     1    3    1
8     2    3    1
9     3    3    1
10    1    1    2
11    2    1    2
12    3    1    2
13    1    2    2
14    2    2    2
15    3    2    2
16    1    3    2
17    2    3    2
18    3    3    2
19    1    1    3
20    2    1    3
21    3    1    3
22    1    2    3
23    2    2    3
24    3    2    3
25    1    3    3
26    2    3    3
27    3    3    3

I've got working code:

winners &lt;- max.col(foo) 
finddupe &lt;- rep(0,length=length(winners))
for (jf in 1:length(winners)) finddupe[jf] &lt;- sum(foo[jf,] == foo[jf, winners[jf] ] )
winners &lt;- winners[finddupe == 1]
foo &lt;- foo[finddupe == 1, ]

That just looks inefficient to me.
I'd prefer a solution which only uses base - R calls, but am open to using tools in other libraries.

答案1

得分: 4

以下是代码的翻译部分:

Another base R solution:

subset(foo, max.col(foo, 'first') == max.col(foo, 'last'))

Same logic as above in dplyr way:

library(dplyr)

foo %>%
  filter(max.col(., 'first') == max.col(., 'last'))
英文:

Another base R solution:

subset(foo, max.col(foo, &#39;first&#39;) == max.col(foo, &#39;last&#39;))

   Var1 Var2 Var3
2     2    1    1
3     3    1    1
4     1    2    1
6     3    2    1
7     1    3    1
8     2    3    1
10    1    1    2
12    3    1    2
15    3    2    2
16    1    3    2
17    2    3    2
19    1    1    3
20    2    1    3
22    1    2    3
23    2    2    3
&gt; 

Same logic as above in dplyr way:

library(dplyr) 

foo %&gt;%   
  filter(max.col(., &#39;first&#39;) == max.col(., &#39;last&#39;))

答案2

得分: 3

使用pmax从所有列创建一个最大值的列,然后使用rowSums在逻辑数据集上筛选只包含单个唯一最大值的行。

library(dplyr)
foo %>%
   mutate(mx = do.call(pmax, c(across(everything()), na.rm = TRUE))) %>%
   filter(rowSums(across(Var1:Var3, ~ .x == mx), na.rm = TRUE) == 1)

输出:

   Var1 Var2 Var3 mx
1     2    1    1  2
2     3    1    1  3
3     1    2    1  2
4     3    2    1  3
5     1    3    1  3
6     2    3    1  3
7     1    1    2  2
8     3    1    2  3
9     3    2    2  3
10    1    3    2  3
11    2    3    2  3
12    1    1    3  3
13    2    1    3  3
14    1    2    3  3
15    2    2    3  3

或者使用base R

subset(foo, rowSums(foo == do.call(pmax, c(foo, na.rm = TRUE)), na.rm = TRUE) == 1)
英文:

Create a column of max with pmax from all the columns, then filter the rows where there is only a single unique max by getting the count on a logical dataset with rowSums

library(dplyr)
foo %&gt;%
   mutate(mx = do.call(pmax, c(across(everything()), na.rm = TRUE))) %&gt;% 
   filter(rowSums(across(Var1:Var3, ~ .x  == mx), na.rm = TRUE) == 1)

-output

   Var1 Var2 Var3 mx
1     2    1    1  2
2     3    1    1  3
3     1    2    1  2
4     3    2    1  3
5     1    3    1  3
6     2    3    1  3
7     1    1    2  2
8     3    1    2  3
9     3    2    2  3
10    1    3    2  3
11    2    3    2  3
12    1    1    3  3
13    2    1    3  3
14    1    2    3  3
15    2    2    3  3

Or with base R

subset(foo, rowSums(foo == do.call(pmax, c(foo, na.rm = TRUE)), 
      na.rm = TRUE) == 1)

答案3

得分: 0

A base R approach using apply:

foo[apply(foo, 1, function(x) sum(x[which.max(x)] == x) <= 1), ]
   Var1 Var2 Var3
2     2    1    1
3     3    1    1
4     1    2    1
6     3    2    1
7     1    3    1
8     2    3    1
10    1    1    2
12    3    1    2
15    3    2    2
16    1    3    2
17    2    3    2
19    1    1    3
20    2    1    3
22    1    2    3
23    2    2    3

Please note that the code section remains unchanged.

英文:

A base R approach using apply

foo[apply(foo, 1, function(x) sum(x[which.max(x)] == x) &lt;= 1), ]
   Var1 Var2 Var3
2     2    1    1
3     3    1    1
4     1    2    1
6     3    2    1
7     1    3    1
8     2    3    1
10    1    1    2
12    3    1    2
15    3    2    2
16    1    3    2
17    2    3    2
19    1    1    3
20    2    1    3
22    1    2    3
23    2    2    3

答案4

得分: 0

@onyambu 在比赛中获胜。 (cgw 是我; ak** 是 akrun 的解决方案)

bar5 = 1:5
foo55 <- expand.grid(bar5,bar5,bar5,bar5,bar5)
microbenchmark(ony(foo55), cgw(foo55), akply(foo55), akbase(foo55), andre(foo55))
Unit: microseconds
          expr        min          lq        mean      median          uq         max neval cld
    ony(foo55)    455.117    495.2335    589.6801    517.3755    634.9795    3107.222   100  a 
    cgw(foo55) 314076.038 317184.4050 348711.9522 319784.5870 324921.0335 2691161.873   100   b
  akply(foo55)  14156.653  14835.2230  16194.3699  15160.0270  16441.3550   74019.622   100  a 
 akbase(foo55)    858.969    896.8310   1055.4277    970.6395   1117.2420    4098.860   100  a 
  andre(foo55)   8161.406   8531.1700   9188.4801   8872.0325   9284.0995   14548.383   100  a
英文:

After verifying the answers so far (18:00 EST Weds 15 Feb), I ran a benchmark comparison. @onyambu wins the race. (cgw is me; ak** are akrun's solutions)

bar5 = 1:5
 foo55 &lt;- expand.grid(bar5,bar5,bar5,bar5,bar5)
microbenchmark(ony(foo55), cgw(foo55), akply(foo55), akbase(foo55), andre(foo55))
Unit: microseconds
          expr        min          lq        mean      median          uq         max neval cld
    ony(foo55)    455.117    495.2335    589.6801    517.3755    634.9795    3107.222   100  a 
    cgw(foo55) 314076.038 317184.4050 348711.9522 319784.5870 324921.0335 2691161.873   100   b
  akply(foo55)  14156.653  14835.2230  16194.3699  15160.0270  16441.3550   74019.622   100  a 
 akbase(foo55)    858.969    896.8310   1055.4277    970.6395   1117.2420    4098.860   100  a 
  andre(foo55)   8161.406   8531.1700   9188.4801   8872.0325   9284.0995   14548.383   100  a

huangapple
  • 本文由 发表于 2023年2月16日 02:03:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/75463792.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定