February 16, 2023 02:03:37

Efficient code to remove rows containing non-unique max?

Question

Here's a simple example of an array for which I want to extract only those rows whose max value is unique (in that row).

foo <- expand.grid(1:3, 1:3, 1:3)
   Var1 Var2 Var3
1     1    1    1
2     2    1    1
3     3    1    1
4     1    2    1
5     2    2    1
6     3    2    1
7     1    3    1
8     2    3    1
9     3    3    1
10    1    1    2
11    2    1    2
12    3    1    2
13    1    2    2
14    2    2    2
15    3    2    2
16    1    3    2
17    2    3    2
18    3    3    2
19    1    1    3
20    2    1    3
21    3    1    3
22    1    2    3
23    2    2    3
24    3    2    3
25    1    3    3
26    2    3    3
27    3    3    3

I've got working code:

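# column index of each row's max (max.col breaks ties at random by default)
winners <- max.col(foo)
finddupe <- numeric(length(winners))
# count how many entries in each row equal that row's max
for (jf in seq_along(winners)) finddupe[jf] <- sum(foo[jf, ] == foo[jf, winners[jf]])
# keep only rows whose max occurs exactly once
winners <- winners[finddupe == 1]
foo <- foo[finddupe == 1, ]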

That just looks inefficient to me.
I'd prefer a solution that uses only base R calls, but am open to using tools from other libraries.
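For comparison, here is a minimal sketch of the same counting logic as the loop above without the explicit loop. It is an editorial illustration, not code from the thread; the names m and row_max are illustrative. It assumes all columns are numeric so that as.matrix gives a numeric matrix, and it uses ties.method = "first" only to make winners reproducible (the count does not depend on which tied column is picked).

```r
foo <- expand.grid(1:3, 1:3, 1:3)
m <- as.matrix(foo)

# Column index of each row's maximum (ties broken at the first match).
winners <- max.col(m, ties.method = "first")

# Matrix indexing with (row, column) pairs extracts each row's maximum.
row_max <- m[cbind(seq_len(nrow(m)), winners)]

# Same quantity as finddupe in the loop: entries equal to the row max.
# m == row_max recycles row_max down each column.
finddupe <- rowSums(m == row_max)

winners <- winners[finddupe == 1]
foo <- foo[finddupe == 1, ]
```

Like the loop, this keeps winners aligned with the surviving rows of foo.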

Answer 1

Score: 4

Another base R solution:

subset(foo, max.col(foo, 'first') == max.col(foo, 'last'))
   Var1 Var2 Var3
2     2    1    1
3     3    1    1
4     1    2    1
6     3    2    1
7     1    3    1
8     2    3    1
10    1    1    2
12    3    1    2
15    3    2    2
16    1    3    2
17    2    3    2
19    1    1    3
20    2    1    3
22    1    2    3
23    2    2    3

Same logic as above in dplyr way:

library(dplyr)
foo %>%
  filter(max.col(., 'first') == max.col(., 'last'))
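Why the comparison works: max.col returns the column of each row's maximum, and its ties.method argument ("random" by default, or "first" / "last") decides which column is reported when the maximum is tied. The first and last positions coincide exactly when the maximum occurs once. A minimal sketch on a toy matrix (m, unique_max and n_unique are illustrative names, not from the answer):

```r
# Row 1 has a unique max (3 in column 2); row 2 has a tied max (3 in columns 1 and 3).
m <- rbind(c(1, 3, 2),
           c(3, 1, 3))

first <- max.col(m, ties.method = "first")  # c(2, 1)
last  <- max.col(m, ties.method = "last")   # c(2, 3)
unique_max <- first == last                 # c(TRUE, FALSE)
print(unique_max)

# Same test on the question's data: count rows with a unique max.
foo <- expand.grid(1:3, 1:3, 1:3)
n_unique <- sum(max.col(foo, "first") == max.col(foo, "last"))
```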

Answer 2

Score: 3

Create a column of row maxima with pmax across all columns, then keep only the rows with a single unique max by counting the matches in the resulting logical data set with rowSums.

library(dplyr)
foo %&gt;%
   mutate(mx = do.call(pmax, c(across(everything()), na.rm = TRUE))) %&gt;% 
   filter(rowSums(across(Var1:Var3, ~ .x  == mx), na.rm = TRUE) == 1)

Output:

   Var1 Var2 Var3 mx
1     2    1    1  2
2     3    1    1  3
3     1    2    1  2
4     3    2    1  3
5     1    3    1  3
6     2    3    1  3
7     1    1    2  2
8     3    1    2  3
9     3    2    2  3
10    1    3    2  3
11    2    3    2  3
12    1    1    3  3
13    2    1    3  3
14    1    2    3  3
15    2    2    3  3

Or with base R

subset(foo, rowSums(foo == do.call(pmax, c(foo, na.rm = TRUE)), 
      na.rm = TRUE) == 1)
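The base R one-liner can be unpacked into steps, as a sketch (the intermediate names mx, n_max and res are illustrative). Because a row's maximum always equals itself, each count is at least 1, so a count of exactly 1 means the maximum is unique.

```r
foo <- expand.grid(1:3, 1:3, 1:3)

# Row-wise maximum: pmax compares the columns element by element.
mx <- do.call(pmax, c(foo, na.rm = TRUE))

# foo == mx recycles mx down each column, giving a logical matrix whose
# [i, j] entry is TRUE when column j of row i equals that row's maximum.
n_max <- rowSums(foo == mx, na.rm = TRUE)

# Keep rows where the maximum occurs exactly once.
res <- foo[n_max == 1, ]
```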

Answer 3

Score: 0

A base R approach using apply:

foo[apply(foo, 1, function(x) sum(x[which.max(x)] == x) <= 1), ]
   Var1 Var2 Var3
2     2    1    1
3     3    1    1
4     1    2    1
6     3    2    1
7     1    3    1
8     2    3    1
10    1    1    2
12    3    1    2
15    3    2    2
16    1    3    2
17    2    3    2
19    1    1    3
20    2    1    3
22    1    2    3
23    2    2    3
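A note on the condition: x[which.max(x)] is the row maximum, which always matches itself, so the count is never below 1 and "<= 1" selects the same rows as "== 1". A small sketch that checks this on the question's data (counts is an illustrative name; assumes no NA values, since which.max skips them):

```r
foo <- expand.grid(1:3, 1:3, 1:3)

# Number of entries in each row equal to that row's maximum.
counts <- apply(foo, 1, function(x) sum(x[which.max(x)] == x))

# The maximum always matches itself, so no count can be 0.
stopifnot(all(counts >= 1))

# Hence "<= 1" and "== 1" keep exactly the same rows.
identical(foo[counts <= 1, ], foo[counts == 1, ])
```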

Answer 4

Score: 0

After verifying the answers so far (18:00 EST Weds 15 Feb), I ran a benchmark comparison. @onyambu wins the race. (cgw is me; ak** are akrun's solutions)

library(microbenchmark)
bar5 = 1:5
foo55 <- expand.grid(bar5, bar5, bar5, bar5, bar5)
microbenchmark(ony(foo55), cgw(foo55), akply(foo55), akbase(foo55), andre(foo55))
Unit: microseconds
          expr        min          lq        mean      median          uq         max neval cld
    ony(foo55)    455.117    495.2335    589.6801    517.3755    634.9795    3107.222   100  a 
    cgw(foo55) 314076.038 317184.4050 348711.9522 319784.5870 324921.0335 2691161.873   100   b
  akply(foo55)  14156.653  14835.2230  16194.3699  15160.0270  16441.3550   74019.622   100  a 
 akbase(foo55)    858.969    896.8310   1055.4277    970.6395   1117.2420    4098.860   100  a 
  andre(foo55)   8161.406   8531.1700   9188.4801   8872.0325   9284.0995   14548.383   100  a
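The wrapper functions used in the benchmark are not shown in the thread. The following is a hypothetical reconstruction for checking that the approaches agree before timing them. The mapping of labels to answers is an assumption (ony: Answer 1, akbase: akrun's base R version in Answer 2, andre: Answer 3, cgw: the question's loop), and the dplyr variant (akply) is omitted so the sketch needs only base R.

```r
# Hypothetical wrappers; names follow the benchmark labels, bodies are assumed.
ony    <- function(d) subset(d, max.col(d, "first") == max.col(d, "last"))
akbase <- function(d) subset(d, rowSums(d == do.call(pmax, c(d, na.rm = TRUE)),
                                        na.rm = TRUE) == 1)
andre  <- function(d) d[apply(d, 1, function(x) sum(x[which.max(x)] == x) <= 1), ]
cgw    <- function(d) {
  winners <- max.col(d)
  finddupe <- numeric(length(winners))
  for (jf in seq_along(winners)) finddupe[jf] <- sum(d[jf, ] == d[jf, winners[jf]])
  d[finddupe == 1, ]
}

bar5  <- 1:5
foo55 <- expand.grid(bar5, bar5, bar5, bar5, bar5)

results <- list(ony(foo55), akbase(foo55), andre(foo55), cgw(foo55))

# Every approach should keep the same rows; compare the surviving row names.
same <- vapply(results,
               function(r) identical(rownames(r), rownames(results[[1]])),
               logical(1))
print(same)

# With the microbenchmark package installed, timings can then be taken with:
# microbenchmark::microbenchmark(ony(foo55), akbase(foo55), andre(foo55), cgw(foo55))
```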
