获取字典顺序中的前导元素。

huangapple go评论114阅读模式
英文:

Get leading element for the lexicographical order

问题

如果M是一个数值矩阵,我可以通过运行lexsort(M)来按照词典顺序对其行进行排序,其中

  1. lexorder <- function(M) {
  2. do.call(order, lapply(seq_len(ncol(M)), function(i) M[, i]))
  3. }
  4. lexsort <- function(M) {
  5. M[lexorder(M), ]
  6. }

但我只对获取较大的行感兴趣(排序后矩阵的最后一行)。我们是否可以避免对所有内容进行排序,以更高效地提取最后一行?

英文:

If M is a numeric matrix, I can order its rows with respect to the lexicographical order by running lexsort(M) where

  1. lexorder &lt;- function(M) {
  2. do.call(order, lapply(seq_len(ncol(M)), function(i) M[, i]))
  3. }
  4. lexsort &lt;- function(M) {
  5. M[lexorder(M), ]
  6. }

But I'm only interested in getting the greater row (the last one of the ordered matrix). Can we avoid ordering everything to more efficiently extract the last row?

答案1

得分: 3

以下是已翻译的内容:

  1. 您可以编写一个更快的递归函数:
  2. lex_max <- function(M,i=1){
  3. d <- dim(M)
  4. if(d[1] == 1 | i > d[2])M[1,]
  5. else lex_max(M[max(M[,i]) == M[,i],,drop = FALSE], i+1)
  6. }
  7. a <- matrix(sample(500, 1e5, TRUE), 10)
  8. 计时:
  9. 大量列:
  10. microbenchmark::microbenchmark(lex_max(a),
  11. OP=lexsort(a)[nrow(a),],lexmaxrow(a), check = 'equal')
  12. Unit: microseconds
  13. expr min lq mean median uq max neval
  14. lex_max(a) 45.7 57.95 80.442 65.55 87.45 306.4 100
  15. OP 15819.5 19437.25 25579.100 22246.55 29002.80 68948.8 100
  16. lexmaxrow(a) 16393.9 18739.65 25210.846 22022.40 29098.75 47731.1 100
  17. ---
  18. a <- matrix(sample(500, 1e5, TRUE), 500)
  19. Unit: microseconds
  20. expr min lq mean median uq max neval
  21. lex_max(a) 5.7 9.90 93.152 12.85 19.70 7124.0 100
  22. OP 577.0 629.75 907.524 699.20 1017.70 7771.4 100
  23. lexmaxrow(a) 470.5 521.05 875.526 619.45 908.85 10049.1 100
  24. ---
  25. 大量行
  26. a <- matrix(sample(500, 1e5, TRUE), ncol=10)
  27. Unit: microseconds
  28. expr min lq mean median uq max neval
  29. lex_max(a) 60.2 97.5 137.462 120.0 164.40 650.5 100
  30. OP 594.0 775.9 1359.959 966.8 1251.35 14719.9 100
  31. lexmaxrow(a) 475.1 624.1 1013.927 769.5 936.60 11775.1 100
  32. 在所有情况下,`lex_max` 函数的性能均快约10
  33. ### 编辑:
  34. 如果您需要位置,可以简单地执行:
  35. which_lexmax <- function(M,i=1, b = seq_len(nrow(M))){
  36. d <- dim(M)
  37. if(d[1] == 1 | i > d[2])b[1]
  38. else lex_max(M[mx <- max(M[,i]) == M[,i],,drop = FALSE], i+1, b[mx])
  39. }
  40. which_lexmax(a)
英文:

You couls write a recursive function that works faster:

  1. lex_max &lt;- function(M,i=1){
  2. d &lt;- dim(M)
  3. if(d[1] == 1 | i &gt; d[2])M[1,]
  4. else lex_max(M[max(M[,i]) == M[,i],,drop = FALSE], i+1)
  5. }
  6. a &lt;- matrix(sample(500, 1e5, TRUE), 10)

The timings:

Large number of columns:

  1. microbenchmark::microbenchmark(lex_max(a),
  2. OP=lexsort(a)[nrow(a),],lexmaxrow(a), check = &#39;equal&#39;)
  3. Unit: microseconds
  4. expr min lq mean median uq max neval
  5. lex_max(a) 45.7 57.95 80.442 65.55 87.45 306.4 100
  6. OP 15819.5 19437.25 25579.100 22246.55 29002.80 68948.8 100
  7. lexmaxrow(a) 16393.9 18739.65 25210.846 22022.40 29098.75 47731.1 100

  1. a &lt;- matrix(sample(500, 1e5, TRUE), 500)
  2. Unit: microseconds
  3. expr min lq mean median uq max neval
  4. lex_max(a) 5.7 9.90 93.152 12.85 19.70 7124.0 100
  5. OP 577.0 629.75 907.524 699.20 1017.70 7771.4 100
  6. lexmaxrow(a) 470.5 521.05 875.526 619.45 908.85 10049.1 100

Large number of rows

  1. a &lt;- matrix(sample(500, 1e5, TRUE), ncol=10)
  2. Unit: microseconds
  3. expr min lq mean median uq max neval
  4. lex_max(a) 60.2 97.5 137.462 120.0 164.40 650.5 100
  5. OP 594.0 775.9 1359.959 966.8 1251.35 14719.9 100
  6. lexmaxrow(a) 475.1 624.1 1013.927 769.5 936.60 11775.1 100

In all the instances the lex_max function performs >~10x faster

Edit:

If you need the position, you could simply do:

  1. which_lexmax &lt;- function(M,i=1, b = seq_len(nrow(M))){
  2. d &lt;- dim(M)
  3. if(d[1] == 1 | i &gt; d[2])b[1]
  4. else lex_max(M[mx &lt;- max(M[,i]) == M[,i],,drop = FALSE], i+1, b[mx])
  5. }
  6. which_lexmax(a)

答案2

得分: 2

Update

如果您想要得到具有最高词典顺序的行 行索引,您可以尝试以下代码

  1. lex_max2 <- function(M) {
  2. i <- 1
  3. nc <- ncol(M)
  4. idx <- 1:nrow(M)
  5. repeat {
  6. p <- max(M[, i]) == M[, i]
  7. idx <- idx

  8. M <- M

  9. if (length(idx) == 1) {

  10. return(list(lexmaxval = M, index = idx))

  11. } else {

  12. i <- i + 1

  13. }

  14. }

  15. }

一个示例是

  1. > M <- rbind(c(1, 2, 3), c(1, 2, 2), c(2, 3, 2), c(2, 2, 3))
  2. > lex_max2(M)
  3. $lexmaxval
  4. [1] 2 3 2
  5. $index
  6. [1] 3

Previous Solution

Onyambu的回答 类似,但使用repeat而不是递归

  1. lex_max2 <- function(M) {
  2. i <- 1
  3. nc <- ncol(M)
  4. repeat {
  5. M <- M[max(M[, i]) == M[, i], ]
  6. if (length(M) == nc) {
  7. return(M)
  8. } else {
  9. i <- i + 1
  10. }
  11. }
  12. }

你可以看到稍微提高了速度

  1. > set.seed(0)
  2. > M1 <- matrix(sample(500, 1e6, TRUE), ncol = 100)
  3. > microbenchmark(
  4. + lex_max(M1),
  5. + lex_max2(M1),
  6. + check = "equal"
  7. + )
  8. Unit: microseconds
  9. expr min lq mean median uq max neval
  10. lex_max(M1) 67.3 88.10 154.193 90.45 101.30 5700.1 100
  11. lex_max2(M1) 60.0 85.75 94.461 87.80 97.15 158.9 100
  12. > M2 <- matrix(sample(500, 1e7, TRUE), 500)
  13. > microbenchmark(
  14. + lex_max(M2),
  15. + lex_max2(M2),
  16. + check = "equal"
  17. + )
  18. Unit: microseconds
  19. expr min lq mean median uq max neval
  20. lex_max(M2) 135.9 187.00 232.651 200.25 254.95 597.1 100
  21. lex_max2(M2) 89.6 113.15 152.559 126.20 159.40 528.7 100
英文:

Update

If you want both the row with highest lexicographical order and the row index, you can try the code below

  1. lex_max2 &lt;- function(M) {
  2. i &lt;- 1
  3. nc &lt;- ncol(M)
  4. idx &lt;- 1:nrow(M)
  5. repeat {
  6. p &lt;- max(M[, i]) == M[, i]
  7. idx &lt;- idx

  8. M &lt;- M

  9. if (length(idx) == 1) {

  10. return(list(lexmaxval = M, index = idx))

  11. } else {

  12. i &lt;- i + 1

  13. }

  14. }

  15. }

and an example is

  1. &gt; M &lt;- rbind(c(1, 2, 3), c(1, 2, 2), c(2, 3, 2), c(2, 2, 3))
  2. &gt; lex_max2(M)
  3. $lexmaxval
  4. [1] 2 3 2
  5. $index
  6. [1] 3

Previous Solution

Similar idea to Onyambu's answer but using repeat rather than recursion

  1. lex_max2 &lt;- function(M) {
  2. i &lt;- 1
  3. nc &lt;- ncol(M)
  4. repeat {
  5. M &lt;- M[max(M[, i]) == M[, i], ]
  6. if (length(M) == nc) {
  7. return(M)
  8. } else {
  9. i &lt;- i + 1
  10. }
  11. }
  12. }

and you can see a bit speed improvement

  1. &gt; set.seed(0)
  2. &gt; M1 &lt;- matrix(sample(500, 1e6, TRUE), ncol = 100)
  3. &gt; microbenchmark(
  4. + lex_max(M1),
  5. + lex_max2(M1),
  6. + check = &quot;equal&quot;
  7. + )
  8. Unit: microseconds
  9. expr min lq mean median uq max neval
  10. lex_max(M1) 67.3 88.10 154.193 90.45 101.30 5700.1 100
  11. lex_max2(M1) 60.0 85.75 94.461 87.80 97.15 158.9 100
  12. &gt; M2 &lt;- matrix(sample(500, 1e7, TRUE), 500)
  13. &gt; microbenchmark(
  14. + lex_max(M2),
  15. + lex_max2(M2),
  16. + check = &quot;equal&quot;
  17. + )
  18. Unit: microseconds
  19. expr min lq mean median uq max neval
  20. lex_max(M2) 135.9 187.00 232.651 200.25 254.95 597.1 100
  21. lex_max2(M2) 89.6 113.15 152.559 126.20 159.40 528.7 100

huangapple
  • 本文由 发表于 2023年7月13日 12:37:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76675972.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定