最佳方式搜索大型有序序列是什么?

huangapple go评论63阅读模式
英文:

What is the optimal way to search a large ordered sequence

问题

Here's the translated code part you requested:

假设我有一个非常大的按时间顺序排列的值序列超过一百万个条目):

    case class A(ts: Long, value: Double)

    val sequence: Seq[A] = // 按时间顺序填充

并且我想要在给定时间之前搜索序列中发生的最新项目类似以下内容

    def lastBefore(time: Long) = sequence.indexWhere(_.ts >= time) match {
        case i if i > 0 => Some(sequence(i))
        case _ => None
    }

我想知道如何最快地实现lastBefore()我在考虑是否可以使用二分搜索方法但是否使用类似SortedMap或TreeMap的实现会有优势

Please note that code translation might not capture all nuances, so make sure to adapt it as needed.

英文:

Say I have an extremely large sequence (more than a million entries) of values that are in chronological order:

case class A(ts: Long, value: Double)

val sequence: Seq[A] = // Filled in chronological order

And I want to search for the latest item in the sequence that occurs before a given time. So something like the following:

def lastBefore(time: Long) = sequence.indexWhere(_.ts >= time) match {
	case i if i > 0 => Some(sequence(i))
	case _ => None
}

I am wondering what is the fastest way to implement lastBefore(). I was thinking I could use a binarySearch approach but am wondering would there be an advantage in using something like a SortedMap or TreeMap implementation?

答案1

得分: 3

Binary search is probably the best solution to efficiently find an item in a sorted collection while minimizing complexity. However, using a Seq is not necessarily the best choice, because a Seq does not give you any guarantee with regards to having the possibility of efficiently accessing items in the collection by its index (in fact, the default backing implementation of a Seq is a List, whose apply method is linear in complexity).

You can however use binary search on an IndexedSeq (like Vectors and Arrays) and you don't even have to implement it yourself, since binary search is the method used by default by the search method for IndexedSeqs. As the documentation says:

Search this sorted sequence for a specific element. If the sequence is an IndexedSeq, a binary search is used. Otherwise, a linear search is used.

Something like the following should suit your needs:

import scala.collection.Searching.Found
import scala.collection.Searching.InsertionPoint

final case class A(ts: Long, value: Double)

val sequence: IndexedSeq[A] = Vector(A(1,1), A(2,2), A(3,3), A(5,5))

def lastBefore(time: Long): Option[A] =
  sequence.search(A(time, 0))(math.Ordering.by(_.ts)) match {
  	case Found(0) | InsertionPoint(0) => None
  	case Found(foundIndex) => Some(sequence(foundIndex - 1))
  	case InsertionPoint(insertionPoint) => Some(sequence(insertionPoint - 1))
  }

assert(lastBefore(1) == None)
assert(lastBefore(2) == Some(A(1,1)))
assert(lastBefore(4) == Some(A(3,3)))
assert(lastBefore(10) == Some(A(5,5)))

You can play around with this code here on Scastie.

Notice that I got away with using search here using a dummy value as part of the search class you are using since it doesn't seem like it's considered for the search.

英文:

Binary search is probably the best solution to efficiently find an item in a sorted collection while minimizing complexity. However, using a Seq is not necessarily the best choice, because a Seq does not give you any guarantee with regards to having the possibility of efficiently accessing items in the collection by its index (in fact, the default backing implementation of a Seq is a List, whose apply method is linear in complexity).

You can however use binary search on an IndexedSeq (like Vectors and Arrays) and you don't even have to implement it yourself, since binary search is the method used by default by the search method for IndexedSeqs. As the documentation says:

> Search this sorted sequence for a specific element. If the sequence is an IndexedSeq, a binary search is used. Otherwise, a linear search is used.

Something like the following should suit your needs:

import scala.collection.Searching.Found
import scala.collection.Searching.InsertionPoint

final case class A(ts: Long, value: Double)

val sequence: IndexedSeq[A] = Vector(A(1,1), A(2,2), A(3,3), A(5,5))

def lastBefore(time: Long): Option[A] =
  sequence.search(A(time, 0))(math.Ordering.by(_.ts)) match {
  	case Found(0) | InsertionPoint(0) => None
  	case Found(foundIndex) => Some(sequence(foundIndex - 1))
  	case InsertionPoint(insertionPoint) => Some(sequence(insertionPoint - 1))
  }

assert(lastBefore(1) == None)
assert(lastBefore(2) == Some(A(1,1)))
assert(lastBefore(4) == Some(A(3,3)))
assert(lastBefore(10) == Some(A(5,5)))

You can play around with this code here on Scastie.

Notice that I got away with using search here using a dummy value as part of the search class you are using since it doesn't seem like it's considered for the search.

huangapple
  • 本文由 发表于 2023年5月20日 22:25:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76295710.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定