英文:
What is the optimal way to search a large ordered sequence
问题
Here's the translated code part you requested:
假设我有一个非常大的按时间顺序排列的值序列(超过一百万个条目):
case class A(ts: Long, value: Double)
val sequence: Seq[A] = // 按时间顺序填充
并且我想要在给定时间之前搜索序列中发生的最新项目。类似以下内容:
def lastBefore(time: Long) = sequence.indexWhere(_.ts >= time) match {
case i if i > 0 => Some(sequence(i))
case _ => None
}
我想知道如何最快地实现lastBefore()。我在考虑是否可以使用二分搜索方法,但是否使用类似SortedMap或TreeMap的实现会有优势?
Please note that code translation might not capture all nuances, so make sure to adapt it as needed.
英文:
Say I have an extremely large sequence (more than a million entries) of values that are in chronological order:
case class A(ts: Long, value: Double)
val sequence: Seq[A] = // Filled in chronological order
And I want to search for the latest item in the sequence that occurs before a given time. So something like the following:
def lastBefore(time: Long) = sequence.indexWhere(_.ts >= time) match {
case i if i > 0 => Some(sequence(i))
case _ => None
}
I am wondering what is the fastest way to implement lastBefore(). I was thinking I could use a binarySearch approach but am wondering would there be an advantage in using something like a SortedMap or TreeMap implementation?
答案1
得分: 3
Binary search is probably the best solution to efficiently find an item in a sorted collection while minimizing complexity. However, using a Seq
is not necessarily the best choice, because a Seq
does not give you any guarantee with regards to having the possibility of efficiently accessing items in the collection by its index (in fact, the default backing implementation of a Seq
is a List
, whose apply
method is linear in complexity).
You can however use binary search on an IndexedSeq
(like Vector
s and Array
s) and you don't even have to implement it yourself, since binary search is the method used by default by the search
method for IndexedSeq
s. As the documentation says:
Search this sorted sequence for a specific element. If the sequence is an IndexedSeq, a binary search is used. Otherwise, a linear search is used.
Something like the following should suit your needs:
import scala.collection.Searching.Found
import scala.collection.Searching.InsertionPoint
final case class A(ts: Long, value: Double)
val sequence: IndexedSeq[A] = Vector(A(1,1), A(2,2), A(3,3), A(5,5))
def lastBefore(time: Long): Option[A] =
sequence.search(A(time, 0))(math.Ordering.by(_.ts)) match {
case Found(0) | InsertionPoint(0) => None
case Found(foundIndex) => Some(sequence(foundIndex - 1))
case InsertionPoint(insertionPoint) => Some(sequence(insertionPoint - 1))
}
assert(lastBefore(1) == None)
assert(lastBefore(2) == Some(A(1,1)))
assert(lastBefore(4) == Some(A(3,3)))
assert(lastBefore(10) == Some(A(5,5)))
You can play around with this code here on Scastie.
Notice that I got away with using search
here using a dummy value as part of the search class you are using since it doesn't seem like it's considered for the search.
英文:
Binary search is probably the best solution to efficiently find an item in a sorted collection while minimizing complexity. However, using a Seq
is not necessarily the best choice, because a Seq
does not give you any guarantee with regards to having the possibility of efficiently accessing items in the collection by its index (in fact, the default backing implementation of a Seq
is a List
, whose apply
method is linear in complexity).
You can however use binary search on an IndexedSeq
(like Vector
s and Array
s) and you don't even have to implement it yourself, since binary search is the method used by default by the search
method for IndexedSeq
s. As the documentation says:
> Search this sorted sequence for a specific element. If the sequence is an IndexedSeq, a binary search is used. Otherwise, a linear search is used.
Something like the following should suit your needs:
import scala.collection.Searching.Found
import scala.collection.Searching.InsertionPoint
final case class A(ts: Long, value: Double)
val sequence: IndexedSeq[A] = Vector(A(1,1), A(2,2), A(3,3), A(5,5))
def lastBefore(time: Long): Option[A] =
sequence.search(A(time, 0))(math.Ordering.by(_.ts)) match {
case Found(0) | InsertionPoint(0) => None
case Found(foundIndex) => Some(sequence(foundIndex - 1))
case InsertionPoint(insertionPoint) => Some(sequence(insertionPoint - 1))
}
assert(lastBefore(1) == None)
assert(lastBefore(2) == Some(A(1,1)))
assert(lastBefore(4) == Some(A(3,3)))
assert(lastBefore(10) == Some(A(5,5)))
You can play around with this code here on Scastie.
Notice that I got away with using search
here using a dummy value as part of the search class you are using since it doesn't seem like it's considered for the search.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论