多级优先队列

huangapple go评论93阅读模式
英文:

Go multi level priority queue

问题

在Go语言中,container/heap包可以用作优先队列(PriorityQueue)-- https://pkg.go.dev/container/heap#example-package-PriorityQueue

是否有适用于多级优先队列的Go包?如果没有,如何自己编写一个?

所谓的"多级优先队列"是指:

  • 任务1:遍历所有学生的分数,并获取分数最高的前N名学生。这是典型的优先队列。
  • 任务2:遍历不同课程的学生分数,并获取前N门课程中最高的N个分数(假设课程数量大于N)。这就是我所说的"多级优先队列"。

示例结果可以是:

课程A:99 98 98
课程B:92 90 88
课程C:91 89 87

注意:

  1. 课程D:中最高的3个分数为90 89 88,但不在前3门课程中。

  2. 可能存在这样的情况,即没有足够的学生分数来填满所有的前N个最高分数。例如:

    课程E:85 82
    课程F:83
    课程G:82 80 78
    
  3. 根据进一步的要求,实际上:

    • 数据来自解析一个非常复杂和非常大的XML文件,因此我需要在单次遍历中遍历XML文件,这就是为什么我需要优先队列的原因。
    • XML文件实际上是SQL Server跟踪文件,其中包含数百甚至数千个SQL命令(SQL命令是课程,它们的持续时间是课程分数),这是我需要优先队列的第二个原因--只跟踪前几个。
英文:

In Go, the container/heap package can be used as a PriorityQueue --
https://pkg.go.dev/container/heap#example-package-PriorityQueue

Is there any Go package for multi level priority queue? If not, how to write one myself?

By "multi level priority queue", here is what I mean:

  • Task 1: run through all marks from students, and get the top N students with highest marks. This is typical PriorityQueue.
  • Task 2: run through all marks from students of different courses, and get the top N highest marks for top N course (assuming the number of courses is greater than N). This is the "multi level priority queue" that I'm talking about.

Sample result can be

course A: 99 98 98 
course B: 92 90 88
course C: 91 89 87

Notes,

  1. course D: with top 3 highest marks of 90 89 88 are not in top 3 courses.

  2. There might be cases that there isn't enough students marks to fill all top N highest marks. E.g.:

    course E: 85 82
    course F: 83
    course G: 82 80 78
    
  3. Further on the requirements, In reality,

    • the data come from parsing a super complicated and super large XML file, thus I need to walk the XML file in a single pass, that's why I need the priority queue.
    • the XML file is actually SQL Server Trace file, which contains hundreds or even thousands of SQL commands (the SQL commands being the courses, and their duration being course marks), that's the second reason that I need the priority queue -- to track only the top ones.

答案1

得分: 2

你不需要任何奇特的队列结构来解决这个问题。实际上,你甚至不需要使用优先队列,尽管它是执行选择第k个操作的一种简单方式。(如果你不需要输出按照彼此之间的顺序排序,而只需要按照某种顺序输出前N个,那么使用类似快速选择的选择算法更高效。)

对于任务2,你只需要遍历所有的分数,为每门课程构建最高分。完成这一步后,找出前N门课程。完成这一步后,再次遍历所有的分数,将属于这些课程的分数筛选到单独的容器中。然后在每个容器中执行选择第k个操作。

如果你使用优先队列,这种方法的运行时间为O(m + N^2 log N)(其中m是总分数的数量),如果你使用高效的选择算法,则运行时间为O(m + N^2)。后一种时间复杂度是该问题的最佳可能情况,因为需要检查O(m)个输入并生成O(N^2)个输出。

英文:

You don't need any sort of exotic queue structure to solve it. You don't even need a priority queue at all, though it's a simple way to do the select-k operation. (If you don't need your outputs to be sorted relative to each other but just be the top N in some order, it's more efficient to use a selection algorithm like quickselect.)

For task 2, you simply need to iterate through all marks, building the top mark for each course. Once you've done that, you find the top N courses. Once you've done that, iterate through all marks again, filtering the ones for those courses into separate containers. Then just do a select-k in each one.

The running time of this approach is O(m + N^2 log N) (where m is the total number of marks) if you use a priority queue, and O(m + N^2) if you use an efficient selection algorithm. The latter time bound is the best possible for the problem, because O(m) inputs need to be examined and O(N^2) outputs need to be generated.

huangapple
  • 本文由 发表于 2022年1月7日 12:52:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/70616746.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定