Is there an efficient, time-saving method of maintaining a heap in which elements are removed from the middle?


Question


I'm working on a path planning program where I have a priority queue 'U':

using HeapKey = pair<float, float>;
vector<pair<HeapKey, unsigned int>> U;

I order and maintain my priority queue as a binary min-heap (i.e., the cheapest node first in the queue), using greater as my comparison function to get a min-heap (maybe not important). While the program is executing and planning a path, it adds nodes to 'U' with push_back() followed by push_heap() to get that node into the correct order, and everything is working fine there...

However, the algorithm I'm using sometimes calls for updating a node already present in 'U' with new values. It does this by removing the node from 'U' (I find it with find_if() and remove it with erase(), if that's important) and then calling the insert function again (push_back() followed by push_heap()) so the node has its updated values.

This has proved a bit of an unexpected problem for me. I'm no expert at this, but as far as I can tell, because the node is removed from somewhere INSIDE 'U', the removal messes up the order of the heap. I've been able to get the program to work by calling make_heap() after the node is removed. However, this solution brought another issue: the program now takes a lot more time to complete, and it gets slower the larger my map is and the more nodes are in the heap, presumably because make_heap() re-organizes/iterates through the entire heap every time I update a node, slowing down the overall planning.

Sadly I don't have time to change my program and get new results; I can only make use of simple, easy solutions I can implement fast. I'm mostly here to learn, and perhaps to see if there are suggestions I can pass on about how to efficiently maintain a heap/priority queue when you aren't just removing the first or last element but an element somewhere in the middle. Reducing the planning time is the only thing I am missing.
Attempt at minimal reproducible example without going into the actual algorithm and such:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // rand()
using namespace std;

using Cost = float;
using HeapKey = pair<Cost, Cost>;
pair<Cost, Cost> PAIR1;
vector<pair<HeapKey, unsigned int>> U;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;
int in_U[20];

ostream& operator<<(ostream& os, pair<Cost, Cost> const& p) {
  return os << "<" << p.first << ", " << p.second << ">";
}

bool has_neighbor(unsigned int id) {
    if ((in_U[id + 1]) && (in_U[id - 1])) {
        return true;
    }
    return false;
}

void insert(unsigned int id, HeapKey k) {
    U.push_back({k, id});
    push_heap(U.begin(), U.end(), KeyCompare());
    in_U[id]++;
}

void update(unsigned int id) {
    Cost x;
    Cost y;
    if (id != 21) { // let's say 21 is the goal
        x = U[id].first.first;
        y = U[id].first.second;
    }
    if (in_U[id]) {
        auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
        U.erase(it);
        make_heap(U.begin(), U.end(), KeyCompare());
        in_U[id]--;
    }
    int r1 = rand() % 10 + 1;
    int r2 = rand() % 10 + 1;
    if (x != y) {
      insert(id, {x + r1, y + r2});
    }
}

int main() {

    U.push_back({{8, 2}, 1});
    in_U[1]++;
    U.push_back({{5, 1}, 2});
    in_U[2]++;
    U.push_back({{6, 1}, 3});
    in_U[3]++;
    U.push_back({{6, 5}, 4});
    in_U[4]++;
    U.push_back({{2, 3}, 5});
    in_U[5]++;
    U.push_back({{2, 9}, 6});
    in_U[6]++;
    U.push_back({{9, 2}, 7});
    in_U[7]++;
    U.push_back({{4, 7}, 8});
    in_U[8]++;
    U.push_back({{11, 4}, 9});
    in_U[9]++;
    U.push_back({{2, 2}, 10});
    in_U[10]++;
    U.push_back({{1, 2}, 11});
    in_U[11]++;
    U.push_back({{7, 2}, 12});
    in_U[12]++;
    make_heap(U.begin(), U.end(), KeyCompare());
    PAIR1.first = 14;
    PAIR1.second = 6;

    while (U.front().first < PAIR1) {

        cout << "Is_heap?: " << is_heap(U.begin(), U.end(), KeyCompare()) << endl;

        cout << "U: ";
        for (auto p : U) {
            cout << p.second << p.first << " - ";
        }
        cout << endl;

        auto uid = U.front().second;
        pop_heap(U.begin(), U.end(), KeyCompare());
        U.pop_back();

        if (has_neighbor(uid)) {
            update(uid - 1);
            update(uid + 1);
        }
    }
    //getchar();
}


Answer 1

Score: 3

Yes, the algorithm is relatively simple. Note that for an item at index i, its parent in the heap is at index (i-1)/2 and its children are at indices i*2+1 and i*2+2.

  1. Swap item_to_pop with the last item in the range. This moves that item to the desired (last) position, but it leaves a "small" item in the middle of the heap, which needs to be fixed.
  2. If the "small" item now at item_to_pop's position is larger than its current parent, swap it with its parent. Repeat until the item is no longer larger than its current parent or has become the new root; then we're done. Notably, this is the same algorithm as push_heap, except with the shortcut that we start in the middle instead of at the end.
  3. If the "small" item at item_to_pop's position is smaller than either of its current children, swap it with the larger child. Repeat until the item is larger than all of its current children (note that near the end of the heap it may have only one child, or none); then we're done. Notably, this is the same algorithm as pop_heap, except with the shortcut that we start in the middle instead of at the top.

This algorithm does at most log2(n)+1 swaps and log2(n)*2+1 comparisons, making it almost as fast as pop_heap and push_heap, which isn't really surprising since it's the same algorithm.

#include <algorithm>  // std::is_heap
#include <cassert>    // assert
#include <functional> // std::less
#include <utility>    // std::swap

template <class RandomIt, class Compare>
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop, Compare comp) {
    assert(std::is_heap(first, last, comp)); //this is compiled out of release builds
    assert(first <= item_to_pop);
    assert(item_to_pop < last);
    using std::swap;
    std::size_t new_size = last - first - 1;
    if (new_size == 0) return;
    //swap the end of the range and item_to_pop, so that item_to_pop is at the end
    swap(*item_to_pop, *--last);
    if (new_size == 1) return;
    //If item_to_pop is bigger than its parent, then swap with the parent
    bool moved_up = false;
    RandomIt swap_itr;
    while (true) {
        std::size_t offset = item_to_pop - first;
        if (offset == 0) break; //item_to_pop is at root: exit loop 
        swap_itr = first + (offset - 1) / 2;
        if (comp(*item_to_pop, *swap_itr)) 
            break; //item_to_pop smaller than its parent: exit loop
        swap(*item_to_pop, *swap_itr); //swap with parent and repeat
        item_to_pop = swap_itr;
        moved_up = true;        
    }
    if (moved_up) return; //if we moved the item up, then heap is complete: exit
    //If biggest child is bigger than item_to_pop, then swap with that child
    while (true) {
        std::size_t offset = item_to_pop - first;
        std::size_t swap_idx = offset * 2 + 1;
        if (swap_idx >= new_size) break; //no children: exit loop
        swap_itr = first + swap_idx;
        if (swap_idx + 1 < new_size && comp(*swap_itr, *(swap_itr + 1))) //if right child exists and is bigger, swap that instead
            ++swap_itr;
        if (!comp(*item_to_pop, *swap_itr)) break; //item_to_pop not smaller than its biggest child: exit loop
        swap(*item_to_pop, *swap_itr); //swap with bigger child and repeat
        item_to_pop = swap_itr;
    }
}

template <class RandomIt>
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop) {
    pop_mid_heap(first, last, item_to_pop, std::less<>{});
}

A runnable demo: https://ideone.com/zNW7h7

Theoretically one could optimize out the "or is the new root" check in the push_heap part, but the checks needed to detect that case add complexity that doesn't seem worth it.

IMO, this is useful and should be part of the C++ standard library.
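
To connect this back to the question, below is a minimal sketch (assuming the U, HeapKey, and KeyCompare from the question; the helper name update_key is made up for illustration) of how pop_mid_heap could replace the erase() + make_heap() step when a node's key changes:

// Hypothetical helper (not part of the answer above): assumes U, HeapKey,
// KeyCompare, and pop_mid_heap as defined earlier. The heap repair is O(log n);
// the find_if() lookup is still O(n) -- see the next answer for removing that cost.
void update_key(unsigned int id, HeapKey new_key) {
    auto it = find_if(U.begin(), U.end(),
                      [=](auto const& p) { return p.second == id; });
    if (it == U.end()) return;                          // id not currently queued
    pop_mid_heap(U.begin(), U.end(), it, KeyCompare()); // move *it to the back, keep the rest a heap
    U.pop_back();                                       // actually remove the old entry
    U.push_back({ new_key, id });                       // re-insert with the updated key
    push_heap(U.begin(), U.end(), KeyCompare());        // O(log n) sift-up
}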


Answer 2

Score: 0


In general it's expensive to update a node in the middle of a binary heap, not because the update operation is expensive but because finding the node is an O(n) operation. If you know where the node is in the heap, updating its priority is very easy. My answer at https://stackoverflow.com/a/8706363/56778 shows how to delete a node. Updating a node's priority is similar: rather than replacing the node with the last one in the heap, you just sift the node up or down as required.

If you want the ability to find a node quickly, then you have to build an indexed heap. Basically, you have a dictionary entry for each node. The dictionary key is the node's ID (or whatever you use to identify it), and the value is the node's index in the binary heap. You modify the heap code so that it updates the dictionary entry whenever the node is moved around in the heap. It makes the heap a little bit slower (by a constant factor), but makes finding an arbitrary node an O(1) operation.

Or, you can replace the binary heap with a Pairing Heap, skip list, or any of the other "heap" types that work with node pointers. My experience has been that although the theoretical performance of those two isn't as good as the theoretical performance of Fibonacci heap, the real-world performance is much better.

With either of those it's a whole lot easier to maintain an index: you just add a node reference to it when you add a node to the heap, and remove a reference when the node is removed from the heap. Both of those heap types are easy to build and performance will be about the same as for a binary heap although they will use somewhat more memory. From experience I'll say that Pairing heap is easier to build than skip list, but skip list is a more generally useful data structure.
