Is there an efficient, time-saving method of maintaining a heap in which elements are removed from the middle?

Question

I'm working on a path planning program where I have a priority queue 'U':

using HeapKey = pair<float, float>;
vector<pair<HeapKey, unsigned int>> U;

I order and maintain my priority queue as a binary min-heap (i.e. the cheapest node comes first in the queue), using greater as my comparison function to get a min-heap (maybe not important). While the program is executing and planning a path, it adds nodes to 'U' with push_back() followed by push_heap() to get each node into the correct position, and everything is working fine there...

However, the algorithm I'm using sometimes calls for updating a node already present in 'U' with new values. It does this by removing the node from 'U' (I find it with find_if() and remove it with erase(), if that's important) and then calling the insert function again (push_back() followed by push_heap()) so the node has its updated values.

This has proved a bit of an unexpected problem for me. I'm no expert at this, but as far as I can tell, because the node is removed from somewhere INSIDE 'U', the removal messes up the order of the heap. I've been able to get the program to work by calling make_heap() after the node is removed. However, this solution brought another issue: the program now takes a lot more time to complete, and it gets slower the larger my map (and the number of nodes in the heap) gets, presumably because make_heap() re-organizes/iterates through the entire heap every time I update a node, thus slowing down the overall planning.

Sadly, I don't have time to change my program and get new results; I can only make use of simple, easy solutions I can implement quickly. I'm mostly here to learn, and perhaps to see if there are suggestions I can pass on about how to efficiently maintain a heap/priority queue when you aren't just removing the first or last element but an element somewhere in the middle. Reducing the time taken to plan is the only thing I am missing.
An attempt at a minimal reproducible example, without going into the actual algorithm and such:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // rand()
using namespace std;

using Cost = float;
using HeapKey = pair<Cost, Cost>;
pair<Cost, Cost> PAIR1;
vector<pair<HeapKey, unsigned int>> U;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;
int in_U[20];

ostream& operator<<(ostream& os, pair<Cost, Cost> const& p) {
  return os << "<" << p.first << ", " << p.second << ">";
}

bool has_neighbor(unsigned int id) {
    if ((in_U[id + 1]) && (in_U[id - 1])) {
        return true;
    }
    return false;
}

void insert(unsigned int id, HeapKey k) {
    U.push_back({k, id});
    push_heap(U.begin(), U.end(), KeyCompare());
    in_U[id]++;
}

void update(unsigned int id) {
    Cost x = 0; // initialized so the comparison below is well-defined even if the branch is skipped
    Cost y = 0;
    if (id != 21) { // let's say 21 is the goal
        x = U[id].first.first;
        y = U[id].first.second;
    }
    if (in_U[id]) {
        auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
        U.erase(it);
        make_heap(U.begin(), U.end(), KeyCompare());
        in_U[id]--;
    }
    int r1 = rand() % 10 + 1;
    int r2 = rand() % 10 + 1;
    if (x != y) {
      insert(id, {x + r1, y + r2});
    }
}

int main() {

    U.push_back({{8, 2}, 1});
    in_U[1]++;
    U.push_back({{5, 1}, 2});
    in_U[2]++;
    U.push_back({{6, 1}, 3});
    in_U[3]++;
    U.push_back({{6, 5}, 4});
    in_U[4]++;
    U.push_back({{2, 3}, 5});
    in_U[5]++;
    U.push_back({{2, 9}, 6});
    in_U[6]++;
    U.push_back({{9, 2}, 7});
    in_U[7]++;
    U.push_back({{4, 7}, 8});
    in_U[8]++;
    U.push_back({{11, 4}, 9});
    in_U[9]++;
    U.push_back({{2, 2}, 10});
    in_U[10]++;
    U.push_back({{1, 2}, 11});
    in_U[11]++;
    U.push_back({{7, 2}, 12});
    in_U[12]++;
    make_heap(U.begin(), U.end(), KeyCompare());
    PAIR1.first = 14;
    PAIR1.second = 6;

    while (U.front().first < PAIR1) {

        cout << "Is_heap?: " << is_heap(U.begin(), U.end(), KeyCompare()) << endl;

        cout << "U: ";
        for (auto p : U) {
            cout << p.second << p.first << " - ";
        }
        cout << endl;

        auto uid = U.front().second;
        pop_heap(U.begin(), U.end(), KeyCompare());
        U.pop_back();

        if (has_neighbor(uid)) {
            update(uid - 1);
            update(uid + 1);
        }
    }
    //getchar();
}

Answer 1

Score: 3

Yes, the algorithm is relatively simple. Note that when considering an item at index i, its "parent" in the heap is at index (i-1)/2, and its children are at indices i*2+1 and i*2+2. (For example, the item at index 5 has its parent at index 2 and its children at indices 11 and 12.)

  1. Swap item_to_pop with the last item in the range. This moves that item to the desired (last) position, but inserts a "small" item in the middle of the heap. This needs to be fixed.
  2. If the "small" item now at the item_to_pop position is larger than its current parent, swap it with its parent. Repeat until the item is either no longer larger than its current parent or has become the new root. Then we're done. Notably, this is the same algorithm as push_heap, except with the shortcut that we start in the middle instead of at the end.
  3. If the "small" item at the item_to_pop position is smaller than either of its current children, swap it with the larger child. Repeat until the item is larger than all of its current children (note that near the end it might have only one child, or none). Then we're done. Notably, this is the same algorithm as pop_heap, except with the shortcut that we start in the middle instead of at the top.

This algorithm will do at most log2(n)+1 swaps and log2(n)*2+1 comparisons, making it almost as fast as pop_heap and push_heap. Which isn't really surprising, since it is the same algorithm.

#include <algorithm>  // std::is_heap
#include <cassert>    // assert
#include <functional> // std::less
#include <utility>    // std::swap

template <class RandomIt, class Compare>
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop, Compare comp) {
    assert(std::is_heap(first, last, comp)); //this is compiled out of release builds
    assert(first <= item_to_pop);
    assert(item_to_pop < last);
    using std::swap;
    std::size_t new_size = last - first - 1;
    if (new_size == 0) return;
    //swap the end of the range and item_to_pop, so that item_to_pop is at the end
    swap(*item_to_pop, *--last);
    if (new_size == 1) return;
    //If item_to_pop is bigger than its parent, then swap with the parent
    bool moved_up = false;
    RandomIt swap_itr;
    while (true) {
        std::size_t offset = item_to_pop - first;
        if (offset == 0) break; //item_to_pop is at root: exit loop 
        swap_itr = first + (offset - 1) / 2;
        if (comp(*item_to_pop, *swap_itr)) 
            break; //item_to_pop smaller than its parent: exit loop
        swap(*item_to_pop, *swap_itr); //swap with parent and repeat
        item_to_pop = swap_itr;
        moved_up = true;        
    }
    if (moved_up) return; //if we moved the item up, then heap is complete: exit
    //If biggest child is bigger than item_to_pop, then swap with that child
    while (true) {
        std::size_t offset = item_to_pop - first;
        std::size_t swap_idx = offset * 2 + 1;
        if (swap_idx >= new_size) break; //no children: exit loop
        swap_itr = first + swap_idx;
        if (swap_idx + 1 < new_size && comp(*swap_itr, *(swap_itr + 1))) //if right child exists and is bigger, swap that instead
            ++swap_itr;
        if (!comp(*item_to_pop, *swap_itr)) break; //item_to_pop not smaller than its biggest child: exit loop
        swap(*item_to_pop, *swap_itr); //swap with bigger child and repeat
        item_to_pop = swap_itr;
    }
}

template <class RandomIt>
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop) {
    pop_mid_heap(first, last, item_to_pop, std::less<>{});
}

https://ideone.com/zNW7h7

Theoretically one can optimize out the "or is the new root" check in the push_heap part, but the checks to detect that case add complexity that doesn't seem worth it.

IMO, this is useful and should be part of the C++ standard library.
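
For the question's setup, here is a minimal usage sketch of the function above, reusing the names U, KeyCompare, in_U and insert() from the question's example; update_key() is a hypothetical wrapper, and the find_if() lookup is still a linear search:

// Sketch only: replaces the erase() + make_heap() pattern from the question,
// so the heap repair costs O(log n) instead of O(n) per update.
void update_key(unsigned int id, HeapKey new_key) {
    auto it = find_if(U.begin(), U.end(),
                      [=](auto const& p) { return p.second == id; }); // still O(n)
    if (it != U.end()) {
        pop_mid_heap(U.begin(), U.end(), it, KeyCompare()); // move node to the back, repair heap
        U.pop_back();                                       // discard it
        in_U[id]--;
    }
    insert(id, new_key); // push_back() + push_heap(), as in the question
}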

Answer 2

Score: 0

In general it's expensive to update a node in the middle of a binary heap not because the update operation is expensive but because finding the node is an O(n) operation. If you know where the node is in the heap, updating its priority is very easy. My answer at https://stackoverflow.com/a/8706363/56778 shows how to delete a node. Updating a node's priority is similar: rather than replacing the node with the last one in the heap, you just sift the node up or down as required.

If you want the ability to find a node quickly, then you have to build an indexed heap. Basically, you have a dictionary entry for each node. The dictionary key is the node's ID (or whatever you use to identify it), and the value is the node's index in the binary heap. You modify the heap code so that it updates the dictionary entry whenever the node is moved around in the heap. It makes the heap a little bit slower (by a constant factor), but makes finding an arbitrary node an O(1) operation.
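
A minimal sketch of such an indexed binary min-heap, assuming float costs and unsigned node IDs as in the question; the class and member names (IndexedMinHeap, pos_, and so on) are invented for illustration:

#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class IndexedMinHeap {
public:
    bool empty() const { return heap_.empty(); }
    bool contains(unsigned id) const { return pos_.count(id) != 0; }

    void push(unsigned id, float key) {
        assert(!contains(id));
        heap_.push_back({key, id});
        pos_[id] = heap_.size() - 1;
        sift_up(heap_.size() - 1);
    }

    // Change the key of a node already in the heap: O(log n), no make_heap().
    void update(unsigned id, float key) {
        std::size_t i = pos_.at(id); // O(1) lookup instead of find_if()
        heap_[i].first = key;
        sift_up(i);
        sift_down(i); // at most one of the two moves it; the other is a no-op
    }

    // Remove a node from anywhere in the heap: swap it to the back, pop, repair.
    void erase(unsigned id) {
        std::size_t i = pos_.at(id);
        swap_nodes(i, heap_.size() - 1);
        pos_.erase(id);
        heap_.pop_back();
        if (i < heap_.size()) { sift_up(i); sift_down(i); }
    }

    std::pair<float, unsigned> pop_min() {
        assert(!empty());
        std::pair<float, unsigned> top = heap_.front();
        erase(top.second);
        return top;
    }

private:
    std::vector<std::pair<float, unsigned>> heap_;  // {key, id}, heap-ordered
    std::unordered_map<unsigned, std::size_t> pos_; // id -> index into heap_

    void swap_nodes(std::size_t a, std::size_t b) { // every move also updates pos_
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a].second] = a;
        pos_[heap_[b].second] = b;
    }
    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap_[parent].first <= heap_[i].first) break;
            swap_nodes(i, parent);
            i = parent;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap_.size() && heap_[l].first < heap_[smallest].first) smallest = l;
            if (r < heap_.size() && heap_[r].first < heap_[smallest].first) smallest = r;
            if (smallest == i) break;
            swap_nodes(i, smallest);
            i = smallest;
        }
    }
};

With something like this, the question's vector plus the in_U bookkeeping could collapse into one object, and an update never needs find_if() or make_heap().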

Or, you can replace the binary heap with a Pairing Heap, skip list, or any of the other "heap" types that work with node pointers. My experience has been that although the theoretical performance of those two isn't as good as the theoretical performance of Fibonacci heap, the real-world performance is much better.

With either of those it's a whole lot easier to maintain an index: you just add a node reference to it when you add a node to the heap, and remove a reference when the node is removed from the heap. Both of those heap types are easy to build and performance will be about the same as for a binary heap although they will use somewhat more memory. From experience I'll say that Pairing heap is easier to build than skip list, but skip list is a more generally useful data structure.
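
As a small concrete illustration of that handle-keeping idea (not a pairing heap or a skip list, just a simple stand-in whose iterators are stable), a std::multiset ordered by the key can also serve as the priority queue; Cost and HeapKey are taken from the question, while OrderedQueue and handle are invented names:

#include <map>
#include <set>
#include <utility>

using Cost = float;
using HeapKey = std::pair<Cost, Cost>;
using Entry = std::pair<HeapKey, unsigned int>; // {key, node id}, smallest key first

struct OrderedQueue {
    std::multiset<Entry> queue;                                    // ordered "heap"
    std::map<unsigned int, std::multiset<Entry>::iterator> handle; // id -> its node

    void insert(unsigned int id, HeapKey k) {
        handle[id] = queue.insert({k, id});            // O(log n); the iterator is the handle
    }
    void update(unsigned int id, HeapKey k) {          // O(log n), nothing is rebuilt
        auto h = handle.find(id);
        if (h != handle.end()) queue.erase(h->second); // erase by handle, no searching
        insert(id, k);
    }
    Entry pop_min() {                                  // analogue of pop_heap + pop_back
        Entry e = *queue.begin();                      // precondition: queue is not empty
        queue.erase(queue.begin());
        handle.erase(e.second);
        return e;
    }
};

It will usually be a constant factor slower and use more memory than a well-tuned binary heap, but it removes and updates arbitrary nodes in O(log n) with very little code.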
