How can I order a vector using another vector?

Question


This question is similar to the one posed here; however, that answer does not work for my problem, which is slightly different.

What I am trying to do would best be shown in code:

#include <cstdint>
#include <vector>

//this would be a copy version:
int main(){
   std::vector<uint32_t> order = {0,2,5,6,9,10,1,3,4,7,8,11};
   std::vector<uint32_t> example = {0,1,2,3,4,5,6,7,8,9,10,11};
   std::vector<uint32_t> output(order.size());
   for(uint32_t i = 0; i < order.size(); ++i){
       output[i] = example[order[i]];
   }
}
//output comes out to {0,2,5,6,9,10,1,3,4,7,8,11}

However, when I try to implement an in-place version using the reorder code from the link above as such:

void reorder(std::vector<uint32_t> &v, std::vector<uint32_t> const &order )  {
    for ( int s = 1, d; s < order.size(); ++ s ) {
        for ( d = order[s]; d < s; d = order[d] ) ;
        if ( d == s ) while ( d = order[d], d != s ) std::swap( v[s], v[d] );
    }
}

int main(){
    std::vector<uint32_t> order = {0,2,5,6,9,10,1,3,4,7,8,11};
    std::vector<uint32_t> example = {0,1,2,3,4,5,6,7,8,9,10,11};
    reorder(example, order);
}
//example = {0,6,1,7,8,2,3,9,10,4,5,11}
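
For comparison, the two results above can be read as two different interpretations of order. The following is a minimal sketch, not taken from the linked answer; the helper names gather and scatter are hypothetical. The copy version gathers (output[i] = example[order[i]]), while the result reported for the linked reorder appears consistent with a scatter (output[order[i]] = example[i]), i.e. the inverse permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: out[i] = v[order[i]] (what the copy version does).
template <typename T>
std::vector<T> gather(const std::vector<T>& v, const std::vector<uint32_t>& order) {
    assert(v.size() == order.size());
    std::vector<T> out(v.size());
    for (std::size_t i = 0; i < order.size(); ++i) out[i] = v[order[i]];
    return out;
}

// Hypothetical helper: out[order[i]] = v[i] (the inverse mapping).
template <typename T>
std::vector<T> scatter(const std::vector<T>& v, const std::vector<uint32_t>& order) {
    assert(v.size() == order.size());
    std::vector<T> out(v.size());
    for (std::size_t i = 0; i < order.size(); ++i) out[order[i]] = v[i];
    return out;
}
```

With the order and example vectors from the question, gather reproduces the copy version's output and scatter reproduces the output reported for the in-place attempt.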

How could I implement an in-place version of what I am trying to accomplish without copying memory?

Edit

To clarify: the vector example in the code can hold arbitrary elements. I only initialized it the way I did to make the result easy to check. This would be entirely valid:

std::vector<uint32_t> example = {300,12,21,34,47,15,61,57,82,94,1,2};
  • order and example will always contain the same number of elements
  • It is not guaranteed that order and example store the same datatype
  • It is guaranteed that order stores unique values
  • It is also guaranteed that order will always have the datatype uint32_t
  • order always contains the values 0 to n-1, where n is the size of example: each of those numbers exactly once, and nothing outside that range (as in the example code)
  • The actual order of those values is entirely arbitrary; it has nothing to do with even or odd indices.
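
The guarantees listed above amount to saying that order is a permutation of the indices 0 to n-1. As a rough sketch of how that precondition could be checked before reordering (the name is_valid_order is hypothetical and not part of the question):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `order` holds each index in
// 0..n-1 exactly once, where n is the size of the data vector.
inline bool is_valid_order(const std::vector<uint32_t>& order, std::size_t n) {
    if (order.size() != n) return false;
    std::vector<bool> seen(n, false);
    for (uint32_t idx : order) {
        if (idx >= n || seen[idx]) return false;  // out of range or duplicate
        seen[idx] = true;
    }
    return true;
}
```

This check uses O(n) temporary storage, so it is only meant for debugging or tests, not as part of an in-place solution.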

Answer 1

Score: 3


I've come up with a simple algorithm to do this in O(N) time with O(1) additional storage. Unsurprisingly, it does destroy the contents of order in the process. I don't think you can get around that without either increasing time complexity or storage complexity.

template <typename T>
void reorder(std::vector<T> &v, std::vector<uint32_t> &order)
{
    for (size_t i = 0; i < v.size(); i++)
    {
        size_t next = order[i];      // find index of next output
        while (next < i)             // resolve chain of moves
        {
            next = order[next];
        }
        std::swap(v[i], v[next]);    // exchange data
        order[i] = next;             // record where previous data moved to
    }
}

The basic approach is that you iterate through the ordering one at a time, and you consider every position prior to the current position as solved. You will never modify the ordering or the data for solved positions.

Except for the trivial case where order is a sorted vector, you will inevitably need to move a piece of data out of the way, so that you can bring in the required value at that position. There are two possible scenarios:

  1. The value is already in the right place, or is in some "future" position; or
  2. The value points at a "past" position, so we know it has been moved one or more times such that it's now in a "future" position.

So, on a very basic level, when you discover that for some i, the value of order[i] is less than i, then you know it was moved, and that it moved to the position order[order[i]]. From that position, it may have been moved again. Keep applying exactly the same test until you end up with an index that's greater than or equal to i. That's where the data moved to.

The secret to making this O(N) is that, after resolving the final "moved-to" position, you do one final step which is to overwrite order[i] with that new position. That means whatever searching you had to do at this position will never be repeated.

Now, it's not at all obvious that this results in linear time complexity, so don't feel bad if it's a bit mind-bending. I found it hard to convince myself, and before writing this answer I actually had to prove it experimentally by looking for the worst-case total searches for all permutations of order. That was also useful to verify the algorithm's correctness.

The reasoning really boils down to the fact that the more "hops" a single value makes, the fewer hops are possible for other values. By collapsing the result of any search you ensure that search is also O(1) in future. Crucially, that means the process of chaining one extra hop is also O(1) because successive "hops" for the same data value will always point back to a search that was flattened when the data was moved.
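
As a usage sketch of the approach described above, here it is applied to the order from the question with a non-integer element type. The function body follows the reorder shown in this answer; demo_reorder_strings and the string data are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// In-place reorder so that afterwards v[i] holds the old v[order[i]].
// Overwrites `order` while it runs.
template <typename T>
void reorder(std::vector<T>& v, std::vector<uint32_t>& order) {
    assert(v.size() == order.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::size_t next = order[i];
        while (next < i) next = order[next];      // follow earlier moves
        std::swap(v[i], v[next]);
        order[i] = static_cast<uint32_t>(next);   // flatten the chain
    }
}

// Hypothetical demo: reorder twelve strings with the question's order.
inline std::vector<std::string> demo_reorder_strings() {
    std::vector<uint32_t> order = {0,2,5,6,9,10,1,3,4,7,8,11};
    std::vector<std::string> example = {"a","b","c","d","e","f",
                                        "g","h","i","j","k","l"};
    reorder(example, order);
    return example;
}
```

The returned vector should match what the copy version would produce for the same input, namely a c f g j k b d e h i l.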


Here's the little test harness that verifies all permutations and counts the total number of search indirections performed:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

typedef int Data;

template <typename T>
size_t reorder(std::vector<T> &v, std::vector<uint32_t> &order)
{
    size_t indirections = 0;
    for (size_t i = 0; i < v.size(); i++)
    {
        size_t next = order[i];
        while (next < i)
        {
            // std::cout << "search " << i << " : hop to " << next << "\n";
            next = order[next];
            indirections++;
        }
        std::swap(v[i], v[next]);
        order[i] = next;
    }
    return indirections;
}

size_t count_worst_case_indirections(size_t size)
{
    size_t max_indirections = 0;

    std::vector<uint32_t> order_perm(size);
    std::vector<uint32_t> order_worst;
    std::vector<uint32_t> order;
    std::vector<Data> data(size);
    std::vector<Data> expected;
    expected.reserve(size);

    // Test all possible orderings
    std::iota(order_perm.begin(), order_perm.end(), 0);
    do
    {
        // Reset initial data and generate expected result
        order = order_perm;
        std::iota(data.begin(), data.end(), 0);
        expected.clear();
        for (auto i : order) expected.push_back(data[i]);

        // Run test
        size_t indirections = reorder(data, order);
        if (indirections > max_indirections)
        {
            max_indirections = indirections;
            order_worst = order_perm;
        }

        // Throw if result is invalid
        if (data != expected) throw "ALGORITHM IS BROKEN";
    } while (std::next_permutation(order_perm.begin(), order_perm.end()));

    std::cerr << "worst order : ";
    for (auto x : order_worst) std::cerr << x << ' ';
    std::cerr << "\n";

    return max_indirections;
}

int main()
{
    for (size_t size = 1; size < 12; size++)
    {
        size_t max_indirections = count_worst_case_indirections(size);
        std::cout << "Size " << size << " : " << max_indirections << "\n";
    }
}

stdout:

Size 1 : 0
Size 2 : 1
Size 3 : 2
Size 4 : 3
Size 5 : 4
Size 6 : 5
Size 7 : 6
Size 8 : 7
Size 9 : 8
Size 10 : 9
Size 11 : 10

stderr:

worst order : 0 
worst order : 1 0 
worst order : 1 2 0 
worst order : 1 2 3 0 
worst order : 1 2 3 4 0 
worst order : 1 2 3 4 5 0 
worst order : 1 2 3 4 5 6 0 
worst order : 1 2 3 4 5 6 7 0 
worst order : 1 2 3 4 5 6 7 8 0 
worst order : 1 2 3 4 5 6 7 8 9 0 
worst order : 1 2 3 4 5 6 7 8 9 10 0

This is clearly worst-case O(N) in searches. If you comment out the line order[i] = next; in reorder, you will see that you get a worst-case of O(N²).
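
To make the effect of the order[i] = next; line concrete, here is a small sketch that runs the same loop with flattening switched on or off and counts the indirections. The flatten parameter, the name reorder_counting, and the ordering {n-1, 0, 1, ..., n-2} produced by shifted_order are all assumptions chosen for illustration; they do not come from the harness above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Same loop as reorder, with a hypothetical `flatten` switch.
// Returns the number of search indirections. `order` is taken by value,
// so the caller's copy is left untouched.
template <typename T>
std::size_t reorder_counting(std::vector<T>& v, std::vector<uint32_t> order,
                             bool flatten) {
    assert(v.size() == order.size());
    std::size_t indirections = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::size_t next = order[i];
        while (next < i) {
            next = order[next];
            ++indirections;
        }
        std::swap(v[i], v[next]);
        if (flatten) order[i] = static_cast<uint32_t>(next);
    }
    return indirections;
}

// Hypothetical ordering {n-1, 0, 1, ..., n-2}: every position after the
// first refers to the position just before it.
inline std::vector<uint32_t> shifted_order(std::size_t n) {
    assert(n > 0);
    std::vector<uint32_t> order(n);
    std::iota(order.begin() + 1, order.end(), 0u);
    order[0] = static_cast<uint32_t>(n - 1);
    return order;
}
```

For n = 8 this sketch performs 7 indirections with flattening and 28 without, that is n-1 versus n(n-1)/2, while the reordered data is the same in both cases.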

If you then set up a single-fire experiment with a worst-case ordering (there are many worst-case orderings), and uncomment the line of output in the search loop, you'll see exactly why it's important to flatten the search.

Answer 2

Score: 2


I misunderstood your question on my first attempt. So I deleted that answer and am resubmitting.

Basically, if you need to avoid a copy of the array, that also implies you can't use a hash-table (map) solution for the continuous re-ordering. So reordering in place becomes an O(N²) runtime algorithm with O(1) storage beyond what was already allocated.

void reorder(std::vector<uint32_t>& items, std::vector<uint32_t>& order)
{

    // n-squared without additional storage
    for (uint32_t i = 0; i < (uint32_t)items.size(); i++)
    {
        if (order[i] == i)
        {
            continue;
        }

        // keep track of the displacement that's about to be done
        uint32_t displacedValue = items[i];
        uint32_t availableIndex = order[i];

        // swap
        items[i] = items[availableIndex];
        items[availableIndex] = displacedValue;

        // scan ahead in orders array to account for the swap
        for (size_t j = i + 1; j < items.size(); j++)
        {
            if (order[j] == i)
            {
                order[j] = availableIndex;
            }
        }
    }
}

Note that the above code will permute and corrupt the `order` table.

Proof of concept using your example:

int main() {
    std::vector<uint32_t> order = { 0,2,5,6,9,10,1,3,4,7,8,11 };
    std::vector<uint32_t> example = { 0,1,2,3,4,5,6,7,8,9,10,11 };

    reorder(example, order);

    // show the final sort order as applied
    for (size_t i = 0; i < example.size(); i++) {
        std::cout << example[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}

The above prints: 0 2 5 6 9 10 1 3 4 7 8 11

Alternate solution

If you can afford the storage cost of making a copy of order without copying example, you can convert the above code into an O(N) solution for both runtime and storage.

void reorder2(std::vector<uint32_t>& items, std::vector<uint32_t>& order) {

    // create a reverse lookup table on order
    std::unordered_map<uint32_t, uint32_t> reverseLookup; // reverse of order table
    for (uint32_t i = 0; i < (uint32_t)items.size(); i++)
    {
        reverseLookup[order[i]] = i;
    }

    for (uint32_t i = 0; i < (uint32_t)items.size(); i++)
    {
        if (order[i] == i)
        {
            continue;
        }

        // keep track of the displacement that's about to be done
        uint32_t displacedValue = items[i];
        uint32_t availableIndex = order[i];

        // swap
        items[i] = items[availableIndex];
        items[availableIndex] = displacedValue;

        // account for the swap
        uint32_t j = reverseLookup[i];
        order[j] = availableIndex;
        reverseLookup[availableIndex] = j;
    }
}
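
Since order is guaranteed to contain every index from 0 to n-1 exactly once, the reverse lookup could plausibly be a plain vector rather than a hash map. The following is only a sketch of that variation under that assumption; the name reorder_with_inverse and the template parameter are hypothetical and not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Variation of reorder2 using a vector as the reverse lookup table.
// Assumes `order` is a permutation of 0..n-1; overwrites `order`.
template <typename T>
void reorder_with_inverse(std::vector<T>& items, std::vector<uint32_t>& order) {
    assert(items.size() == order.size());
    const std::size_t n = items.size();

    // inverse[k] is the position in `order` that currently asks for index k.
    std::vector<uint32_t> inverse(n);
    for (std::size_t i = 0; i < n; ++i) inverse[order[i]] = static_cast<uint32_t>(i);

    for (std::size_t i = 0; i < n; ++i) {
        const uint32_t target = order[i];
        if (target == i) continue;

        std::swap(items[i], items[target]);

        // The element that used to sit at i now sits at target, so the
        // position that wanted index i must be redirected to target.
        const uint32_t j = inverse[i];
        order[j] = target;
        inverse[target] = j;
    }
}
```

This still uses O(N) extra storage for the lookup table, but it avoids hashing and works for any swappable element type.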

huangapple
  • Posted on August 9, 2023 at 10:35:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76864224.html