使用STL替换循环中的循环以在两个向量中查找匹配项

huangapple go评论101阅读模式
英文:

Replacing loop in a loop to find match in two vectors with STL

问题

struct bob
{
    int a {1};
    std::string b{"bob"};
};

std::vector<bob> list1{{1, "a"}, {2, "b"}, {3, "c"}};
std::vector<bob> list2{{1, "x"}, {2, "b"}, {3, "x"}};

return std::any_of(
    list1.begin(),
    list1.end(),
    [&](const auto &item1)
    {
        return std::any_of(
            list2.begin(),
            list2.end(),
            [&](const auto &item2) {
                std::cout << "stl check: " << item1.a << "," << item1.b << " == " << item2.a << "," << item2.b << "\n";
                return item2.a == item1.a &&
                       item2.b == item1.b;
            });
    });

在给定的代码结构中,您可以使用C++17中的std::any_of算法来查找匹配项,以取代嵌套的循环。上面是已经更新的代码,以使其更加清晰和表达力。

完整工作示例链接:https://godbolt.org/z/e8MYYK5G4

请注意,我已经去掉了HTML实体(如&quot;)以及不必要的&amp;符号,以便代码更易于阅读。

英文:

Given structure:

struct bob
{
    int a {1};
    std::string b{&quot;bob&quot;};
};

std::vector&lt;bob&gt; list1{{1, &quot;a&quot;}, {2, &quot;b&quot;}, {3, &quot;c&quot;}};
std::vector&lt;bob&gt; list2{{1, &quot;x&quot;}, {2, &quot;b&quot;}, {3, &quot;x&quot;}};

I want to find if a match exists. The classic way to do this is a loop in loop - something like:

    for (const auto &amp;item1 : list1)
    {
        for (const auto &amp;item2 : list2)
        {
            std::cout &lt;&lt; &quot;loop check: &quot; &lt;&lt; item1.a &lt;&lt; &quot;,&quot; &lt;&lt; item1.b &lt;&lt; &quot; == &quot;&lt;&lt; item2.a &lt;&lt; &quot;,&quot; &lt;&lt; item2.b &lt;&lt; &quot;\n&quot;;
            if (item2.a == item1.a &amp;&amp; 
                item2.b == item1.b)
            {
                return true;
            }
        }
    }

clangtidy modernisation features complains that using raw loops is not the way forward, so I have converted to an STL using std::any_of:

    return std::any_of(
        list1.begin(),
        list1.end(),
        [&amp;](const auto &amp;item1)
        {
            return std::any_of(
                list2.begin(),
                list2.end(),
                [&amp;](const auto &amp;item2) {
                    std::cout &lt;&lt; &quot;stl check: &quot; &lt;&lt; item1.a &lt;&lt; &quot;,&quot; &lt;&lt; item1.b &lt;&lt; &quot; == &quot;&lt;&lt; item2.a &lt;&lt; &quot;,&quot; &lt;&lt; item2.b &lt;&lt; &quot;\n&quot;;
                    return item2.a == item1.a &amp;&amp;
                           item2.b == item1.b;
                });
        });

But I can't help thinking this is harder to read and more code etc...

Is there a better arrangement, or different algorithm that will do this in a more clear/expressive way?

Here is the full working example: https://godbolt.org/z/e8MYYK5G4

Note: c++17 is being used

答案1

得分: 6

我会先为 `struct bob` 定义比较函数,这样代码的其余部分就不需要处理它的内部细节了。如果你使用的是 C++20 或更新版本,你可以使用太空船操作符来实现。否则,你可以定义 `operator&lt;`。

```cpp
struct bob {
    // bob 的现有内容...

    // C++20 或更新版本:
    // auto operator&lt;=&gt;(bob const &amp;) const = default;

    // C++17 或更早版本:
    bool operator&lt;(bob const &amp;other) const {
        if (a == other.a) return b &lt; other.b;
        return a &lt; other.a;
    }
};

然后,我们想要确定一个集合中的任何元素是否包含在另一个集合中。和 @chrysante 一样,我会先排序,然后进行搜索。但我会使用二分搜索而不是线性搜索。

标准库实际上包含了三种不同的函数模板来执行二分搜索。大多数情况下,我们想要使用 upper_boundlower_bound ——两者都会尝试在集合中找到一个特定元素的位置。但在这种情况下,我们并不关心具体位置——我们只需要一个布尔结果来告诉我们它是否存在。幸运的是,这正是 std::binary_search 所做的事情。

template &lt;class T, class U&gt;
bool intersects(T const &amp;a, U b) {
    std::sort(std::begin(b), std::end(b));

    return std::any_of(std::begin(a), std::end(a),
        [&amp;](auto const &amp;x) { return std::binary_search(std::begin(b), std::end(b), x); });
}

目前,我只定义了一个对输入进行排序的版本的函数。如果你可以修改原始数据,请将 b 作为引用传递。

取决于集合的大小以及预期的交集元素数量,这可能不是做事情的最有效方法。std::set_intersection 要求我们对两个输入进行排序,然后并行遍历它们以找到交集项。上述算法的复杂度是 O(M log N),其中 M 和 N 分别是两个集合的大小。

std::set_intersection 的复杂度是 O(M + N),但它也会继续搜索,以找到两个集合之间匹配的所有元素。因此,在任何给定情况下,哪个方法更快可能有些难以确定。

一旦我们有了这些,让我们尝试一些测试——一个应该产生积极结果,另一个应该产生消极结果:

int main() { 
    std::vector&lt;bob&gt; list1{{1, &quot;a&quot;}, {2, &quot;b&quot;}, {3, &quot;c&quot;}};
    std::vector&lt;bob&gt; list2{{1, &quot;x&quot;}, {2, &quot;b&quot;}, {3, &quot;x&quot;}};
    std::vector&lt;bob&gt; list3{{1, &quot;我不这么认为&quot;}, {5, &quot;&quot;}, { 17, &quot;没门&quot;}};

    std::cout &lt;&lt; &quot;1  2 是否相交? &quot; &lt;&lt; std::boolalpha &lt;&lt; intersects(list1, list2) &lt;&lt; &quot;\n&quot;;
    std::cout &lt;&lt; &quot;2  3 是否相交? &quot; &lt;&lt; std::boolalpha &lt;&lt; intersects(list2, list3) &lt;&lt; &quot;\n&quot;;

}

对于我来说,这产生了我所期望的结果:

1  2 是否相交? true
2  3 是否相交? false

<kbd>在 Godbolt 上查看</kbd>


<details>
<summary>英文:</summary>

I would start by defining comparison for `struct bob`, so the rest of the code doesn&#39;t need to deal with its internal details. If you&#39;re using C++20 or newer, you can use the spaceship operator to do that. Otherwise, you can define `operator&lt;` instead.

```cpp
struct bob {
    // existing contents of bob ...

    // C++20 or newer:
    // auto operator&lt;=&gt;(bob const &amp;) const = default;

    // C++17 or older:
    bool operator&lt;(bob const &amp;other) const {
        if (a == other.a) return b &lt; other.b;
        return a &lt; other.a;
    }
};

Then we want to determine whether any element of one collection is contained in the other collection. Like @chrysante, I'd sort and then search. But I'd use a binary search instead of a linear search.

The standard library actually contains three different function templates for doing binary searching. Most of the time, we want upper_bound or lower_bound--either will try to find a specific location of an element in a collection. But in this case, we don't care about a specific location--we just need a Boolean result to tell us whether it's there or not. Fortunately for us, that's exactly what std::binary_search does.

template &lt;class T, class U&gt;
bool intersects(T const &amp;a, U b) {
    std::sort(std::begin(b), std::end(b));

    return std::any_of(std::begin(a), std::end(a),
        [&amp;](auto const &amp;x) { return std::binary_search(std::begin(b), std::end(b), x); });
}

For the moment I've defined only a single version that sorts a copy of one inputs. If you can modify the original, pass b by reference instead.

Depending on the sizes of the collections, and the expected number of intersecting elements, this may not be the most efficient way to do things though. std::set_intersection requires that we sort both inputs, then traverses the two in parallel to find intersecting items. The algorithm above has complexity O(M log N) where M and N are the sizes of the two collections.

std::set_intersection has complexity O(M + N), but also continues searching, to find all elements that match between the two collections. So exactly which will be faster in any given situation is somewhat difficult to pin down.

Once we have those, let's try a couple of tests--one that should produce a positive result, the other a negative:

int main() { 
    std::vector&lt;bob&gt; list1{{1, &quot;a&quot;}, {2, &quot;b&quot;}, {3, &quot;c&quot;}};
    std::vector&lt;bob&gt; list2{{1, &quot;x&quot;}, {2, &quot;b&quot;}, {3, &quot;x&quot;}};
    std::vector&lt;bob&gt; list3{{1, &quot;I don&#39;t think so&quot;}, {5, &quot;nope&quot;}, { 17, &quot;no way&quot;}};

    std::cout &lt;&lt; &quot;1 and 2 intersect? &quot; &lt;&lt; std::boolalpha &lt;&lt; intersects(list1, list2) &lt;&lt; &quot;\n&quot;;
    std::cout &lt;&lt; &quot;2 and 3 intersect? &quot; &lt;&lt; std::boolalpha &lt;&lt; intersects(list2, list3) &lt;&lt; &quot;\n&quot;;

}

At least for me, this produces what I'd expect:

1 and 2 intersect? true
2 and 3 intersect? false

<kbd>Live on Godbolt</kbd>

答案2

得分: 1

我知道这不是关于clang-tidy的问题的答案,但为了更好的运行时性能,考虑这样做:

#include <algorithm>
#include <optional>
#include <tuple>

std::optional<bob> find_match_mutating(std::vector<bob>& a, std::vector<bob>& b) {
    auto less = [](auto& a, auto& b) {
        return std::tie(a.a, a.b) < std::tie(b.a, b.b);
    }; 
    auto equal = [](auto& a, auto& b) {
        return std::tie(a.a, a.b) == std::tie(b.a, b.b);
    }; 
    std::sort(a.begin(), a.end(), less);
    std::sort(b.begin(), b.end(), less);
    auto itr = std::find_if(a.begin(), a.end(), [&](bob const& aBob) mutable {
        j = std::lower_bound(j, b.end(), aBob, less);
        return j < b.end() && equal(*j, aBob);
    });
    if (itr != a.end()) {
        return *itr;
    }
    return std::nullopt;
}

// 或者如果你不想改变原始范围
std::optional<bob> find_match(std::vector<bob> a, std::vector<bob> b) {
    return find_match_mutating(a, b);
}

这样,您最终将得到O(Nlog(N) + Mlog(M))的时间复杂度,而不是O(N*M),其中N和M分别是ab的大小。

如果您想要元素在原始范围中的位置,请对它们进行线性搜索。您仍然会得到N*log(N)的时间复杂度。

对于足够大的范围,这肯定比您原来的O(N*M)解决方案更快。然而,对于小范围,其他因素可能主导执行时间。在这种情况下,您需要进行测量。

英文:

I know this does not answer your question about clang-tidy, but for better runtime performance consider doing this:

#include &lt;algorithm&gt;
#include &lt;optional&gt;
#include &lt;tuple&gt;

std::optional&lt;bob&gt; find_match_mutating(std::vector&lt;bob&gt;&amp; a, std::vector&lt;bob&gt;&amp; b) {
    auto less = [](auto&amp; a, auto&amp; b) {
        return std::tie(a.a, a.b) &lt; std::tie(b.a, b.b);
    }; 
    auto equal = [](auto&amp; a, auto&amp; b) {
        return std::tie(a.a, a.b) == std::tie(b.a, b.b);
    }; 
    std::sort(a.begin(), a.end(), less);
    std::sort(b.begin(), b.end(), less);
    auto itr = std::find_if(a.begin(), a.end(), [&amp;, j = b.begin()](bob const&amp; aBob) mutable {
        j = std::lower_bound(j, b.end(), aBob, less);
        return j &lt; b.end() &amp;&amp; equal(*j, aBob);
    });
    if (itr != a.end()) {
        return *itr;
    }
    return std::nullopt;
}

// Or if you don&#39;t want to mutate the original ranges
std::optional&lt;bob&gt; find_match(std::vector&lt;bob&gt; a, std::vector&lt;bob&gt; b) {
    return find_match_mutating(a, b);
}

This way you end up with O(N*log(N) + M*log(M)) time complixity instead of O(N*M), where N and M are the sizes a and b respectively.

If you want the position of the element in the original ranges, do a linear search for them. You still end up with N*log(N) time complexity.

For sufficiently large ranges this will guaranteed be faster than your original O(N*M) solution. For small ranges however other factors could dominate the execution time. In this case you would have to measure.

huangapple
  • 本文由 发表于 2023年7月27日 19:48:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76779431.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定