Is std::unordered_map a good candidate for recording if I have already "dealt with" an object?

Question

I have a structure called Col:

struct Col
{
   Object* o1;
   Object* o2;
   Object* o3;

   bool operator==(const Col &other) const
   {
       // Compare against the other object's members, not against ourselves.
       return o1 == other.o1 && o2 == other.o2 && o3 == other.o3;
   }
};

I define a hash on this:

template <>
struct std::hash<Col>
{
  std::size_t operator()(const Col& k) const
  {
    // std::hash has no specialization for void, so hash the pointers themselves.
    std::hash<Object*> ptr_hash;

    std::size_t res = ptr_hash(k.o1) + ptr_hash(k.o2) + ptr_hash(k.o3);

    return res;
  }
};
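
One thing worth noting about a hash built by summing the three pointer hashes: addition is commutative, so any two Col values holding the same pointers in a different order will always collide. The container still works, because equality decides membership, but lookups can slow down if such permutations are common. Below is a minimal sketch of an order-sensitive alternative. It swaps in a different technique, the mixing formula popularized by Boost's hash_combine, and the names Object, hash_combine, ColHash and ColSet are assumptions made for illustration, not part of the original code.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Object {};  // hypothetical stand-in for the question's Object type

struct Col {
    Object* o1;
    Object* o2;
    Object* o3;

    bool operator==(const Col& other) const {
        return o1 == other.o1 && o2 == other.o2 && o3 == other.o3;
    }
};

// Hypothetical helper: folds one more pointer hash into a running seed.
// The shifts make the result depend on the order in which values are mixed in.
inline void hash_combine(std::size_t& seed, const Object* p) {
    seed ^= std::hash<const Object*>{}(p) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical hasher, supplied as the second template argument of the
// container instead of specializing std::hash.
struct ColHash {
    std::size_t operator()(const Col& k) const {
        std::size_t seed = 0;
        hash_combine(seed, k.o1);
        hash_combine(seed, k.o2);
        hash_combine(seed, k.o3);
        return seed;
    }
};

using ColSet = std::unordered_set<Col, ColHash>;
```

Passing the hasher as a template argument keeps the sketch independent of any std::hash specialization; whether the extra mixing matters in practice depends on how often the same pointers appear in different positions.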

I then have a long std::vector of Col objects which contains duplicates (per my definition of equality), and I wish to "process" each unique element only once.

This is what I do:

std::unordered_map<Col, bool> done_list;

for (Col c : col_list)
{
    if (done_list.find(c) == done_list.end())
    {
        process(c);

        done_list[c] = true;
    }
    else
    {
        continue;
    }
}

Is this a reasonable way to check for whether the same collection of objects has been processed already? Would a sparse matrix be better?

Answer 1

Score: 6

Your approach is OK, but you can simplify it a bit by using std::unordered_set instead of std::unordered_map, since the bool value carries no information beyond the key's presence.

The set will contain the values you already processed.

#include <unordered_set>

std::unordered_set<Col> done_list;

for (Col const & c : col_list)
{
    if (done_list.find(c) == done_list.end())
    {
        process(c);
        done_list.insert(c);
    }
    else
    {
        continue;
    }
}

Note that I used const & in the ranged for-loop to avoid copying and mutating c. You can drop the const if you need to mutate it.

If you are using C++20, you can use std::unordered_set::contains to make the check even simpler:

    if (!done_list.contains(c))
    // ...

Another variation takes advantage of the overload of std::unordered_set::insert that returns a pair where the bool second indicates whether the item was inserted (it will not be inserted if it already exists in the set):

    auto [it, inserted] = done_list.insert(c);
    if (inserted)
    {
        process(c);
    }
    else
    {
        continue;
    }
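
For reference, the insert-based variation can be assembled into a small self-contained sketch. Object, the process_unique helper and its process callback are hypothetical names introduced for illustration. The equality operator compares against other, and the hash uses std::hash<Object*>, which is an assumption about what the question intended.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

struct Object {};  // hypothetical stand-in for the question's Object type

struct Col {
    Object* o1;
    Object* o2;
    Object* o3;

    bool operator==(const Col& other) const {
        return o1 == other.o1 && o2 == other.o2 && o3 == other.o3;
    }
};

template <>
struct std::hash<Col> {
    std::size_t operator()(const Col& k) const {
        std::hash<Object*> ptr_hash;  // assumption: hash the pointer values
        return ptr_hash(k.o1) + ptr_hash(k.o2) + ptr_hash(k.o3);
    }
};

// Hypothetical helper: calls process(c) once per distinct Col, in first-seen
// order, and returns how many elements were processed.
template <typename F>
std::size_t process_unique(const std::vector<Col>& col_list, F process) {
    std::unordered_set<Col> done_list;
    std::size_t processed = 0;
    for (const Col& c : col_list) {
        // insert() reports through .second whether c was newly added.
        if (!done_list.insert(c).second) continue;  // already seen
        process(c);
        ++processed;
    }
    return processed;
}
```

Because the element is inserted before process runs, an exception thrown by process would leave that element marked as done; the find-then-insert form avoids this at the cost of a second lookup.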

Answer 2

Score: 3

  • What you need is some kind of set, which stores each element only once.
  • You also do not need to store the Cols themselves in the set. Pointers to the elements are enough.

I suggest using an unordered_set, so first we need a hasher and a comparator for the pointers:

struct colptr_hash {
    // A hasher must return std::size_t; a bool would collapse every hash to 0 or 1.
    std::size_t operator()(const Col* col) const {
        return std::hash<Col>{}(*col);
    }
};

struct colptr_equal {
    bool operator()(const Col* lhs, const Col* rhs) const {
        return *lhs == *rhs;
    }
};

Then the actual usage:

std::unordered_set<Col*, colptr_hash, colptr_equal> done_list;

for (Col& c : col_list) {           // take by reference
    // try to insert this Col*
    auto [it, inserted] = done_list.emplace(&c);

    if (!inserted) continue;        // already processed, try next

    // never processed, go ahead:
    process(c);
}
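
A self-contained sketch of the pointer-based approach might look as follows. Object, the std::hash<Col> body and the first_occurrences helper are assumptions added for illustration. The stored pointers refer into col_list, so they stay valid only as long as the vector is not resized or destroyed while the set is in use.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

struct Object {};  // hypothetical stand-in for the question's Object type

struct Col {
    Object* o1;
    Object* o2;
    Object* o3;

    bool operator==(const Col& other) const {
        return o1 == other.o1 && o2 == other.o2 && o3 == other.o3;
    }
};

template <>
struct std::hash<Col> {
    std::size_t operator()(const Col& k) const {
        std::hash<Object*> ptr_hash;  // assumption: hash the pointer values
        return ptr_hash(k.o1) + ptr_hash(k.o2) + ptr_hash(k.o3);
    }
};

// Hash and compare through the pointer, so two different addresses holding
// equal Col values count as the same key.
struct colptr_hash {
    std::size_t operator()(const Col* col) const { return std::hash<Col>{}(*col); }
};

struct colptr_equal {
    bool operator()(const Col* lhs, const Col* rhs) const { return *lhs == *rhs; }
};

// Hypothetical helper: returns a pointer to the first occurrence of each
// distinct Col, in the order they appear in col_list.
std::vector<const Col*> first_occurrences(const std::vector<Col>& col_list) {
    std::unordered_set<const Col*, colptr_hash, colptr_equal> done_list;
    std::vector<const Col*> firsts;
    for (const Col& c : col_list) {
        if (done_list.insert(&c).second) firsts.push_back(&c);
    }
    return firsts;
}
```

Storing pointers saves copying three pointers per element, which is a small gain for a struct this size; the trade-off is the lifetime dependency on col_list described above.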

huangapple
  • Posted on July 17, 2023 at 23:40:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706088.html