在Rust中,如何合并`HashSet`操作或者如何获取`HashSet`的显式差异/并集。

huangapple go评论78阅读模式
英文:

Compound `HashSet` operations in Rust OR How to get an explicit difference/union of `HashSet`s in Rust

问题

我想用伪代码执行以下操作:

(a, b, c) = (HashSet(...), HashSet(...), HashSet(...))

(a, b, c) = (a - b - c, b - a - c, c - a - b)

在Rust中,我尝试了以下代码:

fn get_random_set(...) -> HashSet<String> {
    ...
}

// Sets of randomly generated "words" that define behavior of the whole program.
let action_plus: HashSet<String> = get_random_set();
let action_minus: HashSet<String> = get_random_set();
let action_new_line: HashSet<String> = get_random_set();

现在我们要从这些HashSet中排除所有常见的“单词”。
我了解到differenceunion方法返回DifferenceUnion迭代器。如果我这样做:

let action_plus = HashSet::from(action_minus.union(&action_new_line).collect::<Vec<String>>());

我会得到这个错误:

the trait `From<Vec<String>>` is not implemented for `HashSet<_, _>`

如何解决这个问题?

英文:

I want perform the following, in pseudocode:

(a, b, c) = (HashSet(...), HashSet(...), HashSet(...))

(a, b, c) = (a - b - c, b - a - c, c - a - b)

In Rust I tried something like this:

fn get_random_set(...) -> HashSet<String> {
    ...
}

// Sets of randomly generated "words" that define behavior of the whole program.
let action_plus: HashSet<String> = get_random_set();
let action_minus: HashSet<String> = get_random_set();
let action_new_line: HashSet<String> = get_random_set();

Now we are to exclude all the common "words" from these HashSets.
I learned that difference and union methods return Difference and Union iterators. And if I do this:

let action_plus = HashSet::from(action_minus.union(&action_new_line).collect::<Vec<String>>());

I receive this:

the trait `From<Vec<String>>` is not implemented for `HashSet<_, _>`

How to deal with this problem?

答案1

得分: 5

直接使用HashSetFromIterator实现来收集,而不是使用Vec。具体可以参考impl<T> FromIterator<T> for HashSet<T>

代码示例:

let action_plus: HashSet&lt;_&gt; = action_minus.union(&amp;action_new_line).collect();

正如Chayim在评论中所说,你也可以使用|BitOr)运算符,因为HashSet已经实现了它,文档中也有说明:

> 将self和rhs的并集作为新的HashSet<T, S>返回。

如果你查看它的实现,你会发现它基本上与上面的代码相同(尽管它会克隆这些集合中的项):

impl&lt;T, S&gt; BitOr&lt;&amp;HashSet&lt;T, S&gt;&gt; for &amp;HashSet&lt;T, S&gt;
where
    T: Eq + Hash + Clone,
    S: BuildHasher + Default,
{
    type Output = HashSet&lt;T, S&gt;;

    fn bitor(self, rhs: &amp;HashSet&lt;T, S&gt;) -&gt; HashSet&lt;T, S&gt; {
        self.union(rhs).cloned().collect()
    }
}

HashSet还实现了BitAnd&amp;)运算符用于创建交集,Sub-)用于创建集合差,以及BitXor^)用于创建对称差。

你可以使用这些方便的运算符,但是当人们不期望它们时,可能会感到困惑,所以请谨慎使用。

英文:

Collect directly into HashSet instead of Vec using impl<T> FromIterator<T> for HashSet<T>.

That is:

let action_plus: HashSet&lt;_&gt; = action_minus.union(&amp;action_new_line).collect();

As Chayim said in the comments you could also use | (BitOr) operator, since HashSet implements it and as the docs say it:

> Returns the union of self and rhs as a new HashSet<T, S>

If you look at it's implementation you can see, that it does basically the same as the code above (although it clones items from those sets):

impl&lt;T, S&gt; BitOr&lt;&amp;HashSet&lt;T, S&gt;&gt; for &amp;HashSet&lt;T, S&gt;
where
    T: Eq + Hash + Clone,
    S: BuildHasher + Default,
{
    type Output = HashSet&lt;T, S&gt;;

    fn bitor(self, rhs: &amp;HashSet&lt;T, S&gt;) -&gt; HashSet&lt;T, S&gt; {
        self.union(rhs).cloned().collect()
    }
}

HasSet also implements BitAnd (&amp;) operator for creating an intersection, Sub (-) for creating a set difference and BitXor (^) for creating a symmetric difference.

You can use those convenience operators, however they can be off-putting when one does not expect them, so use them carefully.

答案2

得分: 2

你可以直接使用HashSet进行集合操作。

let a = get_random_set();
let b = get_random_set();
let c = get_random_set();
let a_minus_b_minus_c = a
    .difference(&b)
    .cloned()
    .collect::<HashSet<_>>()
    .difference(&c)
    .cloned()
    .collect::<HashSet<_>>();

cloned函数用于将HashSet<&String>的输出转换为HashSet<String>,因为difference要求操作的集合类型相同。如果你可以接受a_minus_b_minus_cHashSet<&String>类型的结果,那么最后一个cloned函数可以省略。

然而,更简单的方法是使用减法运算符。HashSet实现了Sub,可以接受两个操作数的引用,所以每次都需要添加引用。

let a_minus_b_minus_c = &(&a - &b) - &c;

还有其他一些操作:

  • BitAnd 用于求交集 &a & &b
  • BitOr 用于求并集 &a | &b
  • BitXor 用于求对称差集 &a ^ &b

需要注意的是,所有这些操作都会克隆集合中的值并返回一个新的集合,所以效率不是最优的。如果需要更高的性能,可以通过使用retain(差集、交集)或extend(并集)来手动修改第一个集合,或者手动创建一个持有引用的新集合。

英文:

You can collect directly into a HashSet.

let a = get_random_set();
let b = get_random_set();
let c = get_random_set();
let a_minus_b_minus_c = a
    .difference(&amp;b)
    .cloned()
    .collect::&lt;HashSet&lt;_&gt;&gt;()
    .difference(&amp;c)
    .cloned()
    .collect::&lt;HashSet&lt;_&gt;&gt;();

cloned is necessary to convert the output from HashSet&lt;&amp;String&gt; to HashSet&lt;String&gt;, since difference requires sets of the same type. You don't need the last cloned if you're okay with a_minus_b_minus_c being HashSet&lt;&amp;String&gt;.

However, it's easier to just use the subtraction operator. HashSet has an implementation of Sub that takes both operands by reference, so you need to add a reference each time.

let a_minus_b_minus_c = &amp;(&amp;a - &amp;b) - &amp;c;

There's also:

  • BitAnd for intersection &amp;a &amp; &amp;b
  • BitOr for union &amp;a | &amp;b
  • BitXor for symmetric difference &amp;a ^ &amp;b

Note that all these clone the values in the set and return a new set, so they're not optimally efficient. If you need more performance, you could do the operation manually by modifying the first set with retain (difference, intersection) or extend (union), or by manually making a new set that holds references.

答案3

得分: 2

你的目标是从所有集合中删除任何出现在多个集合中的元素。由于你只是删除元素,所以不需要克隆任何字符串,可以直接在原地进行所有操作。如果你的集合分别称为 abc,你可以使用以下代码:

a.retain(|x| b.remove(x) | c.remove(x));
b.retain(|x| c.remove(x));

这段代码首先从 ab 以及 ac 的交集中删除任何元素,然后从 bc 的交集中删除所有元素。总体而言,这应该比其他答案中讨论的方法要快得多(尽管我没有进行任何基准测试)。

请注意,第一行使用的是非短路的 | 运算符,而不是 ||。最后一个操作数 c.remove(x) 显然具有副作用,所以我们不能跳过它。否则,出现在所有三个集合中的元素将不会从 c 中被删除。

英文:

Your goal is to remove any element that's in more than one set from all the sets. Since you are only removing elements, you don't need to clone any strings, and you can perform all operations in place. If your sets are called a, b and c, you can use this code:

a.retain(|x| b.remove(x) | c.remove(x));
b.retain(|x| c.remove(x));

This will first remove any element in the intersections between a and b as well as a and c, respectively, from both sets, and then all elements in the intersection between b and c. Overall, this should be much faster than the approaches discussed in the other answers (though I didn't do any benchmarks).

Note that it's important that the first line uses the non-shortcircuiting | operator rather than ||. The last operand, c.remove(x), obviously has a side effect, so we can't skip it. Otherwise, elements that are in all three sets would not be removed from c.

huangapple
  • 本文由 发表于 2023年8月8日 22:05:42
  • 转载请务必保留本文链接:https://go.coder-hub.com/76860337.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定