英文:
Compound `HashSet` operations in Rust OR How to get an explicit difference/union of `HashSet`s in Rust
问题
我想用伪代码执行以下操作:
(a, b, c) = (HashSet(...), HashSet(...), HashSet(...))
(a, b, c) = (a - b - c, b - a - c, c - a - b)
在Rust中,我尝试了以下代码:
fn get_random_set(...) -> HashSet<String> {
...
}
// Sets of randomly generated "words" that define behavior of the whole program.
let action_plus: HashSet<String> = get_random_set();
let action_minus: HashSet<String> = get_random_set();
let action_new_line: HashSet<String> = get_random_set();
现在我们要从这些HashSet
中排除所有常见的“单词”。
我了解到difference
和union
方法返回Difference
和Union
迭代器。如果我这样做:
let action_plus = HashSet::from(action_minus.union(&action_new_line).collect::<Vec<String>>());
我会得到这个错误:
the trait `From<Vec<String>>` is not implemented for `HashSet<_, _>`
如何解决这个问题?
英文:
I want perform the following, in pseudocode:
(a, b, c) = (HashSet(...), HashSet(...), HashSet(...))
(a, b, c) = (a - b - c, b - a - c, c - a - b)
In Rust I tried something like this:
fn get_random_set(...) -> HashSet<String> {
...
}
// Sets of randomly generated "words" that define behavior of the whole program.
let action_plus: HashSet<String> = get_random_set();
let action_minus: HashSet<String> = get_random_set();
let action_new_line: HashSet<String> = get_random_set();
Now we are to exclude all the common "words" from these HashSet
s.
I learned that difference
and union
methods return Difference
and Union
iterators. And if I do this:
let action_plus = HashSet::from(action_minus.union(&action_new_line).collect::<Vec<String>>());
I receive this:
the trait `From<Vec<String>>` is not implemented for `HashSet<_, _>`
How to deal with this problem?
答案1
得分: 5
直接使用HashSet
的FromIterator
实现来收集,而不是使用Vec
。具体可以参考impl<T> FromIterator<T> for HashSet<T>。
代码示例:
let action_plus: HashSet<_> = action_minus.union(&action_new_line).collect();
正如Chayim在评论中所说,你也可以使用|
(BitOr
)运算符,因为HashSet
已经实现了它,文档中也有说明:
> 将self和rhs的并集作为新的HashSet<T, S>返回。
如果你查看它的实现,你会发现它基本上与上面的代码相同(尽管它会克隆这些集合中的项):
impl<T, S> BitOr<&HashSet<T, S>> for &HashSet<T, S>
where
T: Eq + Hash + Clone,
S: BuildHasher + Default,
{
type Output = HashSet<T, S>;
fn bitor(self, rhs: &HashSet<T, S>) -> HashSet<T, S> {
self.union(rhs).cloned().collect()
}
}
HashSet
还实现了BitAnd(&
)运算符用于创建交集,Sub(-
)用于创建集合差,以及BitXor(^
)用于创建对称差。
你可以使用这些方便的运算符,但是当人们不期望它们时,可能会感到困惑,所以请谨慎使用。
英文:
Collect directly into HashSet
instead of Vec
using impl<T> FromIterator<T> for HashSet<T>.
That is:
let action_plus: HashSet<_> = action_minus.union(&action_new_line).collect();
As Chayim said in the comments you could also use |
(BitOr
) operator, since HashSet
implements it and as the docs say it:
> Returns the union of self and rhs as a new HashSet<T, S>
If you look at it's implementation you can see, that it does basically the same as the code above (although it clones items from those sets):
impl<T, S> BitOr<&HashSet<T, S>> for &HashSet<T, S>
where
T: Eq + Hash + Clone,
S: BuildHasher + Default,
{
type Output = HashSet<T, S>;
fn bitor(self, rhs: &HashSet<T, S>) -> HashSet<T, S> {
self.union(rhs).cloned().collect()
}
}
HasSet
also implements BitAnd (&
) operator for creating an intersection, Sub (-
) for creating a set difference and BitXor (^
) for creating a symmetric difference.
You can use those convenience operators, however they can be off-putting when one does not expect them, so use them carefully.
答案2
得分: 2
你可以直接使用HashSet
进行集合操作。
let a = get_random_set();
let b = get_random_set();
let c = get_random_set();
let a_minus_b_minus_c = a
.difference(&b)
.cloned()
.collect::<HashSet<_>>()
.difference(&c)
.cloned()
.collect::<HashSet<_>>();
cloned
函数用于将HashSet<&String>
的输出转换为HashSet<String>
,因为difference
要求操作的集合类型相同。如果你可以接受a_minus_b_minus_c
是HashSet<&String>
类型的结果,那么最后一个cloned
函数可以省略。
然而,更简单的方法是使用减法运算符。HashSet
实现了Sub
,可以接受两个操作数的引用,所以每次都需要添加引用。
let a_minus_b_minus_c = &(&a - &b) - &c;
还有其他一些操作:
需要注意的是,所有这些操作都会克隆集合中的值并返回一个新的集合,所以效率不是最优的。如果需要更高的性能,可以通过使用retain
(差集、交集)或extend
(并集)来手动修改第一个集合,或者手动创建一个持有引用的新集合。
英文:
You can collect directly into a HashSet
.
let a = get_random_set();
let b = get_random_set();
let c = get_random_set();
let a_minus_b_minus_c = a
.difference(&b)
.cloned()
.collect::<HashSet<_>>()
.difference(&c)
.cloned()
.collect::<HashSet<_>>();
cloned
is necessary to convert the output from HashSet<&String>
to HashSet<String>
, since difference
requires sets of the same type. You don't need the last cloned
if you're okay with a_minus_b_minus_c
being HashSet<&String>
.
However, it's easier to just use the subtraction operator. HashSet
has an implementation of Sub
that takes both operands by reference, so you need to add a reference each time.
let a_minus_b_minus_c = &(&a - &b) - &c;
There's also:
BitAnd
for intersection&a & &b
BitOr
for union&a | &b
BitXor
for symmetric difference&a ^ &b
Note that all these clone the values in the set and return a new set, so they're not optimally efficient. If you need more performance, you could do the operation manually by modifying the first set with retain
(difference, intersection) or extend
(union), or by manually making a new set that holds references.
答案3
得分: 2
你的目标是从所有集合中删除任何出现在多个集合中的元素。由于你只是删除元素,所以不需要克隆任何字符串,可以直接在原地进行所有操作。如果你的集合分别称为 a
、b
和 c
,你可以使用以下代码:
a.retain(|x| b.remove(x) | c.remove(x));
b.retain(|x| c.remove(x));
这段代码首先从 a
和 b
以及 a
和 c
的交集中删除任何元素,然后从 b
和 c
的交集中删除所有元素。总体而言,这应该比其他答案中讨论的方法要快得多(尽管我没有进行任何基准测试)。
请注意,第一行使用的是非短路的 |
运算符,而不是 ||
。最后一个操作数 c.remove(x)
显然具有副作用,所以我们不能跳过它。否则,出现在所有三个集合中的元素将不会从 c
中被删除。
英文:
Your goal is to remove any element that's in more than one set from all the sets. Since you are only removing elements, you don't need to clone any strings, and you can perform all operations in place. If your sets are called a
, b
and c
, you can use this code:
a.retain(|x| b.remove(x) | c.remove(x));
b.retain(|x| c.remove(x));
This will first remove any element in the intersections between a
and b
as well as a
and c
, respectively, from both sets, and then all elements in the intersection between b
and c
. Overall, this should be much faster than the approaches discussed in the other answers (though I didn't do any benchmarks).
Note that it's important that the first line uses the non-shortcircuiting |
operator rather than ||
. The last operand, c.remove(x)
, obviously has a side effect, so we can't skip it. Otherwise, elements that are in all three sets would not be removed from c
.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论