这个算法如何生成一个集合的所有子集?

huangapple go评论64阅读模式
英文:

How does this algorithm generate all subsets of a set

问题

I understand your question. The code is written in C and appears to be using bitwise operations to efficiently generate subset bitvector representations. In this code, S represents the input set, and S1 is used to iterate through all non-empty subsets of S. The key to understanding it is how the bitwise operations work.

The expression S1 = S & -S sets S1 to the lowest bit set in S. Then, in the loop, S1 is updated to the next subset by performing S1 = S & (S1 - S). This operation essentially clears the lowest bit in S1, effectively moving to the next subset.

The loop continues until S1 equals S again. At this point, all non-empty subsets have been iterated through, and the loop terminates.

So, the code efficiently generates all non-empty subsets of S for further processing. The subtraction of S from S1 is what allows the loop to iterate through these subsets, and it doesn't result in an empty set in this context.

英文:

I am currently taking a course in query optimization and I have an algorithm that calculates the costs of joining all subsets of a given set of relations. To do that, we are given a subroutine enumerating all these subsets. In our script, it says: "[...] code fragment with which subset bitvector representations can be generated very efficiently. In C, this fragment looks as follows:

S1 = S & - S;
do { /* do something with subset S1 */
S1 = S & (S1 - S);
} while (S1 != S);

S represents the input set. S1 iterates through all subsets of S where S itself and the empty set are not considered.

However, I do not understand it: When we take the set difference of S and S, we get the empty set. So the while loop will always output the empty set as S1 and never terminate. What am I missing?

答案1

得分: 7

当我们对S和S的差集进行操作时,我们得到空集。因此,while循环将始终将空集输出为S1,永远不会终止。我错过了什么?

SS1 是作为位向量使用的整数,这意味着每个位对应于可能的集合元素之一,子集中元素的存在或不存在对应于其位的值是1或0。差集运算符 - 不是集合的差集,而是算术的差集。

代码假设有符号整数的二进制补码表示法。在这个约定中,任何操作数的算术相反数的表示方法可以通过翻转其表示的所有位并将结果加1来形成(将其视为无符号操作数并忽略溢出)。从中可以得出这样的属性:S & -S 计算得到的整数的所有位都为零,除了S 中设置的最低位(如果有的话)

例如:假设 S 是一个值为22的32位整数。S 的二进制表示为

00000000000000000000000000010110

翻转所有位得到

11111111111111111111111111101001

加1得到

11111111111111111111111111101010

,这是-22的表示。计算与 S 的按位 & 运算会关闭除了 S 中设置的位之外的所有位,得到

00000000000000000000000000000010

。正如广告宣传的那样,这个结果除了在 S 中的最低位处的位之外,所有位都为零。

S1 = S & (S1 - S) 可能更容易理解,如果改为重新表述为 S1 = S & (S1 + (-S)),或者更好地表述为 S1 = S & (S1 + (~S) + 1)。鉴于在 S1 中设置的所有位也在 S 中设置,添加 ~S 只会打开所有在 S 中未设置的位。此外,将1添加到其中将翻转从 S1 + (~S) 开始到第一个在 S1 + (~S) 中未设置的位(因为我们刚刚打开了所有未设置的位)的最低位。然后按位 & 运算会关闭在 S 中未设置的所有位。总体结果就好像我们使用仅在 S 中设置的位组成的计数器逐个递增。

例如:继续以上的示例,S1 + ~S

00000000000000000000000000000010 + 11111111111111111111111111101001
==
11111111111111111111111111101011

添加1得到

11111111111111111111111111101100

并且关闭 S 中未设置的位后得到

00000000000000000000000000000100

您可以自行验证,接下来的迭代依次产生

00000000000000000000000000000110
00000000000000000000000000010000
00000000000000000000000000010010
00000000000000000000000000010100
00000000000000000000000000010110

,最后一个等于 S

值得注意的是,原始的 S1 = S & -S 实际上是这个通用情况的特例,就好像 S1 最初是0一样:S1 = S & (0 - S)

另请注意,原始代码中存在一个错误。它应该避免考虑完整集合 S 或空集,在处理表示仅包含一个元素的集合 S 时,在停止之前它将同时考虑这两种情况。为了避免这种情况,需要在第一次迭代之前测试终止条件,可以通过使用 while 循环而不是 do .. while 来实现:

S1 = S & -S;
while (S1 != S) {
    /* 对子集 S1 做些什么 */
    S1 = S & (S1 - S);
}
英文:

> When we take the set difference of S and S, we get the empty set. So the while loop will always output the empty set as S1 and never terminate. What am I missing?

S and S1 are integers used as bitvectors, meaning that each bit corresponds to one of the possible set elements, with the presence or absence of that element in the subset corresponding to whether its bit takes the value 1 or 0. The difference operator - is not a set difference, it is an arithmetic difference.

The code assumes two's complement representation for signed integers. In this convention, the representation of the arithmetic inverse of any operand can be formed by flipping all the bits of its representation and adding 1 to the result (treating it as an unsigned operand and ignoring overflow). From this emerges the property that S & - S evaluates to an integer in which all bits are zero except the least-significant bit (if any) that is set in S.

Example: Suppose S is a 32-bit integer with value 22. The binary representation of S is

00000000000000000000000000010110

Flipping all the bits produces

11111111111111111111111111101001

Adding 1 produces

11111111111111111111111111101010

, which is the representation of -22. Computing the bitwise & with S turns off all bits that are not set in S, yielding

00000000000000000000000000000010

. As advertised, this has all bits zero except the bit at the position of the least-significant 1 bit in S.

The S1 = S & (S1 - S) would perhaps be more easily understood if reformulated as S1 = S & (S1 + (-S)), or even better as S1 = S & (S1 + (~S) + 1). Given that all the bits that are set in S1 are also set in S, adding ~S just turns on all the bits that are not set in S, too. Furthermore adding 1 to that will flip all the least-significant bits up to and including the first that is not set in S1 + (~S), which is necessarily one that is set in S (because we just turned on all the ones that weren't). Then the bitwise & turns back off all the bits that are not set in S. The overall result is as if we count upward by one, using a counter composed of only those bits that are set in S.

Example: continuing from the above, S1 + ~S is

00000000000000000000000000000010 + 11111111111111111111111111101001
==
11111111111111111111111111101011

Adding 1 yields

11111111111111111111111111101100

And masking off the bits not set in S yields

00000000000000000000000000000100

You can verify for yourself that the next iterations produce, in succession

00000000000000000000000000000110
00000000000000000000000000010000
00000000000000000000000000010010
00000000000000000000000000010100
00000000000000000000000000010110

, where the last of those is equal to S.

It may also be interesting to note that the original S1 = S & -S is just a special case of this, as if S1 were initially 0: S1 = S & (0 - S).


Note well that the formulation you asked about serves the general case, where the initial S is an arbitrary subset of all the possible elements. If we could assume that S would be a bitvector representing all possible elements, using its rightmost bits, then we could use this alternative, which is much easier to follow:

for (int S1 = 1; S1 < S; S1++) {
    /* do something with subset S1 */
}

Note also that there is a flaw in the original code presented. It is supposed to avoid considering the full set S or the empty set, but it will consider both before it stops if S represents a set containing exactly one element. To avoid that, it needs to test the termination condition before the first iteration, maybe just by using a while loop instead of a do .. while:

S1 = S & - S;
while (S1 != S) {
    /* do something with subset S1 */
    S1 = S & (S1 - S);
}

huangapple
  • 本文由 发表于 2023年5月25日 03:13:19
  • 转载请务必保留本文链接:https://go.coder-hub.com/76326734.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定