Find missing integer in an array without using sum.

huangapple go评论74阅读模式
英文:

Find missing integer in an array without using sum

问题

在一半的数字被移除后,count(0s) >= count(1s) 仍然成立是因为移除的数字的LSB(最低有效位)在统计中是均匀的。让我继续翻译下文以回答你的问题:

当一半的数字被移除时,我们仍然可以确保 count(0s) >= count(1s) 成立,因为在每一步中,我们只移除了那些与被查找的数字的当前位不同的数字。如果我们正在查找一个偶数(其LSB为0),那么我们只移除了奇数(其LSB为1),反之亦然。这确保了在每一步中,count(0s) 总是大于等于 count(1s),因为我们只移除了与当前位不同的数字,而这一移除是均匀的,不会导致 count(0s) 变得小于 count(1s)

这个算法在每一步都依赖于这个观察结果,以确保问题规模每次都减半,最终达到 O(n) 的时间复杂度。

英文:

I'm working through the hard exercises from the "Cracking the Coding Interview" book. One of the questions is the following:

> 17.4 Missing Number: An array A contains all the integers from 0 to n, except for one number which is missing.
> In this problem, we cannot access an entire integer in A with a single operation. The elements of A are represented in binary, and the only operation we can use to access them is "fetch the jth bit of A[i]", which takes constant time. Write code to find the missing integer. Can you do it in O(n) time?

The trivial answer to this is sum([1, n]) - sum(array). It can even be done bit by bit without using the sum function by simply XOR-ing the numbers. However, the book says:

> The runtime of this solution is n * length(n), where length is the
> number of bits in n. Note that length(n) = log(n). So, the runtime is
> actually O(n log(n)). Not quite good enough!

Although, for all practical purposes, an integer is represented by a fixed number of bits in most programming languages (Java uses 32 bit), so length(n) is a constant, but interview questions are rarely practical, so, we toil on.

I quote the rest of the answer here, then, I'll ask my question.

<!---->

> So how else can we approach it?
We can actually use a similar approach, but leverage the bit values more directly.
>
> Picture a list of binary numbers (the ----- indicates the value that was removed):
>
> 00000 00100 01000 01100
> 00001 00101 01001 01101
> 00010 00110 01010
> ----- 00111 01011
>
> Removing the number above creates an imbalance of 1s and 0s in the least significant bit, which we'll call LSB1. In a list of numbers from 0 to n, we would expect there to be the same number of 0s and 1s (if n is odd), or an additional 0 if n is even. That is:
>
> if n % 2 == 1 then count(0s) = count(1s)
> if n % 2 == 0 then count(0s) = 1 + count(1s)
>
> Note that this means that count(0s) is always greater than or equal to count(1s).
>
> When we remove a value v from the list, we'll know immediately if v is even or odd just by looking at the least significant bits of all the other values in the list.
>
> <!-- +-------------------------+---------------------------------------+---------------------------------------+ -->
> | | n % 2 == 0<br/>count(0s) = 1 + count(1s) | n % 2 == 1<br/>count(0s) = count(1s) |
> |-------------------------|---------------------------------------|---------------------------------------|
> | v % 2 == 0, LSB(v) = 0 | a 0 is removed. <br/>count(0s) = count(1s) | a 0 is removed. <br/>count(0s) < count(1s) |
> | v % 2 == 1, LSB(v) = 1 | a 1 is removed. <br/>count(0s) > count(1s) | a 1 is removed. <br/>count(0s) > count(1s) |
> <!-- +-------------------------+---------------------------------------+---------------------------------------+ -->
>
> So, if count(0s) <= count(1s), then v is even. If count(0s) > count(1s), then v is odd.
> We can now remove all the evens and focus on the odds, or remove all the odds and focus on the evens.
>
> Okay, but how do we figure out what the next bit in v is? If v were contained in our (now smaller) list, then we should expect to find the following:
> count(0s) = count(1s) OR count(0s) = 1 + count(1s)
>
> As in the earlier example, we can deduce the value of the second least significant bit of v.
>
> <!-- +-------------+---------------------------------------+---------------------------------------+ -->
> | | count(0s) = 1 + count(1s) | count(0s) = count(1s) |
> |-------------|---------------------------------------|---------------------------------------|
> | LSB(v) == 0 | a 0 is removed.<br/>count(0s) = count(1s) | a 0 is removed.<br/>count(0s) < count(1s) |
> | LSB(v) == 1 | a 1 is removed.<br/>count(0s) > count(1s) | a 1 is removed.<br/>count(0s) > count(1s) |
> <!-- +-------------+---------------------------------------+---------------------------------------+ -->
>
> Again, we have the same conclusion.
If count(0s) <= count(1s), then LSB2(v) = 0.
If count(0s) > count(1s), then LSB2(v) = 1.
>
> We can repeat this process for each bit. On each iteration, we count the number of 0s and 1s in bit i to check if LSBi(v) is 0 or 1. Then, we discard the numbers where LSBi(x) != LSBi(v). That is, if v is even, we discard the odd numbers, and so on.
>
> By the end of the process, we will have computed all bits in v. In each successive iteration, we look at n, then n/2, then n/4, and so on, bits. This results in a runtime of O(n).
<!-- ---
My question is: -->

I get that count(0s) &gt;= count(1s) is established by observation for numbers from 0 to n. However, once half the numbers are removed, how does count(0s) &gt;= count(1s) still hold true?

The algorithm, obviously, depends on it, and uses it to reduce the problem size in half at each iteration.

答案1

得分: 2

对于理解为什么"num(0s) >= num(1s)"仍然成立,可以看一下这个算法每一步执行的两个操作。

在每一步中,该算法执行两件事:

  1. 删除奇数或偶数(取决于列上的和)
  2. 移除最低有效位

现在让我们看看这对数字意味着什么。让我们写出N个有序的数字:

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
...

你会注意到最低有效位的序列(最后一列)是0101010101...
让我们进行第一次操作(例如,让我们选择删除奇数),我们得到:

0000
0010
0100
0110
1000
1010
1100
1110
...

现在让我们进行第二次操作:

000
001
010
011
100
101
110
111
...
  • 你会注意到新的最低有效位序列变为:01010101010101...,这正是我们在该步骤开始时的完全相同的序列。它也将满足:num(0s) >= num(1s)

  • 你还可以注意到,如果你选择删除偶数而不是奇数,它将给你完全相同的序列。

  • 你还可以注意到,如果你重复这个操作,它将再次生成这个序列,依此类推(如果你愿意,你可以假设Hn并证明Hn+1来在数学上证明这一点)。

注意:我不确定这个算法实际上是否比平凡的算法更好,甚至在理论上也不确定:在计算机科学中(如果我还记得我的工程课程),复杂性是由图灵机完成算法所需的操作数量定义的。如果n非常大,比如2的100亿次方:那么每个数字包含100亿位。但是图灵机只有一个带子,所有东西都写在上面。因此,图灵机的头需要通过每个数字的每一位进行移动(例如,要从第一个数字移动到第三个数字,因为我们删除了第二个数字,我们需要达到第三个数字,所以我们至少需要通过第二个数字的每一位一次)。我不确定这最后一部分。

英文:

One way to understand why num(0s) >= num(1s) still holds true is to look at the two operations performed by this algorithm at each step.

At each step, this algorithm does 2 things:

  1. delete odd numbers OR delete even numbers (depending on the sum on the column)
  2. remove least significant bit

Now let's look what it means on the numbers. Let's write N numbers ordered:

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
...

You notice that the sequence of least significant bit (the last column) is 0101010101....
Let's do the first operation (for instance let's choose to delete odd numbers), we get:

0000
0010
0100
0110
1000
1010
1100
1110
...

Now let's do the second operation:

000
001
010
011
100
101
110
111
...
  • You realize that the new sequence of least significant bit becomes: 01010101010101... which is exactly the same sequence we had at the beginning of the step. It will satisfy as well: num(0s) >= num(1s)

  • You can also notice that if you did choose to remove even numbers instead of odds, it would give you the exact same sequence.

  • You can notice that if you repeat this operation, it will be again this sequence and so on (you can demonstrate that mathematically if you want by assuming Hn and then proving Hn+1).

Note: I am not sure this algorithm is actually better than the trivial one, even in theory: In computer science (if i remember well my engineering lessons), the complexity is defined by the number of operation a Turing machine has to do to complete an algorithm. If n was extremely big, let's say 2 power 100 billion: then each number contains 100 billion digits. But a Turing machine contains a single band where everything is written. So the head of the Turing machine needs to move through each digit (for instance to move from first number to third one because we removed the second one, we need to reach the third one, so we need to pass at least once through each digit of second number). I am not sure of this last part.

答案2

得分: 0

你不仅会随机移除选定的一半数字,还会移除区间内的奇数或偶数数字。

  • 因此,剩下的数字可以表示为j=2*i+lsb,其中i遍历数字0..(n/2)(四舍五入)。这导致了i=⌊j/2⌋
  • (用原始数字来比较,原始数字是i,其中i遍历数字0..n)。
  • 通常情况下,j=2^m*i+(lsbs),其中i遍历数字0..(n/2^m)(四舍五入),且0≤lsbs<2^m。这导致了i=⌊j/2^m⌋

因此,相同的推理适用于所有情况。
请注意,只有在从最低位开始时才有效。

英文:

You don't just remove half the numbers chosen at random, but the odd or the even numbers in the interval.

  • Thus the remaining numbers can be given as j=2*i+lsb, where i goes through the numbers 0..(n/2) (rounded up or down). This gives that i=⌊j/2⌋.
  • (For comparison the original numbers were i where i goes through the numbers 0..n).
  • And in general j=2^m*i+(lsbs) where i goes through the numbers 0..(n/2^m) (rounded up or down), and 0&lt;=lsbs&lt;2^m. This gives i=⌊j/2^m⌋.

Thus the same reasoning applies in all of the cases.
Notice that this only works if you start with the lsb.

huangapple
  • 本文由 发表于 2023年7月3日 09:32:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76601395.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定