What is the difference between [start/2 + mid/2] and [(start + mid)/2] in binary search?

Question

In the binary search algorithm, we set mid as:

mid = (start + end)/2, which is the same as

mid = start/2 + end/2, and also equal to

mid = start + (end - start)/2

But the three give different results even though they are the same arithmetic expression.
How do they differ during computation?

This is the code to find the last occurrence of an element in a vector
using binary search:

int lastOccurrence(vector<int> arr, int size, int key){
    int start = 0, end = size - 1;
    int last = -1;
    // int mid = start/2 + end/2;
    int mid;
    while(start <= end){
        // mid = start + (end - start)/2;
        mid = (start + end)/2;
        if(key == arr[mid]){
            last = mid;
            start = mid + 1;
        }
        else if(key > arr[mid]){
            start = mid + 1;
        }
        else{
            end = mid - 1;
        }
        cout << "start: " << start << "\tend: " << end << "\tmid: " << mid << endl;
    }
    return last;
}

The values being passed to the function are:

int main(){
    vector<int> arr = {1,2,3,4,4,4,4,5,6,7,11};
    int size = arr.size();
    int key = 4;
    cout << "First occurrence of " << key << " is at index " << firstOccurrence(arr, size, key) << endl;

    cout << "Last occurrence of " << key << " is at index " << lastOccurrence(arr, size, key) << endl;

    return 0;
}

If the mid element equals the required key, the mid index is stored in last and start is updated to mid + 1, so that the right part of the array is searched for any later occurrence of the key. If the key is less than the mid element, the key cannot be present beyond mid, so end is updated to mid - 1 to search the left part of the array. Similarly, if the key is greater than the mid element, the right part is searched.

The function gave different results when mid = start/2 + end/2 was used than when mid = (start + end)/2 was used.
How does the choice of expression affect the computation?

Answer 1

Score: 6

For starters, the function is invalid if the loop uses a strict comparison. When size is equal to 1, this statement

int start = 0, end = size - 1;

sets end to 0.

In that case a while loop written as

while(start < end){

will be skipped, and the function will return the value of last, which is -1

int last = -1;

// ...

return last;

even though arr[0] may be equal to key. The condition start <= end, as used in the question's code above, does not skip this case.

As for your question: when start and end are both odd, each of the two divisions in the expression

start/2 + end/2

discards a remainder of 1, so the resulting mid is one less than with the other two expressions.
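The off-by-one is easy to see with concrete values. The following is a minimal sketch, not code from the question: the struct Midpoints and the function threeMidpoints are illustrative names, and the arguments are assumed to be non-negative and small enough that start + end does not overflow.

```cpp
#include <cassert>

// Hypothetical helper: holds the three midpoint expressions side by side.
struct Midpoints {
    int sumThenHalve;     // (start + end) / 2
    int halveThenSum;     // start / 2 + end / 2
    int offsetFromStart;  // start + (end - start) / 2
};

Midpoints threeMidpoints(int start, int end) {
    // Assumption: 0 <= start <= end, and start + end fits in an int.
    assert(0 <= start && start <= end);
    return Midpoints{(start + end) / 2,
                     start / 2 + end / 2,
                     start + (end - start) / 2};
}
```

For start = 3 and end = 7 (both odd), threeMidpoints returns 5, 4 and 5 respectively; for start = 2 and end = 7 all three values are 4.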

As for the expression (start + end)/2, it is unsafe because the sum start + end can overflow.

Note that C++20 provides the function std::midpoint, declared in the header <numeric>, which can and should be used instead of hand-written expressions.

As for the function as a whole, there is already a standard algorithm, std::upper_bound, declared in the header <algorithm>, that can be adapted to replace it.
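One possible adaptation is sketched below. The name lastOccurrenceStd is an assumption made for illustration, and the vector is assumed to be sorted in ascending order, as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative name, not from the question.
// Returns the index of the last occurrence of key, or -1 if key is absent.
int lastOccurrenceStd(const std::vector<int>& arr, int key) {
    // Assumption: arr is sorted in ascending order.
    assert(std::is_sorted(arr.begin(), arr.end()));
    // upper_bound returns an iterator to the first element greater than key.
    auto it = std::upper_bound(arr.begin(), arr.end(), key);
    // If an element precedes that position and equals key,
    // it is the last occurrence of key.
    if (it != arr.begin() && *(it - 1) == key) {
        return static_cast<int>((it - 1) - arr.begin());
    }
    return -1;
}
```

With the vector from the question, {1,2,3,4,4,4,4,5,6,7,11}, and key 4, this returns index 6. No midpoint expression has to be written by hand in this version.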

Answer 2

Score: 4

You need to consider that integer arithmetic cuts off any fractional part, so depending on the last bit of start and end you get different results.

Suppose they are

 start = M*2 + a;
 end = N*2 + b;

where M and N are integers and a and b are each either 0 or 1. Then you get

mid_0 = (start + end)/2 = M+N + (a+b) / 2
mid_1 = start/2 + end/2 = M+N
mid = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2 

Only the second expression does not depend on whether start and end are even or odd. I didn't actually work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic it is better not to rely on intuition, because it is too easy to be off by one (or more). Moreover, I have not yet considered integer overflow: when start + end overflows, (start/2) + (end/2) does not.
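The parity question can also be settled by brute force instead of intuition. The sketch below is only an illustration: both function names are hypothetical, and the check is restricted to 0 <= start <= end, the only situation that arises inside the binary search loop.

```cpp
#include <cassert>

// Hypothetical check: counts pairs with 0 <= start <= end <= limit for which
// start + (end - start)/2 differs from (start + end)/2.
int countOffsetMismatches(int limit) {
    assert(limit >= 0 && limit <= 10000);  // keeps start + end far from overflow
    int mismatches = 0;
    for (int start = 0; start <= limit; ++start) {
        for (int end = start; end <= limit; ++end) {
            if (start + (end - start) / 2 != (start + end) / 2) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

// Hypothetical check: true if start/2 + end/2 is exactly one less than
// (start + end)/2 when both values are odd, and equal to it otherwise.
bool halvesDifferOnlyWhenBothOdd(int limit) {
    assert(limit >= 0 && limit <= 10000);
    for (int start = 0; start <= limit; ++start) {
        for (int end = start; end <= limit; ++end) {
            int expectedGap = (start % 2 == 1 && end % 2 == 1) ? 1 : 0;
            if ((start + end) / 2 - (start / 2 + end / 2) != expectedGap) {
                return false;
            }
        }
    }
    return true;
}
```

For non-negative indices the first function finds no mismatches, so start + (end - start)/2 and (start + end)/2 agree in all four parity cases, and the second confirms that start/2 + end/2 falls short by one only when both values are odd.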

Answer 3

Score: 3

All of these are attempts to prevent overflow when calculating the midpoint:

(a + b) / 2

The best way (supposedly) is this:

a + b = (a ^ b) + ((a & b) << 1)
(a + b) / 2 = (a ^ b) / 2 + (a & b)

The identity comes from Donald Knuth's book, The Art of Computer Programming, Vol. 4.

Knuth's formula is supposedly bulletproof: it should work equally well on negative and positive indices. Note that shifting right (even arithmetically) is not always the same as dividing by 2.
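A minimal sketch of the identity follows. It is deliberately restricted to non-negative values, which is all a binary search over array indices needs, and the name midpointXorAnd is an assumption made for illustration.

```cpp
#include <cassert>

// Illustrative name, not from the answer.
int midpointXorAnd(int a, int b) {
    // Assumption: both arguments are non-negative.
    assert(a >= 0 && b >= 0);
    // (a & b) keeps the bits the two values share; (a ^ b) keeps the bits that differ.
    // Since a + b == (a ^ b) + 2 * (a & b), halving gives the expression below.
    // Both operands are non-negative ints and their sum is at most max(a, b),
    // so nothing here can overflow.
    return (a & b) + (a ^ b) / 2;
}
```

For example, midpointXorAnd(3, 7) is 5, and midpointXorAnd(INT_MAX, INT_MAX) is INT_MAX, a case in which (a + b) / 2 would overflow.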

Answer 4

Score: 2

As mentioned in the previous answer, (start + end)/2 handles the remainders better.
However, start/2 + end/2 is not susceptible to overflow, while (start + end)/2 is.

So if you are handling arrays with potentially more than 2G elements, it is advisable either to make start, end and mid 64-bit integers or to prefer the start/2 + end/2 form.
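The 64-bit option could look like the sketch below. The name lastOccurrence64 is hypothetical, and the sketch uses start + (end - start)/2 rather than start/2 + end/2, so that the result matches (start + end)/2 while still avoiding overflow of the sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative variant of lastOccurrence with 64-bit signed indices.
std::int64_t lastOccurrence64(const std::vector<int>& arr, int key) {
    // Assumption: arr is sorted in ascending order.
    assert(std::is_sorted(arr.begin(), arr.end()));
    std::int64_t start = 0;
    std::int64_t end = static_cast<std::int64_t>(arr.size()) - 1;
    std::int64_t last = -1;
    while (start <= end) {
        // end - start is never negative inside the loop, so this cannot overflow.
        std::int64_t mid = start + (end - start) / 2;
        int value = arr[static_cast<std::size_t>(mid)];
        if (value == key) {
            last = mid;       // remember this hit and keep looking to the right
            start = mid + 1;
        } else if (value < key) {
            start = mid + 1;
        } else {
            end = mid - 1;
        }
    }
    return last;
}
```

With the vector from the question and key 4, this returns 6, the same index as the original function.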

huangapple
  • Posted on July 10, 2023, 19:25:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76653259.html