What is the difference between [start/2 + end/2] and [(start + end)/2] in binary search?
Question
In the binary search algorithm, we set mid as:
mid = (start + end)/2, which is mathematically the same as
mid = start/2 + end/2, and also equal to
mid = start + (end - start)/2
but the three can give different results even though they are the same arithmetic expression.
How do they differ during the computation?
This is the code to find the last occurrence of an element in a vector using binary search:
#include <iostream>
#include <vector>
using namespace std;

int lastOccurrence(vector<int> arr, int size, int key){
    int start = 0, end = size - 1;
    int last = -1;
    // int mid = start/2 + end/2;
    int mid;
    while(start <= end){
        // mid = start + (end - start)/2;
        mid = (start + end)/2;
        if(key == arr[mid]){
            last = mid;          // record the match, keep searching to the right
            start = mid + 1;
        }
        else if(key > arr[mid]){
            start = mid + 1;
        }
        else{
            end = mid - 1;
        }
        cout << "start: " << start << "\tend: " << end << "\tmid: " << mid << endl;
    }
    return last;
}
The values being passed to the function are:
int main(){
    vector<int> arr = {1,2,3,4,4,4,4,5,6,7,11};
    int size = arr.size();
    int key = 4;
    // firstOccurrence is a companion function not shown in the question
    cout << "First occurrence of " << key << " is at index " << firstOccurrence(arr, size, key) << endl;
    cout << "Last occurrence of " << key << " is at index " << lastOccurrence(arr, size, key) << endl;
    return 0;
}
If the mid element equals the required "key" element, the mid index is stored in a variable and start is updated to mid + 1 so that the right part of the array is searched for any further occurrence of "key". If "key" is less than the mid element, it cannot occur after the mid element, so end is updated to mid - 1 to search the left part of the array; similarly, if "key" is greater than the mid element, start is updated to mid + 1 to search the right part.
Using mid = start/2 + end/2 gave different results from mid = (start + end)/2.
How does this affect the computation?
Answer 1
Score: 6
For starters, the function is invalid if its loop condition is written as start < end. When size is equal to 1, this statement
int start = 0, end = size - 1;
makes end equal to 0.
In this case the while loop
while(start < end){
will be skipped, and the function will return the value of last, which is -1,
int last = -1;
// ...
return last;
even though arr[0] can be equal to key.
As for your question: when start and end are both odd, the expression start/2 + end/2 gives a value of mid that is one less than the other two expressions give.
As for the expression (start + end)/2, it is unsafe because the sum start + end can overflow.
Note that C++20 provides the function std::midpoint, declared in header <numeric>, which can and should be used instead of manually written expressions.
As for the function as a whole, there is already a standard algorithm, std::upper_bound, declared in header <algorithm>, that can be adapted to replace it.
Answer 2
Score: 4
Keep in mind that integer division discards any fractional part, so depending on the last bit of start and end you get different results.
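Suppose they are
start = M*2 + a;
end = N*2 + b;
where M and N are integers and a and b are either 1 or 0. Then you get
mid_0 = (start + end)/2 = M+N + (a+b)/2
mid_1 = start/2 + end/2 = M+N
mid_2 = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2
In the last line, (b-a)/2 has to be read as rounding down (so (-1)/2 counts as -1); this is valid because end - start >= 0 inside the loop, and for non-negative values C++'s truncating division is the same as rounding down.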
Only the second expression does not depend on whether start or end is even or odd. I didn't actually bother to work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic you had better not rely on intuition, because it's too easy to be off by one (or more). Moreover, I have not yet considered integer overflow: when start + end overflows, (start/2) + (end/2) does not.
Answer 3
Score: 3
All of these are attempts to prevent overflow when calculating the midpoint:
(a + b) / 2
The best way (supposedly) is this:
a + b = (a ^ b) + ((a & b) << 1)
(a + b) / 2 = (a ^ b) / 2 + (a & b)
The identity comes from Don Knuth's book, The Art of Computer Programming, Vol. 4.
It is considered the best because Don's formula is supposedly bulletproof: it should work equally well for negative and positive indices. Note that shifting right (even an arithmetic shift) is not always the same as dividing by 2.
Answer 4
Score: 2
As mentioned in the previous answer, (start + end)/2 handles the remainders better (it does not lose one when start and end are both odd).
However, start/2 + end/2 is not susceptible to overflow, while (start + end)/2 is.
So if you're handling arrays with potentially more than 2G elements, it is advisable to either make start/end/mid 64-bit ints or prefer the start/2 + end/2 form.