为什么我的二分查找算法错过了最优解,如何改进它?

huangapple go评论57阅读模式
英文:

Why does my binary search algorithm miss the optimal solution, and how can it be improved?

问题

我主要寻找的是理论上的答案,最好是解释我的二分搜索算法误解(下文有详述)。欢迎提供实际示例。

我有一个复杂的函数,我想在该函数上执行二分搜索,以找到n的值,使得computed_goal与给定的goal匹配,其中goal是我想要匹配的先前给定的值。以下算法说明了这一点。

l = 0, h = 4000; // 二分搜索的下界和上界
while (l < h)
{
    n = (l + h) / 2;

    x = vars / n; // vars代表更复杂的计算,x依赖于n

    computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal依赖于n

    if (computed_goal < goal)
        l = n + 1;
    else
        h = n - 1;

    if (abs(computed_goal - goal) / goal < 0.01) // 检查相对误差是否在1%以内
        return n; // 成功匹配到目标值!
    else
        return 0; // 迭代未能达到所需的精度
}

现在,如果您熟悉二分搜索算法,您可能已经看到了问题,但我对它们不是特别熟悉。我的问题并不总是出现,但有时如下所示:

假设我的函数为这些n提供了以下值:

n = 2998: computed_goal = 967.8732568 (-2.1267432 = computed_goal - goal)
n = 2999: computed_goal = 968.6512540 (-1.3487460)
n = 3000: computed_goal = 969.4294677 (-0.5705323)
n = 3001: computed_goal = 970.2078977 (+0.2078977) <= 这是最佳解决方案
n = 3002: computed_goal = 970.9865442 (+0.9865442)

现在,假设l = 2987h = 3015。这使得n = 3001。我的goal是匹配值970。请注意,n = 3001是最佳解决方案,因为|computed_goal - goal|的大小最小,最接近goal

然而,我的算法看到它并且执行了以下操作:computed_goal = 970.2078977 < goal = 970 = False,并将新的h设置为n - 1 = 3000,从而将最佳解决方案从将来的迭代中排除。恰好我的算法最终错误地定位到n = 2999,距离最佳解决方案有两步,以下是参考清单:

l h n computed_goal goal
2987 3015 = 3001, 970.207897725643 < 970
2987 3000 = 2993, 963.9865161108 < 970
2994 3000 = 2997, 967.095475944351 < 970
2998 3000 = 2999, 968.651254012028 < 970

误解

我理解我正在检查但随后排除了最佳解决方案,因为我没有回溯到它,所以它被丢失并被排除在外...

我以前被告知,你应该(总是)排除已经检查过的端点,因为你已经检查过它们,因此在将来的搜索范围内包含它们没有意义。所以像在这种情况下,我已经检查了3001,以后的迭代中我排除了这个值。我猜这是不正确的,并且我猜想我必须包括它们(?)您可以帮助我解释这个理论部分吗?

尝试修复

我还应该注意,当我将我的if语句更改为包括端点时,如下所示:

if (computed_goal < goal)
l = n;
else
h = n;

算法会无限期地执行无限循环,l = 3000h = 3001。我不知道该如何解决这个问题。我可以将我的while循环更改为while(l + 1 < h),这将打破无限循环,但我觉得它看起来不对,并且我担心我可能会引入代码中的不同错误,我无法确定是否应该使用l + 1 < h的循环条件。

函数概述

为什么我的二分查找算法错过了最优解,如何改进它?

代码

function matchGoal(int $goal = 970): int
{
    $l = 0;
    $h = 3900;

    while ($l < $h)
    {
        $n = intdiv($h + $l, 2);
        $x = 8321100 / $n;
        $factor = 5.113233695 * pow($n / 3000, 2);

        $computed_goal = (((-8.16965E-10 * $x + -3.15457E-06) * $x + 0.008437163) * $x + 207.9079475) * $factor - 0.076934117;

        if ($computed_goal < $goal)
            $l = $n + 1;
        else
            $h = $n - 1;
    }

    $relative_error = 100 * abs($computed_goal - $goal) / $goal;
    return ($relative_error < 1) ? $n : 0;
}

// 输出2999,而最佳解是3001
print matchGoal() . PHP_EOL; 
英文:

I am looking primarily for a theoretical answer, ideally one addressing my misunderstanding (below) about binary search algorithms. Practical examples are welcome as well.

I have a complex function where I want to do a binary search on that function to find the value of n, so that computed_goal matches a given goal, where goal is a previously-given value I am seeking to match. The algorithm below illustrates this.

l = 0, h = 4000; // low, and high bounds for binary search
while (l &lt; h)
{
    n = (l + h) / 2;

    x = vars / n; // vars represents a more complex computation, x depends on n
        
    computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
        
    if (computed_goal &lt; goal)
        l = n + 1;
    else
        h = n - 1;

    if (abs(computed_goal - goal) / goal &lt; 0.01) // check for relative error within 1%
        return n; // successfully matched goal value!
    else
        return 0; // iteration failed to match goal to desired precision
}

Now, I think you may already be seeing a problem if you are familiar with binary search algorithm, but I'm not super-verse on them. My problem does not always present itself but sometimes it is as follows:

Say my function gives me the following values for these n:

n = 2998: computed_goal = 967.8732568 (-2.1267432 = computed_goal - goal)
n = 2999: computed_goal = 968.6512540 (-1.3487460)
n = 3000: computed_goal = 969.4294677 (-0.5705323)
n = 3001: computed_goal = 970.2078977 (+0.2078977) &lt;= This is the optimal solution
n = 3002: computed_goal = 970.9865442 (+0.9865442)

Now, suppose l = 2987, and h = 3015. This makes n = 3001. My goal is to match value of 970. Note that n = 3001 is the optimal solution because |computed_goal - goal| is the smallest in magnitude and it is the closest to the goal

However, my algorithm looks at it and goes computed_goal = 970.2078977 &lt; goal = 970 = False, and sets new h to be n - 1 = 3000, thus excluding the optimal solution from any future iterations. It so happens that my algorithm finally incorrectly settles in on n = 2999, two steps away from optimal solution, here's a listing for reference:

   l    h      n    computed_goal    goal
2987 3015 = 3001, 970.207897725643 &lt; 970
2987 3000 = 2993, 963.9865161108   &lt; 970
2994 3000 = 2997, 967.095475944351 &lt; 970
2998 3000 = 2999, 968.651254012028 &lt; 970

Misunderstanding

I understand that I am checking but then excluding the optimal solution and because I'm not backtracking back to it, it gets lost and excluded...

I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?

Attempt to fix.
I should note as well that when I change my if statement to include end points, like so:

   if (computed_goal &lt; goal)
        l = n;
    else
        h = n;

Algorithm keeps executing indefinitely in an infinite look with l = 3000 and h = 3001. I am not sure how to go around that. I could change my while to while(l + 1 &lt; h), which will break the infinite loop, but it looks off to me, and I am concerned that I may be introducing a different error into the code, and I don't have enough insight to say if I should be using the l + 1 &lt; h loop condition.

Function Sketch

为什么我的二分查找算法错过了最优解,如何改进它?

Code

function matchGoal(int $goal = 970): int
{
    $l = 0;
    $h = 3900;

    while ($l &lt; $h)
    {
        $n = intdiv($h + $l, 2);
        $x = 8321100 / $n;
        $factor = 5.113233695 * pow($n / 3000, 2);

        $computed_goal = (((-8.16965E-10 * $x + -3.15457E-06) * $x + 0.008437163) * $x + 207.9079475) * $factor - 0.076934117;

        if ($computed_goal &lt; $goal)
            $l = $n + 1;
        else
            $h = $n - 1;
    }

    $relative_error = 100 * abs($computed_goal - $goal) / $goal;
    return ($relative_error &lt; 1) ? $n : 0;
}

// outputs 2999, when 3001 is optimal
print matchGoal() . PHP_EOL; 

答案1

得分: 3

二分搜索需要一个单调二元谓词才能工作。
二元谓词是一个具有两个可能值的函数:真和假。
单调二元谓词在某一点上为真,在之后为假,或者反之。

假设问题是“找到使 f(X) 最接近 Y 的 X”。
进一步假设 f(X) 是递增的。

首先,我们将问题重新表述为一个带有二元谓词的问题,例如,“找到最小的 X,使得 f(X) >= Y”。
或者“找到最大的 X,使得 f(X) < Y”。
现在,这些问题可以通过二分搜索来解决。
在解决这个更简单的问题之后,意图是在某个简单的最终步骤中返回到原始问题:“检查 X 和 X-1”或“检查 X 和 X+1”,然后选择两者中的最佳解。

接下来,我想展示一个不对称版本的二分搜索,而不是从两边排除端点的方式。
二分搜索不一定要这样工作,但这可能以不同的方式展示方法,可能有助于理解。

让我们使用二分搜索来解决“找到最大的 X,使得 f(X) < Y”的版本。
形式上,二元谓词 pred(X) 的值为真,如果 f(X) < Y,为假,如果 f(X) >= Y。
现在,保持一个半区间 [lo, hi),使得 pred(lo) = 真 并且 pred(hi) = 假,会很方便。
最初,lo 是包含的初始下界,hi 是排除的初始上界。
检查中间值 me = (lo + hi) / 2;向任一方向取整,这个版本都可以工作。
如果 pred(me) = 真,那么 [me, hi) 是新的区间。
如果 pred(me) = 假,那么 [lo, me) 是新的区间。
继续,直到 lo + 1 = hi:这意味着 [lo, hi) 中唯一的点是 lo,并且它是答案(对于我们的二分搜索子问题,而不是原始问题的答案)。

英文:

The binary search would like a monotonic binary predicate to work.<br/>
A binary predicate is a function with two possible values: true and false.<br/>
A monotonic binary predicate is true up to some point and false after, or vice versa.

Suppose the problem is "find the X such that value f(X) is closest to Y".<br/>
Further, suppose f(X) is increasing.

First, we reformulate the problem as one with a binary predicate, for example, "find the smallest X such that f(X) >= Y".
Or "find the largest X such that f(X) < Y".
Now, these are problems solvable by binary search.
With the intent of, after solving this simpler problem, return to the original problem in some simple final step: "check X and X-1" or "check X and X+1", respectively, and pick the best of the two.

Next, I'd like to show an asymmetric version of binary search, instead of one excluding endpoints from both sides.
Binary search doesn't have to work this way, but this may show the method from a different point of view than you already have, which may help understanding.

Let us solve the "find the largest X such that f(X) < Y" version using binary search.
Formally, the value of the binary predicate pred(X) is true if f(X) < Y and false if f(X) >= Y.
Now, it would be convenient to maintain a semi-interval [lo, hi), such that pred(lo) = true and pred(hi) = false.
Initially, lo is the inclusive initial lower bound, and hi is the exclusive initial upper bound.
Check the middle value me = (lo + hi) / 2; round to either side, this version will work either way.
If pred(me) = true, then [me, hi) is the new interval.
If pred(me) = false, then [lo, me) is the new interval.
Proceed until lo + 1 = hi: this means the only point in [lo, hi) is lo, and it is the answer (to our binary search subproblem, not to the original problem).

答案2

得分: 1

我之前被告知,你应该(始终)排除已经检查过的端点,因为你已经检查过它们,所以在将来的搜索范围中包括它们没有意义。所以在这种情况下,我已经检查了3001,任何将来的迭代中我都会排除该值进行进一步的检查。我猜这是不正确的,我猜我必须包括它们(?)你能帮我理论部分吗?

只有当“检查”一个“端点”意味着你验证它不是正确答案时,才会正确。

想象一下在实际字典中查找一个单词。你可以开始打开书本,将单词是否在字典的中心点上方或下方进行检查,然后丢弃你知道不包含你的单词的那一半,分割剩下的部分,并重复这个过程,直到找到应该包含你的单词的页和列。但是,如果你打开字典,页首上的单词恰好是你要找的单词,你不会排除那个单词:你会中断整个过程并表示你已经找到了。

现在,如果字典中没有这个单词,但你的任务是找到最接近它的单词呢?你不能仅仅因为它不是你要找的单词就将它丢弃。你可能需要跟踪在你搜索的单词上方和下方遇到的最接近的端点,并在到达没有单词在它们之间的点时,看看哪个单词与你要查找的单词“最接近”。

英文:

> I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?

That would only be correct if "checking" an "endpoint" means you verify that it's not the right answer.

Think about looking up a word in a real-world dictionary. You can start by opening the book to the center and checking whether the word is above or below that point in the dictionary, then discard the half which you know doesn't have your word, bisect the remainder, and repeat the process until you've found the page and column that should have your word. But if you open the dictionary and the word at the top of the page happens to be the word you're looking for, you don't exclude that word: you short-circuit the whole process and say you've won.

Now, what if the word isn't in the dictionary, but your job is to find the word nearest to it? You can't discard a word just because it's not the word you're looking for. You'd probably need to keep track of the closest endpoints you've encountered above and below the word you're searching for, and when you get to a point where there are no words between them then see which of those two words is "closest" to the one you're looking for.

答案3

得分: 1

你可以始终首先测试基本情况(这是一种安全的做法),所以:

while (l <= h)
{
    n = (l + h) / 2;

    x = vars / n; // vars表示更复杂的计算,x依赖于n

    computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal依赖于n

    // 达到基本情况?
    if (abs(computed_goal - goal) / goal < 0.01)
        return n; // 成功匹配目标值!

    // 未达到目标,对其中一个半进行递归
    if (computed_goal < goal)
        l = n + 1;
    else
        h = n - 1;
}

你也可以使用以下模式(来自维基百科),注意if else if...和L<=R:

function binary_search(A, n, T) is
    L := 0
    R := n  1
    while L  R do
        m := floor((L + R) / 2)
        if A[m] < T then
            L := m + 1
        else if A[m] > T then
            R := m  1
        else:
            return m
return unsuccessful

请注意确保始终达到目标,如果未达到目标,将结束while循环,必须返回一些内容!

英文:

You may always test for the base case first (a safe way to do it), so:

while (l &lt;= h)
 {
     n = (l + h) / 2;

     x = vars / n; // vars represents a more complex computation, x depends on n
     
     computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
     
     // base case reached ?
     if (abs(computed_goal - goal) / goal &lt; 0.01)
         return n; // successfully matched goal value!

     // goal not reached, recurse on one of the halves
     if (computed_goal &lt; goal)
         l = n + 1;
     else
         h = n - 1;
 }

You may also use the follwing pattern (from Wikipedia), note the if else if... and the L<=R:

function binary_search(A, n, T) is
    L := 0
    R := n − 1
    while L ≤ R do
        m := floor((L + R) / 2)
        if A[m] &lt; T then
            L := m + 1
        else if A[m] &gt; T then
            R := m − 1
        else:
            return m
    return unsuccessful

Be careful to ensure that your goal is always reached, if not you will end the while and must return something!

huangapple
  • 本文由 发表于 2023年6月16日 05:46:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/76485721.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定