大 O 时间复杂度是一个带有随机对象的 while 循环。

huangapple go评论72阅读模式
英文:

Big O time complexity of a while loop with a random Object

问题

public class Question2 {
    //running time of function is N!!!!!!
    public static boolean isThere(int[] array, int num, int index){
        boolean isItThere = false;   //running time of 1
        for(int i = 0; i <= index; i++){      //running time i
            if(array[i] == num){   //running time of 1
                isItThere = true;   //running time of 1
            }
        }
        return isItThere;
    }

    public static int[] firstAlgo(int N){
        Random random = new Random();  //running time of 1(initilizing)
        int[] arr = new int[N];
        for (int i = 0; i < N; i++){
            int temp = random.nextInt(N+1);  //running time of random is O(1)
            while (isThere(arr, temp, i)){
                temp = random.nextInt(N+1);
            }
            arr[i] = temp;
        }
        return arr;
    }
}
英文:
public class Question2 {
          //running time of function is N!!!!!!
    public static boolean isThere(int[] array, int num, int index){
        boolean isItThere = false;   //running time of 1
        for(int i =0; i &lt;= index; i++){      //running time i
            if(array[i] == num){   //running time of 1
                isItThere = true;   //running time of 1
            }
        }
        return isItThere;
    }


    public static int[] firstAlgo(int N){
            Random random = new Random();  //running time of 1(initilizing)k
          int[] arr = new int[N];
          for (int i = 0; i &lt; N; i++){
              int temp = random.nextInt(N+1);  //running time of random is O(1)
              while (isThere(arr, temp, i)){
                  temp = random.nextInt(N+1);
              }
              arr[i] = temp;
          }
         return arr;
    }

}

I want to figure out the time complexity of the while loop, I know the running time of the isThere function is N and So is the main for loop in firstAlgo

答案1

得分: 5

The short version:

  • 预期运行时间为 &Theta;(N<sup>2</sup> log N)
  • 我有支持这一点的数学证明,以及实证数据。

下面是一个图,比较了实际完成的工作量与我获得的适合形式为 N<sup>2</sup> log N 的函数的最佳拟合近似值,该函数是 (N<sup>2</sup> ln N) / 1.06:

大 O 时间复杂度是一个带有随机对象的 while 循环。

好奇吗?继续阅读。 大 O 时间复杂度是一个带有随机对象的 while 循环。

让我们从代码中退后一步,看看实际逻辑在做什么。代码的工作原理如下:对于数组的每个前缀 0、1、2、3、...、N,代码会不断生成介于 0 和 N 之间的随机数,直到生成一个之前没有出现过的数。然后,它将该数字写入当前的数组位置,并继续下一个位置。

在这个分析中,我们需要一些观察结果。想象一下,我们即将进入 firstAlgo 循环的第 k 次迭代。关于数组的前 k 个元素,我们能说些什么呢?我们知道以下信息:

  1. 位置 0、1、2、3、...、k-1 处的元素彼此都不同。这是因为每次循环迭代只有在找到之前数组中没有出现的内容时才会停止。
  2. 这些值中没有一个等于 0,因为数组最初被填充为 0,并且如果之前的步骤中生成了 0,它将不会被允许。
  3. 作为(1)和(2)的结果,位置 0、1、2、...、k-1 和 k 处的元素都不同。

现在,我们可以进行一些数学计算。让我们看看 firstAlgo 循环的第 k 次迭代。每次生成一个随机数时,我们查看(k+1)个数组位置,以确保该数字在其中没有出现。 (顺便说一句,我会将这个数量用作总工作量的代理,因为大部分的工作将用于扫描该数组。)因此,我们需要问的是 - 期望情况下,在我们找到一个唯一的数字之前,我们需要生成多少个随机数?

上面列表中的事实(3)在这里很有帮助。它指出,在第 k 次迭代时,数组的前 k+1 个位置彼此不同,我们需要生成一个与所有这些不同的数字不同的数字。我们可以选择 N+1 种随机数字,因此我们可以选择尚未使用过的数字有 N - k 个选项。这意味着我们选取一个尚未出现的数字的概率为 (N - k) / (N + 1)。

快速检查以确保该公式是正确的:当 k = 0 时,我们可以随意生成除 0 以外的任何随机数。有 N+1 个选择,因此我们执行这个操作的概率是 N / (N+1)。这与我们上面的公式匹配。当 k = N-1 时,所有先前的数组元素都是不同的,我们只能选择一个可以使用的数字。这意味着我们的成功概率是 1 / (N+1),再次与我们的公式匹配。太好了!

概率论中有一个有用的事实,即如果你不断投掷一个有概率 p 出现正面的有偏硬币,那么在你投掷正面之前,预期投掷的次数是 1 / p。(更正式地说,这是一个成功概率为 p 的几何随机变量的均值,你可以使用期望值的正式定义以及一些 Taylor 级数来证明这一点。)这意味着在 firstAlgo 循环的第 k 次迭代中,我们需要生成的预期随机数数量是 (N + 1) / (N - k)。

总的来说,这意味着在 firstAlgo 循环的第 k 次迭代中,预期的工作量由 (N + 1)(k + 1) / (N - k) 给出。这是因为

  • 期望情况下,会生成 (N + 1)/(N - k) 个数字,以及
  • 每个生成的数字需要检查 (k + 1) 个数组位置。

然后,我们可以从 k = 0 累加到 N - 1,得到总工作量

      0+1          1+1          2+1                 N
(N+1)----- + (N+1)----- + (N+1)----- + ... + (N+1)-----
      N-0          N-1          N-2                 1   

现在,我们只需简化这个求和,看看我们得到什么。 大 O 时间复杂度是一个带有随机对象的 while 循环。

让我们从这里提取出共同的 (N + 1) 项,得到

       /  1     2     3           N  \
 (N+1)|  --- + --- + --- + ... + --- |
       \  N    N-1   N-2          1  /

忽略掉 (N + 1) 项,我们需要简化求和

 1     2     3           N
--- + --- + --- + ... + ---
 N    N-1   N-2          1

现在,

英文:

The short version:

  • The expected runtime is &Theta;(N<sup>2</sup> log N).
  • I have the math to back this up, as well as empirical data.

Here's a plot comparing the empirical amount of work done to the best-fit approximation I got for a function of the form N<sup>2</sup> log N, which was (N<sup>2</sup> ln N) / 1.06:

大 O 时间复杂度是一个带有随机对象的 while 循环。

Curious? Keep reading. 大 O 时间复杂度是一个带有随机对象的 while 循环。

Let's take a step back from the code here and see what the actual logic is doing. The code works as follows: for each prefix of the array 0, 1, 2, 3, ..., N, the code continuously generates random numbers between 0 and N until it generates one that hasn't been seen before. It then writes down that number in the current array slot and moves on to the next one.

A few observations that we'll need in this analysis. Imagine that we're about to enter the kth iteration of the loop in firstAlgo. What can we say about the elements of the first k slots of the array? Well, we know the following:

  1. The elements at positions 0, 1, 2, 3, ..., k-1 are all different from one another. The reason for this is that each loop iteration only stops once it's found something that doesn't otherwise appear in the array up to that point.
  2. None of those values are equal to 0, because the array is initially populated with 0s and if 0 is generated in a previous step it won't be allowed.
  3. As a consequence of (1) and (2), the elements in slots 0, 1, 2, ..., k-1, and k are all different.

Now, we can get down to some math. Let's look at iteration k of the loop in firstAlgo. Each time we generate a random number, we look at (k+1) array slots to make sure the number doesn't appear there. (I'm going to use this quantity, by the way, as a proxy for the total work done, since most of the energy will be spent scanning that array.) So then we need to ask - on expectation, how many numbers are we going to generate before we find a unique one?

Fact (3) from the above list is helpful here. It says that on iteration k, the first k+1 slots in the array are different from one another, and we need to generate a number different from all of those. There are N+1 options of random numbers we can pick, so there are (N+1) - (k+1) = N - k options for numbers we could pick that won't have been used. This means that the probability that we pick a number that hasn't yet come up is (N - k) / (N + 1).

A quick check to make sure this formula is right: when k = 0, we are okay generating any random number other than 0. There are N+1 choices, so the probability we do this is N / (N+1). That matches our formula from above. When k = N-1, then all previous array elements are different and there's only one number we can pick that will work. That means we have a success probability of 1 / (N+1), again matching our formula. Cool!

There's a useful fact from probability theory that if you keep flipping a biased coin that has probability p of coming up heads, the expected number of flips before you flip heads is 1 / p. (More formally, that's the mean of a geometric random variable with success probability p, and you can prove this using the formal definition of expected values and some Taylor series.) This means that on the kth iteration of that loop, the expected number of random numbers we'll need to generate before we get a unique one is (N + 1) / (N - k).

Overall, this means that the expected amount of work done on iteration k of the loop in firstAlgo is given by (N + 1)(k + 1) / (N - k). That's because

  • there are, on expectation, (N + 1)/(N - k) numbers generated, and
  • each generated number requires (k + 1) array slots to be checked.

We can then get our total amount of work done by summing this up from k = 0 to N - 1. That gives us

      0+1          1+1          2+1                 N
(N+1)----- + (N+1)----- + (N+1)----- + ... + (N+1)-----
      N-0          N-1          N-2                 1   

Now, "all" we have to do is simplify this summation to see what we get back. 大 O 时间复杂度是一个带有随机对象的 while 循环。

Let's begin by factoring out the common (N + 1) term here, giving back

       /  1     2     3           N  \
 (N+1)|  --- + --- + --- + ... + --- |
       \  N    N-1   N-2          1  /

Ignoring that (N + 1) term, we're left with the task of simplifying the sum

 1     2     3           N
--- + --- + --- + ... + ---
 N    N-1   N-2          1

Now, how do we evaluate this sum? Here's a helpful fact. The sum

 1     1     1           1
--- + --- + --- + ... + ---
 N    N-1   N-2          1

gives back the Nth harmonic number (denoted H<sub>N</sub>) and is &Theta;(log N). More than just being &Theta;(log N), it's very, very close to ln N.

With that in mind, we can do this rewrite:

      1     2     3           N
     --- + --- + --- + ... + ---
      N    N-1   N-2          1

      1     1     1           1
=    --- + --- + --- + ... + ---
      N    N-1   N-2          1

            1     1           1    
+          --- + --- + ... + ---
           N-1   N-2          1

                  1           1    
+                --- + ... + ---
                 N-2          1

+                      ...


                              1    
+                            ---
                              1

The basic idea here is to treat (k + 1) / N as (k + 1) copies of the fraction 1 / N, and then to regroup them into separate rows like this.

Once we've done this, notice that the top row is the Nth harmonic number H<sub>n</sub>, and the item below that is the (N - 1)st harmonic number H<sub>n-1</sub>, and the item below that is the (N - 2)nd harmonic number H<sub>n - 2</sub>, etc. So this means that our fraction sum works out to

> H<sub>1</sub> + H<sub>2</sub> + H<sub>3</sub> + ... + H<sub>N</sub>
>
> = &Theta;(log 1 + log 2 + log 3 + ... + log N)
>
> = &Theta;(log N!) (properties of logs)
>
> = &Theta;(N log N) (Stirling's approximation).

Multiplying this in by the original factor of N that we pulled out earlier, we get that the overall runtime is &Theta;(N<sup>2</sup> log N).

So, does that hold up in practice? To find out, I ran the code over a range of inputs and counted the average number of iterations of the loop in isThere. I then divided each term by log N and did a polynomial-fit regression to see how closely the remainder matched &Theta;(N<sup>2</sup>). The regression found that the best polynomial plot had a polynomial term of N<sup>2.01</sup>, strongly supporting that (after multiplying back in the log N term) we're looking at N<sup>2</sup> log N. (Note that running the same regression but without first dividing out the log N term shows a fit of N<sup>2.15</sup>, which clearly indicates something other than N<sup>2</sup> is going on here.)

Using the the equation Predicted(N) = (N<sup>2</sup> ln N) / 1.06, with that last constant determined empirically, we get the plot up above, which is almost a perfect match.


Now, a quick coda to this problem. In retrospect, I should have predicted that the runtime would be &Theta;(N<sup>2</sup> log N) almost immediately. The reason why is that this problem is closely connected to the coupon collector's problem, a classic probability puzzle. In that puzzle, there are N different coupons to collect, and you keep collecting coupons at random. The question is then - on expectation, how many coupons do you need to collect before you'll have one copy of each of them?

That closely matches the problem we have here, where at each step we're picking from a bag of N+1 options and trying to eventually pick all of them. That would imply we need &Theta;(N log N) random draws across all iterations. However, the cost of each draw depends on which loop iteration we're in, with later draws costing more than earlier ones. Based on the assumption that most of the draws would be in the back half, I should have been able to guess that we'd do an average of &Theta;(N) work per draw, then multiplied that to get &Theta;(N<sup>2</sup> log N). But instead, I just rederived things from scratch. Oops.

Phew, that was a trip! Hope this helps!

答案2

得分: 2

你可以从经验上做这件事。只需运行代码,并随着不同的N值计时。我这样做了,复杂性似乎是O(N^2)。我运行了一堆测试,每次将N加倍,时间似乎从未从大约四倍左右的情况下恢复过来。在N = 256000时,我因等待而感到厌倦,大约花了20万毫秒。

如果你想要这样做,你可能需要进行更仔细的分析,进行更多的运行。你可以设置一个外部循环来进行批量测试,例如每个级别进行10次测试,计时,然后将N加倍,然后再次进行所有操作...记录所有时间。你可以在夜间运行测试,并对行为有一个相当清楚的了解。

如果这不是你想解决这个问题的方式,至少可以用这种方式来对答案进行双重检查。

英文:

You can do it empirically. Just run the code, and time it, with different values of N. I did so, and the complexity seems to be O(N^2). I ran a bunch of tests, doubling N each time, and the times never seemed to back off from roughly quadrupling. I got tired of waiting at N = 256000, which took about 200K millis.

If you wanted to go this way, you'd want to do a more careful analysis with more runs. You could set up an outer loop to keep doing batches of tests, say 10 at each level, timing them, and then doubling N and doing it all again...recording all the times. You could run the test overnight and get a pretty clear idea of the behavior empirically.

If this isn't the way you want to go about this, it could at least be a way to double-check your answer.

答案3

得分: 1

记住,大O表示的是最坏情况下的行为。正如评论中提到的,由于这段代码是非确定性的,它有可能永远不会终止,从而导致大O为无限大。

对于平均情况,其中随机操作符表现如预期。你在isThere函数中的时间复杂度为O(N)。对于最后一个迭代来查找一个值,你将平均执行N次操作。在这一点上,时间复杂度变为O(N^2)。最后,你需要重复这个操作N次来填充数组,这将使时间复杂度达到O(N^3)。

英文:

Remember that Big-O is worst case behavior. As mentioned in one of the comments, this has a chance to never terminate leading to Big-O of infinite because this code is non-deterministic.

For an average case where random does what's expected. You are looking at O(N) for the isThere function. For that last iteration to find a value, you will average N operations. At this point you are up to O(N^2). Finally you need to repeat this operation N times to fill the array, which brings you to O(N^3).

huangapple
  • 本文由 发表于 2020年9月17日 00:30:44
  • 转载请务必保留本文链接:https://go.coder-hub.com/63924383.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定