Fibonacci: non-recursive vs memoized recursive puzzling timing results
Question
After watching an MIT lecture on dynamic programming, I felt like practicing a bit with Fibonacci. I first wrote the naive recursive implementation and then added memoization. Here is the memoized version:
package main

import (
	"fmt"
)

func fib_memoized(n int, memo map[int]int64) int64 {
	memoized, ok := memo[n]
	if ok {
		return memoized
	}
	if n < 2 {
		return int64(n)
	}
	f := fib_memoized(n-2, memo) + fib_memoized(n-1, memo)
	memo[n] = f
	return f
}

func main() {
	memo := make(map[int]int64)
	for i := 0; i < 10000; i++ {
		fmt.Printf("fib(%d) = %d\n", i, fib_memoized(i, memo))
	}
}
I then proceeded to write a non-recursive version of the program:
package main

import (
	"fmt"
)

func fib(n int) int64 {
	var f1 int64 = 1
	var f2 int64 = 0
	for i := 0; i < n; i++ {
		f1, f2 = f2, f1+f2
	}
	return f2
}

func main() {
	for i := 0; i < 10000; i++ {
		fmt.Printf("fib(%d) = %d\n", i, fib(i))
	}
}
What has puzzled me is that the memoized version seems to perform at least as well as the non-recursive one and sometimes even outperforms it. Naturally, I expected memoization to bring a great improvement over the naive recursive implementation, but I haven't been able to figure out why or how the memoized version could be on par with, or even surpass, its non-recursive counterpart.
I did try looking into the assembly output of both versions (obtained with go tool compile -S), but to no avail. I still see the CALL instructions in the memoized version and, to my mind, that should incur enough overhead to make it at least slightly slower than the non-recursive version.
Could anyone more knowledgeable help me understand what's going on?
P.S. I'm aware of integer overflow; I used 10000 just to increase the load.
Thank you.
Answer 1
Score: 4
One quite important thing to keep in mind: memo is preserved between iterations of the testbench. So the memoized version makes at most two recursive calls per iteration of the loop in main. In other words, you allow the memoized version to keep its memory between individual iterations, while the iterative version has to compute everything from scratch each time.
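To make that effect concrete, here is a small sketch (not from the original post) that times the same workload twice: once with a memo shared across iterations, as in the question, and once with a fresh memo per call. The limit of 90 is my own choice, picked so that the values still fit in an int64:

```go
package main

import (
	"fmt"
	"time"
)

// fib_memoized is the memoized version from the question.
func fib_memoized(n int, memo map[int]int64) int64 {
	if v, ok := memo[n]; ok {
		return v
	}
	if n < 2 {
		return int64(n)
	}
	f := fib_memoized(n-2, memo) + fib_memoized(n-1, memo)
	memo[n] = f
	return f
}

func main() {
	// Shared memo: after the first few iterations, each call is a
	// single map lookup plus at most two recursive calls.
	shared := make(map[int]int64)
	start := time.Now()
	for i := 0; i < 90; i++ {
		fib_memoized(i, shared)
	}
	fmt.Println("shared memo:", time.Since(start))

	// Fresh memo per call: each call does O(n) work from scratch,
	// which is the fairer comparison against the iterative version.
	start = time.Now()
	for i := 0; i < 90; i++ {
		fib_memoized(i, make(map[int]int64))
	}
	fmt.Println("fresh memo: ", time.Since(start))
}
```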
Next point:
Writing benchmarks is tricky. Tiny details can have a significant impact on the outcome. E.g. the calls to Printf most likely take a considerable amount of time to execute, yet have nothing to do with the Fibonacci calculation whose runtime you are trying to measure. I don't have an environment available to test how much these IO operations actually affect the timing, but it is quite likely considerable. Especially since your algorithm runs for a rather minuscule 10000 iterations, or barely 100 microseconds, as can be seen in @Bracken's answer.
So to summarize: drop the IO from the benchmark, start with an empty memo in each iteration, and increase the number of iterations to get better timings.
Answer 2
Score: 3
I think you are asking why the memoized recursive implementation is not much faster than the iterative implementation. (You mention a "naive recursive implementation", but you don't show it.)
Benchmarking shows that the performance of the two is comparable, with the iterative version perhaps a little faster:
package kata

import (
	"fmt"
	"os"
	"testing"
)

func fib_memoized(n int, memo map[int]int64) int64 {
	memoized, ok := memo[n]
	if ok {
		return memoized
	}
	if n < 2 {
		return int64(n)
	}
	f := fib_memoized(n-2, memo) + fib_memoized(n-1, memo)
	memo[n] = f
	return f
}

func fib(n int) int64 {
	var f1 int64 = 1
	var f2 int64 = 0
	for i := 0; i < n; i++ {
		f1, f2 = f2, f1+f2
	}
	return f2
}

func BenchmarkFib(b *testing.B) {
	out, err := os.Create("/dev/null")
	if err != nil {
		b.Fatal("Can't open: ", err)
	}
	b.Run("Recursive Memoized", func(b *testing.B) {
		memo := make(map[int]int64)
		for j := 0; j < b.N; j++ {
			for i := 0; i < 100; i++ {
				fmt.Fprintf(out, "fib(%d) = %d\n", i, fib_memoized(i, memo))
			}
		}
	})
	b.Run("Iterative", func(b *testing.B) {
		for j := 0; j < b.N; j++ {
			for i := 0; i < 100; i++ {
				fmt.Fprintf(out, "fib(%d) = %d\n", i, fib(i))
			}
		}
	})
}
% go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/brackendawson/kata
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkFib/Recursive_Memoized-12    13424    91082 ns/op
BenchmarkFib/Iterative-12             13917    82837 ns/op
PASS
ok      github.com/brackendawson/kata   4.323s
I expect your memoized recursive implementation is not much faster because:
- Go does not have good Tail Call Optimization (TCO). As you saw in the assembly, there are still CALL instructions; recursion is usually only faster when those CALLs can be optimized away.
- Your memoized recursive implementation is not tail-recursive: the recursive call must be the last statement in the function for TCO to apply.
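For illustration, here is a hypothetical tail-recursive variant (not from the question) that threads two accumulators so the recursive call is the very last operation. This is the shape TCO needs, though the standard Go compiler still emits a real CALL for it:

```go
package main

import "fmt"

// fib_tail returns fib(n), carrying the two previous values as
// accumulators so the recursive call is in tail position.
// Call it as fib_tail(n, 0, 1).
func fib_tail(n int, f2, f1 int64) int64 {
	if n == 0 {
		return f2
	}
	// Tail call: nothing remains to do after it returns, which is
	// the precondition for a compiler to turn it into a jump.
	return fib_tail(n-1, f1, f2+f1)
}

func main() {
	fmt.Println(fib_tail(10, 0, 1)) // prints 55
}
```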
Answer 3
Score: 1
This is not an answer to your question, but there is a way to get the N-th Fibonacci number in O(log N) time.
All you need is to raise the matrix

| 0 1 |
| 1 1 |

to the power N using binary matrix exponentiation.
Links to materials:
https://kukuruku.co/post/the-nth-fibonacci-number-in-olog-n/
https://www.youtube.com/watch?v=eMXNWcbw75E
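As a sketch, the matrix power can be computed with binary exponentiation in Go (the type and function names here are my own; M^n has fib(n) in its top-right entry):

```go
package main

import "fmt"

// mat2 is a 2x2 matrix [[a, b], [c, d]].
type mat2 struct{ a, b, c, d int64 }

// mul returns the product of two 2x2 matrices.
func mul(x, y mat2) mat2 {
	return mat2{
		x.a*y.a + x.b*y.c, x.a*y.b + x.b*y.d,
		x.c*y.a + x.d*y.c, x.c*y.b + x.d*y.d,
	}
}

// fibLog raises [[0,1],[1,1]] to the n-th power by squaring,
// doing O(log n) matrix multiplications; the top-right entry of
// the result is fib(n).
func fibLog(n int) int64 {
	result := mat2{1, 0, 0, 1} // identity matrix
	base := mat2{0, 1, 1, 1}
	for ; n > 0; n >>= 1 {
		if n&1 == 1 {
			result = mul(result, base)
		}
		base = mul(base, base)
	}
	return result.b
}

func main() {
	fmt.Println(fibLog(10)) // prints 55
}
```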