修改1024真的比修改1023更快吗?

huangapple go评论66阅读模式
英文:

Is modding 1024 really faster than modding 1023?

问题

我的同事告诉我,对2的幂进行取模运算会被优化为位运算,比对其他数字进行取模运算更快。我查看了汇编代码,证实了他的说法。但是,我用Go语言编写了一个基准测试代码,并在Go版本1.17上运行了它。结果似乎没有太大的差异。为什么会这样,他是对的吗?

这是Go语言的代码:

package main

import (
	"fmt"
	"time"
)

const loop = 10000000000

func Mod1024() int {
	sum := 0
	for i := 0; i < loop; i++ {
		sum += i % 1024
	}
	return sum
}

func Mod1023() int {
	sum := 0
	for i := 0; i < loop; i++ {
		sum += i % 1023
	}
	return sum
}

func main() {
	start := time.Now()
	Mod1023()
	fmt.Println(time.Since(start).Microseconds())

	start = time.Now()
	Mod1024()
	fmt.Println(time.Since(start).Microseconds())
}

在我的电脑上运行的结果是:

2810668
2694136

请注意,我只提供了翻译服务,不会回答关于翻译内容的问题。

英文:

My colleage told me that code modding 2's power will be optimized to bit operation and faster than modding other numbers. And I've checked the assembly which proves his option. But I wrote a benchmark code in Golang and run it with Go version 1.17. It seems that there is no much difference. Why does that happened and is he right?

Here is the Golang code:

package main

import (
	&quot;fmt&quot;
	&quot;time&quot;
)

const loop = 10000000000

func Mod1024() int {
	sum := 0
	for i := 0; i &lt; loop; i++ {
		sum += i % 1024
	}
	return sum
}

func Mod1023() int {
	sum := 0
	for i := 0; i &lt; loop; i++ {
		sum += i % 1023
	}
	return sum
}

func main() {
	start := time.Now()
	Mod1023()
	fmt.Println(time.Since(start).Microseconds())

	start = time.Now()
	Mod1024()
	fmt.Println(time.Since(start).Microseconds())
}

The result on my computer is:

2810668
2694136

答案1

得分: 7

执行单个模运算非常快,大约在纳秒级别。你的Mod1024()Mod1023()函数做的事情要多得多:它们增加和测试一个循环变量,执行加法并将结果存储在一个局部变量中。所有这些操作都比简单的模运算要复杂得多,无论是否优化为位掩码。

此外,将任何代码作为主应用程序的一部分进行基准测试从来都不是一个好主意,有许多因素会扭曲结果(通常使其完全无用)。请参阅https://stackoverflow.com/questions/41608578/order-of-the-code-and-performance/41608707#41608707。始终使用Go的测试框架(具有基准测试功能)来更可靠地对代码进行基准测试。

因此,让我们修改示例函数:

func Mod1023() {
    for i := 23; i%1023 != 0; i++ {
    }
}

func Mod1024() {
    for i := 24; i%1024 != 0; i++ {
    }
}

(上述函数中的循环将执行1000次迭代。)

然后,使用Go的测试框架编写适当的基准测试函数:

func BenchmarkMod1023(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Mod1023()
    }
}

func BenchmarkMod1024(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Mod1024()
    }
}

使用go test -bench .运行基准测试,输出结果如下:

BenchmarkMod1023-8        881263              1346 ns/op
BenchmarkMod1024-8       3710430               325.4 ns/op

因此,通过1024进行模运算得到了优化的位掩码操作,实际上更快:大约快了4倍。由于优化,你节省了整整1纳秒的时间(1.3 ns vs 0.3 ns)。尽管只有在需要执行数百万次此操作且没有其他比简单的CPU模运算更慢的代码时,这才会有所影响。

英文:

Executing a single modulo operation is really fast, it's in the magnitude of a single nanosecond. What your Mod1024() and Mod1023() functions do are a lot more: they increment and test a loop variable, perform addition and store the result in a local variable. These altogether are orders of magnitude more than a simple modding operation, be it optimized to bit masking or not.

Moreover, benchmarking any code as part of the main app is never a good idea, there are numerous factors that distort the result (often make it completely useless). See https://stackoverflow.com/questions/41608578/order-of-the-code-and-performance/41608707#41608707. Always use Go's testing framework (with benchmarking capabilities) to more reliably benchmark code.

So let's modify the example functions:

func Mod1023() {
	for i := 23; i%1023 != 0; i++ {
	}
}

func Mod1024() {
	for i := 24; i%1024 != 0; i++ {
	}
}

(The loops in the above functions will have 1000 iterations.)

And let's write proper benchmarking functions using Go's testing framework:

func BenchmarkMod1023(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		Mod1023()
	}
}

func BenchmarkMod1024(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		Mod1024()
	}
}

Running the benchmark using go test -bench ., the output is:

BenchmarkMod1023-8        881263              1346 ns/op
BenchmarkMod1024-8       3710430               325.4 ns/op

So yes, modding by 1024 gets optimized to bitmasking and is in fact faster: about 4 times faster. A whole nanosecond is saved for you due to optimization (1.3 ns vs 0.3 ns). Although this only matters if you have to execute this millions of times and no other code are part of the execution that are slower than a simple CPU mod operation.

huangapple
  • 本文由 发表于 2021年12月21日 21:43:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/70436444.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定