英文:
Go CountBits exercise: n strings slices performing a lot better than log2n math package calls
问题
我正在尝试解决一个练习,其中需要计算任意数字的'1'位数。我想出了4个主要的思路:
- 创建一个递归函数countBitsString,在每次迭代中,它获取对应于二进制数的字符串,并删除第一个字符。如果这个字符是'1',则增加一个计数器并继续。
- 与上述相同,但是使用指向该字符串的指针而不是值(TODO)。
- 使用对数逻辑来理解这个数字中有多少个2的幂,对于每个幂,我们有一个'1'位,所以增加计数器。如果数字是奇数,则再增加一个(countBitsSmart)。
- 使用二进制逻辑来检查位,然后左移(TODO)。
现在,这是我的当前代码和结果:
package main
import (
"fmt"
"strconv"
"math/rand"
"math"
"time"
)
/*
* implement a function that counts the number of set of bits in binary representation
* Complete the 'countBits' function below.
* The function is expected to return an int32.
* The function accepts unit32 num as parameter.
*/
func countBitsString(num uint32) int32 {
num_64 := int64(num)
num_2 := strconv.FormatInt(num_64, 2)
_, counter := recursiveCount(num_2, 0)
return counter
}
func recursiveCount(s string, counter int32) (string, int32) {
if len(s) < 1 {
return "", counter
}
if s[:1] == "1" {
counter+=1
}
//fmt.Printf("Current counter: %d\n", counter)
//fmt.Printf("Current char %s\n", s[:1])
return recursiveCount(s[1:], counter)
}
func countBitsBinary(num uint32) int32 {
fmt.Println("TODO")
// convert uint23 in binary form
// for binary_number > 0
// if current bit == 1 => counter++
// shift left binary_number
return 0
}
func countBitsSmart(num uint32) int32{
var upBits int32
var highestPower uint32 = math.MaxUint32
if num % 2 >0 {
upBits+=1
// fmt.Printf("\tUP bits = %d\n", upBits)
}
for ; (num > 2 && highestPower > 1); upBits++ {
highestPower = uint32(math.Log2(float64(num)))
num = num - uint32(math.Pow(2, float64(highestPower)))
// fmt.Printf("\tlog2 = %d\n", highestPower)
// fmt.Printf("\tnum %d rest %d\n", num, num%2)
// fmt.Printf("\tUP bits = %d\n", upBits)
}
return upBits
}
// Profiling with execution time and calling
func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
fmt.Println("\n=========================================================")
start := time.Now()
for _, numTemp := range numInput{
// fmt.Printf("> Number: %d (%s)\tUp bits: %d\n", numTemp, strconv.FormatInt(int64(numTemp), 2), countFunc( uint32(numTemp) ))
countFunc( uint32(numTemp))
}
fmt.Printf("Time elapsed for %v func= %vs", funcName, time.Since(start))
}
func main() {
const testSize = 10000000
var numInput = make([]uint32, testSize)
for i:=0; i<testSize; i++ {
numInput[i] = uint32(rand.Intn(500))
}
fmt.Printf("\n Test size = %d\n", testSize)
invoker(numInput, "countBitsString", countBitsString)
invoker(numInput, "countBitsSmart", countBitsSmart)
fmt.Println()
}
第二个函数为什么执行速度比前一个慢5倍?仅仅是因为调用了math包吗?
谢谢
英文:
I'm trying to solve an exercise where you have to count the number of '1' bits of any number.
I came up with 4 main ideas:
- Make a recursive function countBitsString, where at each iteration it gets the string corresponding to the base2 number removing the first character. If this is 1, increment a counter and continue.
- The same as above but with a pointer to that string instead of the value (TODO)
- Use logarithm logic to understand how many power of 2 are inside this number, for each of them we have a '1' bit up so increment the counter. Increment one more if the number is odd (countBitsSmart)
- Work with binary logic to check the bit and then shift left (TODO)
Now, here's my current code and later the results
package main
import (
"fmt"
"strconv"
"math/rand"
"math"
"time"
)
/*
* implement a function that counts the number of set of bits in binary representation
* Complete the 'countBits' function below.
* The function is expected to return an int32.
* The function accepts unit32 num as parameter.
*/
func countBitsString(num uint32) int32 {
num_64 := int64(num)
num_2 := strconv.FormatInt(num_64, 2)
_, counter := recursiveCount(num_2, 0)
return counter
}
func recursiveCount(s string, counter int32) (string, int32) {
if len(s) < 1 {
return "", counter
}
if s[:1] == "1" {
counter+=1
}
//fmt.Printf("Current counter: %d\n", counter)
//fmt.Printf("Current char %s\n", s[:1])
return recursiveCount(s[1:], counter)
}
func countBitsBinary(num uint32) int32 {
fmt.Println("TODO")
// convert uint23 in binary form
// for binary_number > 0
// if current bit == 1 => counter++
// shift left binary_number
return 0
}
func countBitsSmart(num uint32) int32{
var upBits int32
var highestPower uint32 = math.MaxUint32
if num % 2 >0 {
upBits+=1
// fmt.Printf("\tUP bits = %d\n", upBits)
}
for ; (num > 2 && highestPower > 1); upBits++ {
highestPower = uint32(math.Log2(float64(num)))
num = num - uint32(math.Pow(2, float64(highestPower)))
// fmt.Printf("\tlog2 = %d\n", highestPower)
// fmt.Printf("\tnum %d rest %d\n", num, num%2)
// fmt.Printf("\tUP bits = %d\n", upBits)
}
return upBits
}
// Profiling with execution time and calling
func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
fmt.Println("\n=========================================================")
start := time.Now()
for _, numTemp := range numInput{
// fmt.Printf("> Number: %d (%s)\tUp bits: %d\n", numTemp, strconv.FormatInt(int64(numTemp), 2), countFunc( uint32(numTemp) ))
countFunc( uint32(numTemp))
}
fmt.Printf("Time elapsed for %v func= %vs", funcName, time.Since(start))
}
func main() {
const testSize = 10000000
var numInput = make([]uint32, testSize)
for i:=0; i<testSize; i++ {
numInput[i] = uint32(rand.Intn(500))
}
fmt.Printf("\n Test size = %d\n", testSize)
invoker(numInput, "countBitsString", countBitsString)
invoker(numInput, "countBitsSmart", countBitsSmart)
fmt.Println()
}
How can the second function perform 5 times worse than the previous one?
Is it just for the calls to math package?
Thanks
答案1
得分: 1
使用浮点数的Log2/Pow计算是一种非常昂贵的计算位数的方法。转换为二进制字符串要快得多,因为它可以通过位移/整数计算来完成(尽管仍然很昂贵)。维基百科的汉明重量页面上有一些通过位掩码进行非常快速的popcount计算的解释。
您可以通过Go的内置基准支持来查看相对性能。它可以帮助理解代码的性能:
// main_test.go
package main
import (
"math/bits"
"testing"
)
// Simple benchmarks to get started.
// You could also try benchmarking different numbers to understand
// performance differences. There is some inherit bias in benchmarking
// sequential numbers starting from 0. You could try a list of 256 preset
// random numbers. There are many options..
func BenchmarkCountBitsString(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsString(uint32(i))
}
}
func BenchmarkCountBitsSmart(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsSmart(uint32(i))
}
}
func BenchmarkOnesCount(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt = bits.OnesCount32(uint32(i))
}
}
// Ensure some optimisations don't occur by assigning to a global.
var (
dummyInt int
dummyInt32 int32
)
然后,您可以轻松了解相对性能:
$ go test -bench .
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10388419 99.97 ns/op
BenchmarkCountBitsSmart-8 2036588 617.6 ns/op
BenchmarkOnesCount-8 1000000000 0.5540 ns/op
PASS
ok stack/bench 3.625s
使用pprof检查性能需要一些额外的参数:
- 使用
-run XXX
禁用测试(XXX不匹配任何测试) - 限制为单个函数基准测试,否则更快的函数将运行更多次,直到它消耗与较慢函数相同的时间(在最后一次运行中约为5秒)。
- 对基准时间使用固定的迭代次数,以便函数之间的配置文件可比较
- 保存CPU配置文件。这也会导致测试二进制文件保留为
DIR.test
(在此示例中为bench.test
)。
countBitsSmart
的基准详细信息显示它花费了约55%的时间在math.Log2
上,约38%的时间在math.Pow
上:
$ go test -bench CountBitsSmart -run XXX -benchtime 10000000x -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsSmart-8 10000000 703.0 ns/op
PASS
ok stack/bench 7.226s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:19pm (AEST)
Duration: 7.22s, Total samples = 7.01s (97.10%)
Showing nodes accounting for 7s, 99.86% of 7.01s total
Dropped 1 node (cum <= 0.04s)
flat flat% sum% cum cum%
0.01s 0.14% 0.14% 7.01s 100% stack/bench.BenchmarkCountBitsSmart
0 0% 0.14% 7.01s 100% testing.(*B).launch
0 0% 0.14% 7.01s 100% testing.(*B).runN
0.43s 6.13% 6.28% 7s 99.86% stack/bench.countBitsSmart
0.07s 1% 7.28% 3.88s 55.35% math.Log2 (inline)
0.42s 5.99% 13.27% 3.81s 54.35% math.log2
0.02s 0.29% 13.55% 3.07s 43.79% math.Log (inline)
3.05s 43.51% 57.06% 3.05s 43.51% math.archLog
0.15s 2.14% 59.20% 2.69s 38.37% math.Pow (inline)
1.20s 17.12% 76.32% 2.54s 36.23% math.pow
0.07s 1% 77.32% 0.57s 8.13% math.Frexp (inline)
[...]
$ go tool pprof -list bench.countBitsSmart bench.test cpu.prof
Total: 7.01s
ROUTINE ======================== stack/bench.countBitsSmart in /home/.../stack/bench/main.go
430ms 7s (flat, cum) 99.86% of Total
. . 54: if num%2 > 0 {
. . 55: upBits += 1
. . 56: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 57: }
. . 58:
40ms 40ms 59: for ; num > 2 && highestPower > 1; upBits++ {
250ms 4.13s 60: highestPower = uint32(math.Log2(float64(num)))
90ms 2.78s 61: num = num - uint32(math.Pow(2, float64(highestPower)))
. . 62: // fmt.Printf("\tlog2 = %d\n", highestPower)
. . 63: // fmt.Printf("\tnum %d rest %d\n", num, num%2)
. . 64: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 65: }
. . 66:
50ms 50ms 67: return upBits
. . 68:}
. . 69:
. . 70:// Profiling with execution time and calling
. . 71:func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
. . 72: fmt.Println("\n=========================================================")
countBitsString基准测试显示约一半的时间花在recursiveCount
上,另一半花在strconv.FormatInt
上。这表明recursiveCount
可能是优化的好目标。
$ go test -bench CountBitsString -run XXX -benchtime 10000000x -cpuprofile cpu.profgoos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10000000 99.13 ns/op
PASS
ok stack/bench 1.111s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:15pm (AEST)
Duration: 1.10s, Total samples = 1.05s (95.18%)
Showing nodes accounting for 1.05s, 100% of 1.05s total
flat flat% sum% cum cum%
0.02s 1.90% 1.90% 1.01s 96.19% stack/bench.BenchmarkCountBitsString
0 0% 1.90% 1.01s 96.19% testing.(*B).launch
0 0% 1.90% 1.01s 96.19% testing.(*B).runN
0.01s 0.95% 2.86% 0.99s 94.29% stack/bench.countBitsString
0.51s 48.57% 51.43% 0.51s 48.57% stack/bench.recursiveCount
0 0% 51.43% 0.47s 44.76% strconv.FormatInt
0.21s 20.00% 71.43% 0.47s 44.76% strconv.formatBits
0.03s 2.86% 74.29% 0.26s 24.76% runtime.slicebytetostring
0.14s 13.33% 87.62% 0.22s 20.95% runtime.mallocgc
0.04s 3.81% 91.43% 0.04s 3.81% runtime.nextFreeFast (inline)
[...]
$ go tool pprof -list bench.countBitsString bench.test cpu.prof
Total: 1.05s
ROUTINE ======================== stack/bench.countBitsString in /home/.../stack/bench/main.go
10ms 990ms (flat, cum) 94.29% of Total
. . 14: * implement a function that counts the number of set of bits in binary representation
. . 15: * Complete the 'countBits' function below.
. . 16: * The function is expected to return an int32.
. . 17: * The function accepts unit32 num as parameter.
. . 18: */
10ms 10ms 19:func countBitsString(num uint32) int32 {
. . 20:
. . 21: num_64 := int64(num)
. 470ms 22: num_2 := strconv.FormatInt(num_64, 2)
. 510ms 23: _, counter := recursiveCount(num_2, 0)
. . 24: return counter
. . 25:}
. . 26:
. . 27:func recursiveCount(s string, counter int32) (string, int32) {
. . 28: if len(s) < 1 {
这些基准测试显示math.Log2
+math.Pow
(6.57s)比strconv.FormatInt
(0.47s)慢约14倍。
我还强烈推荐使用pprof Web界面来交互式地探索性能。这有点不明显,但可以通过以下方式启动:
$ go tool pprof -http : bench.test cpu.prof
Dave Cheney还在他的博客中详细介绍了如何在Go中进行基准测试:
https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
英文:
Using floating point Log2/Pow calculations is a very expensive way to calculate bits. Converting to a binary string is much faster since it can be done with bit shifts/integer calculations (it's still expensive). The Wikipedia Hamming Weight page has some explanations of very fast popcount calculations via bitmasks.
You can see the relative performance via Go's builtin benchmark support. It can help understand how code performs:
// main_test.go
package main
import (
"math/bits"
"testing"
)
// Simple benchmarks to get started.
// You could also try benchmarking different numbers to understand
// performance differences. There is some inherit bias in benchmarking
// sequential numbers starting from 0. You could try a list of 256 preset
// random numbers. There are many options..
func BenchmarkCountBitsString(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsString(uint32(i))
}
}
func BenchmarkCountBitsSmart(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsSmart(uint32(i))
}
}
func BenchmarkOnesCount(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt = bits.OnesCount32(uint32(i))
}
}
// Ensure some optimisations don't occur by assigning to a global.
var (
dummyInt int
dummyInt32 int32
)
Then you can easily understand the relative performance:
$ go test -bench .
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10388419 99.97 ns/op
BenchmarkCountBitsSmart-8 2036588 617.6 ns/op
BenchmarkOnesCount-8 1000000000 0.5540 ns/op
PASS
ok stack/bench 3.625s
Inspecting performance with pprof requires some extra parameters:
- Disable tests with
-run XXX
(XXX doesn't match any tests) - Limit to a single function benchmark, otherwise the faster function will be run more often until it consumes as much time as the slower function (~5s in the final run).
- Use a fixed number of iterations for the benchmark time so the profiles are comparable between functions
- Save CPU profile. This also causes the test binary to be kept as
DIR.test
(bench.test
in this example).
Benchmark details for countBitsSmart
show it spends ~55% time in math.Log2
and ~38% time in math.Pow
:
$ go test -bench CountBitsSmart -run XXX -benchtime 10000000x -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsSmart-8 10000000 703.0 ns/op
PASS
ok stack/bench 7.226s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:19pm (AEST)
Duration: 7.22s, Total samples = 7.01s (97.10%)
Showing nodes accounting for 7s, 99.86% of 7.01s total
Dropped 1 node (cum <= 0.04s)
flat flat% sum% cum cum%
0.01s 0.14% 0.14% 7.01s 100% stack/bench.BenchmarkCountBitsSmart
0 0% 0.14% 7.01s 100% testing.(*B).launch
0 0% 0.14% 7.01s 100% testing.(*B).runN
0.43s 6.13% 6.28% 7s 99.86% stack/bench.countBitsSmart
0.07s 1% 7.28% 3.88s 55.35% math.Log2 (inline)
0.42s 5.99% 13.27% 3.81s 54.35% math.log2
0.02s 0.29% 13.55% 3.07s 43.79% math.Log (inline)
3.05s 43.51% 57.06% 3.05s 43.51% math.archLog
0.15s 2.14% 59.20% 2.69s 38.37% math.Pow (inline)
1.20s 17.12% 76.32% 2.54s 36.23% math.pow
0.07s 1% 77.32% 0.57s 8.13% math.Frexp (inline)
[...]
$ go tool pprof -list bench.countBitsSmart bench.test cpu.prof
Total: 7.01s
ROUTINE ======================== stack/bench.countBitsSmart in /home/.../stack/bench/main.go
430ms 7s (flat, cum) 99.86% of Total
. . 54: if num%2 > 0 {
. . 55: upBits += 1
. . 56: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 57: }
. . 58:
40ms 40ms 59: for ; num > 2 && highestPower > 1; upBits++ {
250ms 4.13s 60: highestPower = uint32(math.Log2(float64(num)))
90ms 2.78s 61: num = num - uint32(math.Pow(2, float64(highestPower)))
. . 62: // fmt.Printf("\tlog2 = %d\n", highestPower)
. . 63: // fmt.Printf("\tnum %d rest %d\n", num, num%2)
. . 64: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 65: }
. . 66:
50ms 50ms 67: return upBits
. . 68:}
. . 69:
. . 70:// Profiling with execution time and calling
. . 71:func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
. . 72: fmt.Println("\n=========================================================")
The countBitsString benchmark shows about half the time is spent in recursiveCount
and half in strconv.FormatInt
. This indicates recursiveCount
could be a good target for optimisation.
$ go test -bench CountBitsString -run XXX -benchtime 10000000x -cpuprofile cpu.profgoos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10000000 99.13 ns/op
PASS
ok stack/bench 1.111s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:15pm (AEST)
Duration: 1.10s, Total samples = 1.05s (95.18%)
Showing nodes accounting for 1.05s, 100% of 1.05s total
flat flat% sum% cum cum%
0.02s 1.90% 1.90% 1.01s 96.19% stack/bench.BenchmarkCountBitsString
0 0% 1.90% 1.01s 96.19% testing.(*B).launch
0 0% 1.90% 1.01s 96.19% testing.(*B).runN
0.01s 0.95% 2.86% 0.99s 94.29% stack/bench.countBitsString
0.51s 48.57% 51.43% 0.51s 48.57% stack/bench.recursiveCount
0 0% 51.43% 0.47s 44.76% strconv.FormatInt
0.21s 20.00% 71.43% 0.47s 44.76% strconv.formatBits
0.03s 2.86% 74.29% 0.26s 24.76% runtime.slicebytetostring
0.14s 13.33% 87.62% 0.22s 20.95% runtime.mallocgc
0.04s 3.81% 91.43% 0.04s 3.81% runtime.nextFreeFast (inline)
[...]
$ go tool pprof -list bench.countBitsString bench.test cpu.prof
Total: 1.05s
ROUTINE ======================== stack/bench.countBitsString in /home/.../stack/bench/main.go
10ms 990ms (flat, cum) 94.29% of Total
. . 14: * implement a function that counts the number of set of bits in binary representation
. . 15: * Complete the 'countBits' function below.
. . 16: * The function is expected to return an int32.
. . 17: * The function accepts unit32 num as parameter.
. . 18: */
10ms 10ms 19:func countBitsString(num uint32) int32 {
. . 20:
. . 21: num_64 := int64(num)
. 470ms 22: num_2 := strconv.FormatInt(num_64, 2)
. 510ms 23: _, counter := recursiveCount(num_2, 0)
. . 24: return counter
. . 25:}
. . 26:
. . 27:func recursiveCount(s string, counter int32) (string, int32) {
. . 28: if len(s) < 1 {
These benchmarks show math.Log2
+math.Pow
(6.57s) is ~14x slower than strconv.FormatInt
(0.47s).
I also highly recommend using the pprof web interface to interactively explore performance. This is a little non-obvious, but can be started with:
$ go tool pprof -http : bench.test cpu.prof
Dave Cheney also has an instructive blog on benchmarking with Go:
https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论