Go CountBits练习:使用n个字符串切片执行比log2n的math包调用更好的性能。

huangapple go评论85阅读模式
英文:

Go CountBits exercise: n strings slices performing a lot better than log2n math package calls

问题

我正在尝试解决一个练习,其中需要计算任意数字的'1'位数。我想出了4个主要的思路:

  1. 创建一个递归函数countBitsString,在每次迭代中,它获取对应于二进制数的字符串,并删除第一个字符。如果这个字符是'1',则增加一个计数器并继续。
  2. 与上述相同,但是使用指向该字符串的指针而不是值(TODO)。
  3. 使用对数逻辑来理解这个数字中有多少个2的幂,对于每个幂,我们有一个'1'位,所以增加计数器。如果数字是奇数,则再增加一个(countBitsSmart)。
  4. 使用二进制逻辑来检查位,然后左移(TODO)。

现在,这是我的当前代码和结果:

package main

import (
	"fmt"
	"strconv"
	"math/rand"
	"math"
	"time"
)

/*
 * implement a function that counts the number of set of bits in binary representation
 * Complete the 'countBits' function below.
 * The function is expected to return an int32.
 * The function accepts unit32 num as parameter.
*/
func countBitsString(num uint32) int32 {

	num_64 	   := int64(num)
	num_2  	   := strconv.FormatInt(num_64, 2)
	_, counter := recursiveCount(num_2, 0)
	return counter
}

func recursiveCount(s string, counter int32) (string, int32) {
	if len(s) < 1 {
		return "", counter
		}
	
	if s[:1] == "1" {
		counter+=1
	}
	//fmt.Printf("Current counter: %d\n", counter)
	//fmt.Printf("Current char %s\n", s[:1])
	
	return recursiveCount(s[1:], counter)
}


func countBitsBinary(num uint32) int32 {
	fmt.Println("TODO")
	// convert uint23 in binary form
	// for binary_number > 0 
	//		if current bit == 1 => counter++
	//		shift left binary_number
	return 0
}

func countBitsSmart(num uint32) int32{
	var upBits 			int32
	var highestPower	uint32 = math.MaxUint32

	if num % 2 >0  {
		upBits+=1
		// fmt.Printf("\tUP bits = %d\n", upBits)
	}
																			
	for ; (num > 2 && highestPower > 1); upBits++ {							
		highestPower = uint32(math.Log2(float64(num)))						
		num          = num - uint32(math.Pow(2, float64(highestPower))) 	
		// fmt.Printf("\tlog2 = %d\n", highestPower)							
		// fmt.Printf("\tnum %d rest %d\n", num, num%2)
		// fmt.Printf("\tUP bits = %d\n", upBits)
	}
	
	return upBits
}

// Profiling with execution time and calling
func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
	fmt.Println("\n=========================================================")
	start := time.Now()

	for _, numTemp := range numInput{
		// fmt.Printf("> Number:  %d (%s)\tUp bits: %d\n", numTemp, strconv.FormatInt(int64(numTemp), 2), countFunc( uint32(numTemp) ))
		countFunc( uint32(numTemp))
	}
	fmt.Printf("Time elapsed for %v func= %vs", funcName, time.Since(start))
}

func main() {
	const 	testSize = 10000000
	var 	numInput = make([]uint32, testSize)
	for i:=0; i<testSize; i++ {
		numInput[i] = uint32(rand.Intn(500))
	}
	
	fmt.Printf("\n Test size = %d\n", testSize)

	invoker(numInput, "countBitsString", countBitsString)
	
	invoker(numInput, "countBitsSmart", countBitsSmart)

	fmt.Println()
}

第二个函数为什么执行速度比前一个慢5倍?仅仅是因为调用了math包吗?

谢谢

英文:

I'm trying to solve an exercise where you have to count the number of '1' bits of any number.
I came up with 4 main ideas:

  1. Make a recursive function countBitsString, where at each iteration it gets the string corresponding to the base2 number removing the first character. If this is 1, increment a counter and continue.
  2. The same as above but with a pointer to that string instead of the value (TODO)
  3. Use logarithm logic to understand how many power of 2 are inside this number, for each of them we have a '1' bit up so increment the counter. Increment one more if the number is odd (countBitsSmart)
  4. Work with binary logic to check the bit and then shift left (TODO)

Now, here's my current code and later the results

package main

import (
	&quot;fmt&quot;
	&quot;strconv&quot;
	&quot;math/rand&quot;
	&quot;math&quot;
	&quot;time&quot;
)

/*
 * implement a function that counts the number of set of bits in binary representation
 * Complete the &#39;countBits&#39; function below.
 * The function is expected to return an int32.
 * The function accepts unit32 num as parameter.
*/
func countBitsString(num uint32) int32 {

	num_64 	   := int64(num)
	num_2  	   := strconv.FormatInt(num_64, 2)
	_, counter := recursiveCount(num_2, 0)
	return counter
}

func recursiveCount(s string, counter int32) (string, int32) {
	if len(s) &lt; 1 {
		return &quot;&quot;, counter
		}
	
	if s[:1] == &quot;1&quot; {
		counter+=1
	}
	//fmt.Printf(&quot;Current counter: %d\n&quot;, counter)
	//fmt.Printf(&quot;Current char %s\n&quot;, s[:1])
	
	return recursiveCount(s[1:], counter)
}


func countBitsBinary(num uint32) int32 {
	fmt.Println(&quot;TODO&quot;)
	// convert uint23 in binary form
	// for binary_number &gt; 0 
	//		if current bit == 1 =&gt; counter++
	//		shift left binary_number
	return 0
}

func countBitsSmart(num uint32) int32{
	var upBits 			int32
	var highestPower	uint32 = math.MaxUint32

	if num % 2 &gt;0  {
		upBits+=1
		// fmt.Printf(&quot;\tUP bits = %d\n&quot;, upBits)
	}
																			
	for ; (num &gt; 2 &amp;&amp; highestPower &gt; 1); upBits++ {							
		highestPower = uint32(math.Log2(float64(num)))						
		num          = num - uint32(math.Pow(2, float64(highestPower))) 	
		// fmt.Printf(&quot;\tlog2 = %d\n&quot;, highestPower)							
		// fmt.Printf(&quot;\tnum %d rest %d\n&quot;, num, num%2)
		// fmt.Printf(&quot;\tUP bits = %d\n&quot;, upBits)
	}
	
	return upBits
}

// Profiling with execution time and calling
func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
	fmt.Println(&quot;\n=========================================================&quot;)
	start := time.Now()

	for _, numTemp := range numInput{
		// fmt.Printf(&quot;&gt; Number:  %d (%s)\tUp bits: %d\n&quot;, numTemp, strconv.FormatInt(int64(numTemp), 2), countFunc( uint32(numTemp) ))
		countFunc( uint32(numTemp))
	}
	fmt.Printf(&quot;Time elapsed for %v func= %vs&quot;, funcName, time.Since(start))
}

func main() {
	const 	testSize = 10000000
	var 	numInput = make([]uint32, testSize)
	for i:=0; i&lt;testSize; i++ {
		numInput[i] = uint32(rand.Intn(500))
	}
	
	fmt.Printf(&quot;\n Test size = %d\n&quot;, testSize)

	invoker(numInput, &quot;countBitsString&quot;, countBitsString)
	
	invoker(numInput, &quot;countBitsSmart&quot;, countBitsSmart)

	fmt.Println()
}

Go CountBits练习:使用n个字符串切片执行比log2n的math包调用更好的性能。

How can the second function perform 5 times worse than the previous one?
Is it just for the calls to math package?

Thanks

答案1

得分: 1

使用浮点数的Log2/Pow计算是一种非常昂贵的计算位数的方法。转换为二进制字符串要快得多,因为它可以通过位移/整数计算来完成(尽管仍然很昂贵)。维基百科的汉明重量页面上有一些通过位掩码进行非常快速的popcount计算的解释。

您可以通过Go的内置基准支持来查看相对性能。它可以帮助理解代码的性能:

// main_test.go
package main

import (
    "math/bits"
    "testing"
)   

// Simple benchmarks to get started.
// You could also try benchmarking different numbers to understand 
// performance differences. There is some inherit bias in benchmarking
// sequential numbers starting from 0. You could try a list of 256 preset
// random numbers. There are many options..

func BenchmarkCountBitsString(b *testing.B) {
    for i := 0; i < b.N; i++ {
        dummyInt32 = countBitsString(uint32(i))
    }
}

func BenchmarkCountBitsSmart(b *testing.B) {
    for i := 0; i < b.N; i++ {
        dummyInt32 = countBitsSmart(uint32(i))
    }
}

func BenchmarkOnesCount(b *testing.B) {
    for i := 0; i < b.N; i++ {
        dummyInt = bits.OnesCount32(uint32(i))
    }
}   

// Ensure some optimisations don't occur by assigning to a global.
var (
    dummyInt   int
    dummyInt32 int32
)   

然后,您可以轻松了解相对性能:

$ go test -bench .
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8    10388419        99.97 ns/op
BenchmarkCountBitsSmart-8     2036588         617.6 ns/op
BenchmarkOnesCount-8          1000000000      0.5540 ns/op
PASS
ok   stack/bench 3.625s

使用pprof检查性能需要一些额外的参数:

  • 使用-run XXX禁用测试(XXX不匹配任何测试)
  • 限制为单个函数基准测试,否则更快的函数将运行更多次,直到它消耗与较慢函数相同的时间(在最后一次运行中约为5秒)。
  • 对基准时间使用固定的迭代次数,以便函数之间的配置文件可比较
  • 保存CPU配置文件。这也会导致测试二进制文件保留为DIR.test(在此示例中为bench.test)。

countBitsSmart的基准详细信息显示它花费了约55%的时间在math.Log2上,约38%的时间在math.Pow上:

$ go test -bench CountBitsSmart -run XXX -benchtime 10000000x -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsSmart-8    10000000       703.0 ns/op
PASS
ok   stack/bench 7.226s
$ go tool pprof -top -cum bench.test cpu.prof 
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:19pm (AEST)
Duration: 7.22s, Total samples = 7.01s (97.10%)
Showing nodes accounting for 7s, 99.86% of 7.01s total
Dropped 1 node (cum <= 0.04s)
flat  flat%   sum%        cum   cum%
0.01s  0.14%  0.14%      7.01s   100%  stack/bench.BenchmarkCountBitsSmart
0     0%  0.14%      7.01s   100%  testing.(*B).launch
0     0%  0.14%      7.01s   100%  testing.(*B).runN
0.43s  6.13%  6.28%         7s 99.86%  stack/bench.countBitsSmart
0.07s     1%  7.28%      3.88s 55.35%  math.Log2 (inline)
0.42s  5.99% 13.27%      3.81s 54.35%  math.log2
0.02s  0.29% 13.55%      3.07s 43.79%  math.Log (inline)
3.05s 43.51% 57.06%      3.05s 43.51%  math.archLog
0.15s  2.14% 59.20%      2.69s 38.37%  math.Pow (inline)
1.20s 17.12% 76.32%      2.54s 36.23%  math.pow
0.07s     1% 77.32%      0.57s  8.13%  math.Frexp (inline)
[...]
$ go tool pprof -list bench.countBitsSmart bench.test cpu.prof 
Total: 7.01s
ROUTINE ======================== stack/bench.countBitsSmart in /home/.../stack/bench/main.go
430ms         7s (flat, cum) 99.86% of Total
.          .     54:    if num%2 > 0 {
.          .     55:        upBits += 1
.          .     56:        // fmt.Printf("\tUP bits = %d\n", upBits)
.          .     57:    }
.          .     58:
40ms       40ms     59:    for ; num > 2 && highestPower > 1; upBits++ {
250ms      4.13s     60:        highestPower = uint32(math.Log2(float64(num)))
90ms      2.78s     61:        num = num - uint32(math.Pow(2, float64(highestPower)))
.          .     62:        // fmt.Printf("\tlog2 = %d\n", highestPower)
.          .     63:        // fmt.Printf("\tnum %d rest %d\n", num, num%2)
.          .     64:        // fmt.Printf("\tUP bits = %d\n", upBits)
.          .     65:    }
.          .     66:
50ms       50ms     67:    return upBits
.          .     68:}
.          .     69:
.          .     70:// Profiling with execution time and calling
.          .     71:func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
.          .     72:    fmt.Println("\n=========================================================")

countBitsString基准测试显示约一半的时间花在recursiveCount上,另一半花在strconv.FormatInt上。这表明recursiveCount可能是优化的好目标。

$ go test -bench CountBitsString -run XXX -benchtime 10000000x -cpuprofile cpu.profgoos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8    10000000        99.13 ns/op
PASS
ok   stack/bench 1.111s
$ go tool pprof -top -cum bench.test cpu.prof 
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:15pm (AEST)
Duration: 1.10s, Total samples = 1.05s (95.18%)
Showing nodes accounting for 1.05s, 100% of 1.05s total
flat  flat%   sum%        cum   cum%
0.02s  1.90%  1.90%      1.01s 96.19%  stack/bench.BenchmarkCountBitsString
0     0%  1.90%      1.01s 96.19%  testing.(*B).launch
0     0%  1.90%      1.01s 96.19%  testing.(*B).runN
0.01s  0.95%  2.86%      0.99s 94.29%  stack/bench.countBitsString
0.51s 48.57% 51.43%      0.51s 48.57%  stack/bench.recursiveCount
0     0% 51.43%      0.47s 44.76%  strconv.FormatInt
0.21s 20.00% 71.43%      0.47s 44.76%  strconv.formatBits
0.03s  2.86% 74.29%      0.26s 24.76%  runtime.slicebytetostring
0.14s 13.33% 87.62%      0.22s 20.95%  runtime.mallocgc
0.04s  3.81% 91.43%      0.04s  3.81%  runtime.nextFreeFast (inline)
[...]
$ go tool pprof -list bench.countBitsString bench.test cpu.prof 
Total: 1.05s
ROUTINE ======================== stack/bench.countBitsString in /home/.../stack/bench/main.go
10ms      990ms (flat, cum) 94.29% of Total
.          .     14: * implement a function that counts the number of set of bits in binary representation
.          .     15: * Complete the 'countBits' function below.
.          .     16: * The function is expected to return an int32.
.          .     17: * The function accepts unit32 num as parameter.
.          .     18: */
10ms       10ms     19:func countBitsString(num uint32) int32 {
.          .     20:
.          .     21:    num_64 := int64(num)
.      470ms     22:    num_2 := strconv.FormatInt(num_64, 2)
.      510ms     23:    _, counter := recursiveCount(num_2, 0)
.          .     24:    return counter
.          .     25:}
.          .     26:
.          .     27:func recursiveCount(s string, counter int32) (string, int32) {
.          .     28:    if len(s) < 1 {

这些基准测试显示math.Log2+math.Pow(6.57s)比strconv.FormatInt(0.47s)慢约14倍。

我还强烈推荐使用pprof Web界面来交互式地探索性能。这有点不明显,但可以通过以下方式启动:

$ go tool pprof -http : bench.test cpu.prof

Dave Cheney还在他的博客中详细介绍了如何在Go中进行基准测试:
https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go

英文:

Using floating point Log2/Pow calculations is a very expensive way to calculate bits. Converting to a binary string is much faster since it can be done with bit shifts/integer calculations (it's still expensive). The Wikipedia Hamming Weight page has some explanations of very fast popcount calculations via bitmasks.

You can see the relative performance via Go's builtin benchmark support. It can help understand how code performs:

// main_test.go
package main

import (
    &quot;math/bits&quot;
    &quot;testing&quot;
)   

// Simple benchmarks to get started.
// You could also try benchmarking different numbers to understand 
// performance differences. There is some inherit bias in benchmarking
// sequential numbers starting from 0. You could try a list of 256 preset
// random numbers. There are many options..

func BenchmarkCountBitsString(b *testing.B) {
    for i := 0; i &lt; b.N; i++ {
        dummyInt32 = countBitsString(uint32(i))
    }
}

func BenchmarkCountBitsSmart(b *testing.B) {
    for i := 0; i &lt; b.N; i++ {
        dummyInt32 = countBitsSmart(uint32(i))
    }
}

func BenchmarkOnesCount(b *testing.B) {
    for i := 0; i &lt; b.N; i++ {
        dummyInt = bits.OnesCount32(uint32(i))
    }
}   

// Ensure some optimisations don&#39;t occur by assigning to a global.
var (
    dummyInt   int
    dummyInt32 int32
)   

Then you can easily understand the relative performance:

$ go test -bench .
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8   	10388419	        99.97 ns/op
BenchmarkCountBitsSmart-8    	 2036588	       617.6 ns/op
BenchmarkOnesCount-8         	1000000000	         0.5540 ns/op
PASS
ok  	stack/bench	3.625s

Inspecting performance with pprof requires some extra parameters:

  • Disable tests with -run XXX (XXX doesn't match any tests)
  • Limit to a single function benchmark, otherwise the faster function will be run more often until it consumes as much time as the slower function (~5s in the final run).
  • Use a fixed number of iterations for the benchmark time so the profiles are comparable between functions
  • Save CPU profile. This also causes the test binary to be kept as DIR.test (bench.test in this example).

Benchmark details for countBitsSmart show it spends ~55% time in math.Log2 and ~38% time in math.Pow:

$ go test -bench CountBitsSmart -run XXX -benchtime 10000000x -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsSmart-8   	10000000	       703.0 ns/op
PASS
ok  	stack/bench	7.226s
$ go tool pprof -top -cum bench.test cpu.prof 
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:19pm (AEST)
Duration: 7.22s, Total samples = 7.01s (97.10%)
Showing nodes accounting for 7s, 99.86% of 7.01s total
Dropped 1 node (cum &lt;= 0.04s)
flat  flat%   sum%        cum   cum%
0.01s  0.14%  0.14%      7.01s   100%  stack/bench.BenchmarkCountBitsSmart
0     0%  0.14%      7.01s   100%  testing.(*B).launch
0     0%  0.14%      7.01s   100%  testing.(*B).runN
0.43s  6.13%  6.28%         7s 99.86%  stack/bench.countBitsSmart
0.07s     1%  7.28%      3.88s 55.35%  math.Log2 (inline)
0.42s  5.99% 13.27%      3.81s 54.35%  math.log2
0.02s  0.29% 13.55%      3.07s 43.79%  math.Log (inline)
3.05s 43.51% 57.06%      3.05s 43.51%  math.archLog
0.15s  2.14% 59.20%      2.69s 38.37%  math.Pow (inline)
1.20s 17.12% 76.32%      2.54s 36.23%  math.pow
0.07s     1% 77.32%      0.57s  8.13%  math.Frexp (inline)
[...]
$ go tool pprof -list bench.countBitsSmart bench.test cpu.prof 
Total: 7.01s
ROUTINE ======================== stack/bench.countBitsSmart in /home/.../stack/bench/main.go
430ms         7s (flat, cum) 99.86% of Total
.          .     54:	if num%2 &gt; 0 {
.          .     55:		upBits += 1
.          .     56:		// fmt.Printf(&quot;\tUP bits = %d\n&quot;, upBits)
.          .     57:	}
.          .     58:
40ms       40ms     59:	for ; num &gt; 2 &amp;&amp; highestPower &gt; 1; upBits++ {
250ms      4.13s     60:		highestPower = uint32(math.Log2(float64(num)))
90ms      2.78s     61:		num = num - uint32(math.Pow(2, float64(highestPower)))
.          .     62:		// fmt.Printf(&quot;\tlog2 = %d\n&quot;, highestPower)
.          .     63:		// fmt.Printf(&quot;\tnum %d rest %d\n&quot;, num, num%2)
.          .     64:		// fmt.Printf(&quot;\tUP bits = %d\n&quot;, upBits)
.          .     65:	}
.          .     66:
50ms       50ms     67:	return upBits
.          .     68:}
.          .     69:
.          .     70:// Profiling with execution time and calling
.          .     71:func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
.          .     72:	fmt.Println(&quot;\n=========================================================&quot;)

The countBitsString benchmark shows about half the time is spent in recursiveCount and half in strconv.FormatInt. This indicates recursiveCount could be a good target for optimisation.

$ go test -bench CountBitsString -run XXX -benchtime 10000000x -cpuprofile cpu.profgoos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8   	10000000	        99.13 ns/op
PASS
ok  	stack/bench	1.111s
$ go tool pprof -top -cum bench.test cpu.prof 
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:15pm (AEST)
Duration: 1.10s, Total samples = 1.05s (95.18%)
Showing nodes accounting for 1.05s, 100% of 1.05s total
flat  flat%   sum%        cum   cum%
0.02s  1.90%  1.90%      1.01s 96.19%  stack/bench.BenchmarkCountBitsString
0     0%  1.90%      1.01s 96.19%  testing.(*B).launch
0     0%  1.90%      1.01s 96.19%  testing.(*B).runN
0.01s  0.95%  2.86%      0.99s 94.29%  stack/bench.countBitsString
0.51s 48.57% 51.43%      0.51s 48.57%  stack/bench.recursiveCount
0     0% 51.43%      0.47s 44.76%  strconv.FormatInt
0.21s 20.00% 71.43%      0.47s 44.76%  strconv.formatBits
0.03s  2.86% 74.29%      0.26s 24.76%  runtime.slicebytetostring
0.14s 13.33% 87.62%      0.22s 20.95%  runtime.mallocgc
0.04s  3.81% 91.43%      0.04s  3.81%  runtime.nextFreeFast (inline)
[...]
$ go tool pprof -list bench.countBitsString bench.test cpu.prof 
Total: 1.05s
ROUTINE ======================== stack/bench.countBitsString in /home/.../stack/bench/main.go
10ms      990ms (flat, cum) 94.29% of Total
.          .     14: * implement a function that counts the number of set of bits in binary representation
.          .     15: * Complete the &#39;countBits&#39; function below.
.          .     16: * The function is expected to return an int32.
.          .     17: * The function accepts unit32 num as parameter.
.          .     18: */
10ms       10ms     19:func countBitsString(num uint32) int32 {
.          .     20:
.          .     21:	num_64 := int64(num)
.      470ms     22:	num_2 := strconv.FormatInt(num_64, 2)
.      510ms     23:	_, counter := recursiveCount(num_2, 0)
.          .     24:	return counter
.          .     25:}
.          .     26:
.          .     27:func recursiveCount(s string, counter int32) (string, int32) {
.          .     28:	if len(s) &lt; 1 {

These benchmarks show math.Log2+math.Pow (6.57s) is ~14x slower than strconv.FormatInt (0.47s).

I also highly recommend using the pprof web interface to interactively explore performance. This is a little non-obvious, but can be started with:

$ go tool pprof -http : bench.test cpu.prof

Dave Cheney also has an instructive blog on benchmarking with Go:
https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go

huangapple
  • 本文由 发表于 2022年4月15日 23:14:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/71885769.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定