2017-08-19 04:14:08 · go

Most efficient way to check if a byte slice is a number

Question

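I'm looking for the most efficient way to tell whether a byte slice holds a float.

This will run over huge datasets, so performance is key.

Approaches tried:

strconv.ParseFloat
regexp.Match
CheckNumber - a home-rolled function that uses unicode.IsNumber and checks that the byte slice contains at most one '.'.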

func CheckNumber(p []byte) bool {
    r := string(p)
    sep := 0
    for _, b := range r {
        if unicode.IsNumber(b) {
            continue
        }
        if b == rune('.') {
            if sep > 0 {
                return false
            }
            sep++
            continue
        }
        return false
    }
    return true
}

The benchmark code:

func BenchmarkFloatStrconv(b *testing.B) {
    p := []byte("15.34234234234")

    for i := 0; i < b.N; i++ {
        _, err := strconv.ParseFloat(string(p), 64)
        if err != nil {
            log.Fatalf("NaN")
        }
    }
}

func BenchmarkFloatRegex(b *testing.B) {
    p := []byte("15.34234234234")
    r := `[-+]?[0-9]*\.?[0-9]`
    c, _ := regexp.Compile(r)

    for i := 0; i < b.N; i++ {
        ok := c.Match(p)
        if !ok {
            log.Fatalf("NaN")
        }
    }
}

func BenchmarkCheckNumber(b *testing.B) {
    p := []byte("15.34234234234")

    for i := 0; i < b.N; i++ {
        ok := CheckNumber(p)
        if !ok {
            log.Fatalf("NaN")
        }
    }
}

Benchmark results:

BenchmarkFloatStrconv-8   	20000000	        85.8 ns/op	      16 B/op	       1 allocs/op
BenchmarkFloatRegex-8     	 5000000	       252 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckNumber-8    	20000000	        64.3 ns/op	       0 B/op	       0 allocs/op

Am I comparing the different solutions fairly?
Are there better solutions?
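
Before comparing timings, it may be worth checking that the candidates agree on what they accept. The pattern in BenchmarkFloatRegex is unanchored, so Match succeeds on any input that merely contains a digit, and the benchmark may be timing a different check than the other two. A minimal sketch of the difference, assuming an anchored variant is what was intended (the names loose, anchored, MatchesLoose and MatchesAnchored are made up for this illustration):

```go
package main

import "regexp"

// loose is the pattern from the question: it can match anywhere in the input.
var loose = regexp.MustCompile(`[-+]?[0-9]*\.?[0-9]`)

// anchored is an assumed stricter variant: the whole input must be an
// optional sign, optional integer digits, an optional '.', and one or
// more trailing digits.
var anchored = regexp.MustCompile(`^[-+]?[0-9]*\.?[0-9]+$`)

// MatchesLoose reports whether p contains a match of the original pattern.
func MatchesLoose(p []byte) bool { return loose.Match(p) }

// MatchesAnchored reports whether all of p matches the anchored pattern.
func MatchesAnchored(p []byte) bool { return anchored.Match(p) }
```

With these definitions, MatchesLoose([]byte("abc1")) reports true while MatchesAnchored([]byte("abc1")) reports false; both accept 15.34234234234. The anchored pattern still rejects forms such as "1." or "1e5", which strconv.ParseFloat accepts, so the three approaches are not validating exactly the same set of inputs.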

Edit: thanks to pointers from Adrian and icza, this version avoids converting to strings/runes:

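// CheckNumberNoStringConvert reports whether r consists only of ASCII digits
// and at most one '.'. Note that an empty slice or a lone '.' also passes.
func CheckNumberNoStringConvert(r []byte) bool {
    sep := 0 // number of '.' separators seen so far

    for i := range r {
        if r[i] >= '0' && r[i] <= '9' {
            continue
        }
        if r[i] == '.' {
            if sep > 0 {
                return false
            }
            sep++
            continue
        }
        return false
    }

    return true
}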

and it performs quite well:

BenchmarkCheckNumberNoStringConvert-8   	200000000	         8.55 ns/op	       0 B/op	       0 allocs/op
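
CheckNumberNoStringConvert returns true for an empty slice and for a lone ".", since neither contains a disallowed byte. A minimal sketch of a stricter variant, assuming those inputs should be rejected (the name CheckNumberStrict is hypothetical and its speed has not been measured here):

```go
package main

// CheckNumberStrict reports whether r holds ASCII digits with at most one
// '.' and at least one digit. It is an illustrative variant, not the
// function benchmarked above.
func CheckNumberStrict(r []byte) bool {
	sawDigit, sawPoint := false, false
	for _, c := range r {
		switch {
		case c >= '0' && c <= '9':
			sawDigit = true
		case c == '.' && !sawPoint:
			sawPoint = true
		default:
			// A second '.' or any other byte disqualifies the input.
			return false
		}
	}
	// Reject "" and "." by requiring at least one digit.
	return sawDigit
}
```

The extra boolean costs one assignment per digit, so any slowdown relative to the original loop is likely small, but that would need to be confirmed with the same benchmark.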

Answer 1

Score: 3

For a simple real (floating-point) number (no scientific or engineering notation, no group separators), the following check can be used:

func IsReal(n []byte) bool {
    if len(n) > 0 && n[0] == '-' {
        n = n[1:]
    }
    if len(n) == 0 {
        return false
    }
    var point bool
    for _, c := range n {
        if '0' <= c && c <= '9' {
            continue
        }
        if c == '.' && len(n) > 1 && !point {
            point = true
            continue
        }
        return false
    }
    return true
}

Benchmark:

$ go test -run=! -bench=. -benchmem -cpu=1 real_test.go
goos: linux
goarch: amd64
BenchmarkIsReal       	100000000	    20.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkFloatStrconv 	20000000	   101 ns/op	      16 B/op	       1 allocs/op
BenchmarkFloatRegex   	 5000000	   284 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckNumber  	20000000	    73.0 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	command-line-arguments	7.380s

real_test.go:

package main

import (
    "log"
    "regexp"
    "strconv"
    "testing"
    "unicode"
)

func IsReal(n []byte) bool {
    if len(n) > 0 && n[0] == '-' {
        n = n[1:]
    }
    if len(n) == 0 {
        return false
    }
    var point bool
    for _, c := range n {
        if '0' <= c && c <= '9' {
            continue
        }
        if c == '.' && len(n) > 1 && !point {
            point = true
            continue
        }
        return false
    }
    return true
}

func BenchmarkIsReal(b *testing.B) {
    p := []byte("15.34234234234")
    for i := 0; i < b.N; i++ {
        ok := IsReal(p)
        if !ok {
            log.Fatalf("NaN")
        }
    }
}

func CheckNumber(p []byte) bool {
    r := string(p)

    sep := 0

    for _, b := range r {
        if unicode.IsNumber(b) {
            continue
        }
        if b == rune('.') {
            if sep > 0 {
                return false
            }
            sep++
            continue
        }
        return false
    }

    return true

}

func BenchmarkFloatStrconv(b *testing.B) {
    p := []byte("15.34234234234")

    for i := 0; i < b.N; i++ {
        _, err := strconv.ParseFloat(string(p), 64)
        if err != nil {
            log.Fatalf("NaN")
        }
    }
}

func BenchmarkFloatRegex(b *testing.B) {
    p := []byte("15.34234234234")
    r := `[-+]?[0-9]*\.?[0-9]`
    c, _ := regexp.Compile(r)

    for i := 0; i < b.N; i++ {
        ok := c.Match(p)
        if !ok {
            log.Fatalf("NaN")
        }
    }
}

func BenchmarkCheckNumber(b *testing.B) {
    p := []byte("15.34234234234")

    for i := 0; i < b.N; i++ {
        ok := CheckNumber(p)
        if !ok {
            log.Fatalf("NaN")
        }
    }
}
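
Every benchmark in this file times a single valid input, which may not reflect a large dataset that also contains short values and malformed ones. A rough sketch of how sub-benchmarks could cover several inputs, using strconv.ParseFloat as the function under test; the parses helper, the sink variable and the chosen inputs are assumptions for illustration, not part of the original file:

```go
package main

import (
	"strconv"
	"testing"
)

// parses reports whether strconv.ParseFloat accepts b as a 64-bit float.
func parses(b []byte) bool {
	_, err := strconv.ParseFloat(string(b), 64)
	return err == nil
}

// sink receives every result so the call cannot be optimized away.
var sink bool

// BenchmarkParsesInputs runs the same check over several hypothetical
// inputs, reporting time and allocations for each one separately.
func BenchmarkParsesInputs(b *testing.B) {
	cases := []struct {
		name string
		in   []byte
	}{
		{"short", []byte("1.5")},
		{"long", []byte("15.34234234234")},
		{"invalid_first_byte", []byte("x5.34234234234")},
		{"invalid_last_byte", []byte("15.3423423423x")},
	}
	for _, c := range cases {
		b.Run(c.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sink = parses(c.in)
			}
		})
	}
}
```

Placed in a _test.go file, this would run with the same go test -bench=. invocation shown above and print one result line per input. The same table could be reused for IsReal or CheckNumber by swapping the call inside the loop.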

Answer 2

Score: 1

I took this as a challenge to rewrite the check as a kind of state machine, synthesizing the collective input from everyone here:

func Validate(b []byte) bool {
    for i := range b {
        switch {
        case b[i] >= '0' && b[i] <= '9':
            continue
        case b[i] == '.':
            if len(b) == 1 {
                return false
            }
            if len(b) > i {
                return fractional(b[i+1:])
            }
            return true
        case i == 0 && b[i] == '-':
            if len(b) == 1 {
                return false
            }
            continue
        default:
            return false
        }
    }

    return true
}

func fractional(b []byte) bool {
    for i := range b {
        switch {
        case b[i] >= '0' && b[i] <= '9':
            continue
        case b[i] == 'e' || b[i] == 'E':
            if len(b[:i]) == 0 {
                return false
            }
            if len(b) > i+1 {
                return scientific(b[i+1:])
            }
            return false
        default:
            return false
        }
    }

    return true
}

func scientific(b []byte) bool {
    for i := range b {
        switch {
        case b[i] >= '0' && b[i] <= '9':
            continue
        case i == 0 && b[i] == '-':
            if len(b) == 1 {
                return false
            }
            continue
        default:
            return false
        }
    }

    return true
}

It seems to work on a few different number formats:

type v struct {
    Input    []byte
    Expected bool
}

func TestPermutations(t *testing.T) {
    b := []v{
        v{[]byte("123.456"), true},
        v{[]byte("123"), true},
        v{[]byte("123."), true},
        v{[]byte(".123"), true},
        v{[]byte("12.1e12"), true},
        v{[]byte("12.1e-12"), true},
        v{[]byte("-123.456"), true},
        v{[]byte("-123"), true},
        v{[]byte("-123."), true},
        v{[]byte("-.123"), true},
        v{[]byte("-12.1e12"), true},
        v{[]byte("-12.1e-12"), true},
        v{[]byte(".1e-12"), true},
        v{[]byte(".e-12"), false},
        v{[]byte(".e"), false},
        v{[]byte("e"), false},
        v{[]byte("abcdef"), false},
        v{[]byte("-"), false},
        v{[]byte("."), false},
    }

    for _, test := range b {
        ok := Validate(test.Input)
        if ok != test.Expected {
            t.Errorf("could not handle case %s", test.Input)
        }
    }
}
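
A hand-written table only covers the cases its author thought of. One way to widen the net, sketched here under the assumption that strconv.ParseFloat is an acceptable reference for what counts as a float, is to list the inputs on which a validator and the standard library disagree. The helper name Disagreements and the stand-in validator digitsOnly are made up for this illustration. ParseFloat also accepts forms such as "inf", "NaN", hexadecimal floats and exponents without a decimal point, so some disagreement is expected and is not necessarily a bug.

```go
package main

import "strconv"

// Disagreements returns the inputs for which validate and
// strconv.ParseFloat give different verdicts. Any ParseFloat error,
// including a range error, counts as "not a float" here.
func Disagreements(validate func([]byte) bool, inputs []string) []string {
	var out []string
	for _, in := range inputs {
		_, err := strconv.ParseFloat(in, 64)
		if validate([]byte(in)) != (err == nil) {
			out = append(out, in)
		}
	}
	return out
}

// digitsOnly is a deliberately naive stand-in validator used only to
// demonstrate the helper: it accepts non-empty runs of ASCII digits.
func digitsOnly(b []byte) bool {
	if len(b) == 0 {
		return false
	}
	for _, c := range b {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}
```

Passing Validate from this answer instead of digitsOnly, with inputs such as "1e5" (rejected by Validate, accepted by ParseFloat) or "-." (accepted by Validate, rejected by ParseFloat), would show where the state machine and the standard library diverge.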

and performs quite well on the original benchmark:

BenchmarkValidate-8    100000000    13.0 ns/op    0 B/op    0 allocs/op

Benchmark code:

func BenchmarkValidate(b *testing.B) {
    p := []byte("15.1234567890")

    for i := 0; i < b.N; i++ {
        ok := Validate(p)
        if !ok {
            log.Fatalf("problem")
        }
    }
}
