Best way of parsing date and time in golang


Question

I have a lot of datetime values coming into my Go program as strings.
The format is fixed, with a fixed number of digits in each field:

2006/01/02 15:04:05

I started by parsing these dates with the time.Parse function:

const dtFormat = "2006/01/02 15:04:05"

func ParseDate1(strdate string) (time.Time, error) {
    return time.Parse(dtFormat, strdate)
}

but I ran into performance issues in my program. So I tried to tune it by writing my own parsing function, taking advantage of the fact that my format is fixed:

func ParseDate2(strdate string) (time.Time, error) {
    year, _ := strconv.Atoi(strdate[:4])
    month, _ := strconv.Atoi(strdate[5:7])
    day, _ := strconv.Atoi(strdate[8:10])
    hour, _ := strconv.Atoi(strdate[11:13])
    minute, _ := strconv.Atoi(strdate[14:16])
    second, _ := strconv.Atoi(strdate[17:19])

    return time.Date(year, time.Month(month), day, hour, minute, second, 0, time.UTC), nil
}

Finally, I benchmarked the two functions and got the following results:

 BenchmarkParseDate1      5000000               343 ns/op
 BenchmarkParseDate2     10000000               248 ns/op

That is a performance improvement of about 27%.
Is there a faster way to parse datetimes in this format?
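
The benchmark code itself is not shown in the question. A minimal sketch of how the two functions might be timed is below. The sample input and the use of testing.Benchmark from a main function are assumptions made so the example runs on its own; in a real project these would normally be Benchmark functions in a _test.go file run with go test -bench=.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
	"time"
)

const dtFormat = "2006/01/02 15:04:05"

// sample is an assumed representative input, not taken from the question.
const sample = "2014/12/01 02:09:13"

func ParseDate1(strdate string) (time.Time, error) {
	return time.Parse(dtFormat, strdate)
}

// ParseDate2 is the question's version: conversion errors are ignored.
func ParseDate2(strdate string) (time.Time, error) {
	year, _ := strconv.Atoi(strdate[:4])
	month, _ := strconv.Atoi(strdate[5:7])
	day, _ := strconv.Atoi(strdate[8:10])
	hour, _ := strconv.Atoi(strdate[11:13])
	minute, _ := strconv.Atoi(strdate[14:16])
	second, _ := strconv.Atoi(strdate[17:19])
	return time.Date(year, time.Month(month), day, hour, minute, second, 0, time.UTC), nil
}

func BenchmarkParseDate1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ParseDate1(sample)
	}
}

func BenchmarkParseDate2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ParseDate2(sample)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test"
	// and returns a result that prints as iterations and ns/op.
	fmt.Println("ParseDate1", testing.Benchmark(BenchmarkParseDate1))
	fmt.Println("ParseDate2", testing.Benchmark(BenchmarkParseDate2))
}
```

Absolute numbers depend on the machine and Go version, so only the ratio between the two results is meaningful.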

Answer 1

Score: 7

From what you have already shown, using strconv.Atoi directly improved your performance. You can push this further and roll your own atoi for your particular use case.

You expect each item to be a non-negative base-10 number. You also know it can't overflow, because the longest string passed has 4 digits. The only possible error is then a non-digit character in the string. Knowing this, we can simply do the following:

var atoiError = errors.New("invalid number")

func atoi(s string) (x int, err error) {
    i := 0
    for ; i < len(s); i++ {
        c := s[i]
        if c < '0' || c > '9' {
            err = atoiError
            return
        }
        x = x*10 + int(c) - '0'
    }
    return
}

Wrapping this into ParseDate3, I get the following results:

BenchmarkParseDate1	 5000000	       355 ns/op
BenchmarkParseDate2	10000000	       278 ns/op
BenchmarkParseDate3	20000000	        88 ns/op
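
The body of ParseDate3 is not shown in this answer. A minimal sketch of what such a wrapper might look like is below, assuming the same field offsets as ParseDate2 in the question. The length check, the bounds table, and the loop over fields are illustrative choices, not the benchmarked code, so the timing may differ from the figure above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errAtoi = errors.New("invalid number")

// atoi converts a short string of ASCII digits to an int.
// It does not guard against overflow; callers pass at most 4 digits.
func atoi(s string) (int, error) {
	x := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return 0, errAtoi
		}
		x = x*10 + int(c-'0')
	}
	return x, nil
}

// ParseDate3 is a hypothetical reconstruction of the wrapper.
func ParseDate3(strdate string) (time.Time, error) {
	// Assumed guard: slicing a shorter string would panic.
	if len(strdate) != 19 {
		return time.Time{}, errors.New("invalid length")
	}
	// Start and end offsets of each field in "2006/01/02 15:04:05".
	bounds := [6][2]int{{0, 4}, {5, 7}, {8, 10}, {11, 13}, {14, 16}, {17, 19}}
	var f [6]int
	for i, b := range bounds {
		n, err := atoi(strdate[b[0]:b[1]])
		if err != nil {
			return time.Time{}, err
		}
		f[i] = n
	}
	return time.Date(f[0], time.Month(f[1]), f[2], f[3], f[4], f[5], 0, time.UTC), nil
}

func main() {
	t, err := ParseDate3("2014/12/01 02:09:13")
	fmt.Println(t, err)
	_, err = ParseDate3("2014/1x/01 02:09:13")
	fmt.Println(err)
}
```

Note that this sketch checks only the digits, not the separators, and time.Date normalizes out-of-range values such as month 13 rather than rejecting them.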

You could make it faster by not returning an error in atoi, but I encourage you to test the input anyway (unless it's validated somewhere else in your code).

Alternative atoi approach after seeing the inlined solution:

Pushing this even further, you could take advantage of the fact that all but one of the passed strings are 2 digits long (the year is 4 digits, which is a multiple of two). An atoi that takes a 2-digit string eliminates the for loop. Example:

// Converts string of 2 characters into a positive integer, returns -1 on error
func atoi2(s string) int {
	x := uint(s[0]) - uint('0')
	y := uint(s[1]) - uint('0')
	if x > 9 || y > 9 {
		return -1 // error
	}
	return int(x*10 + y)
}

Converting the year into a number then needs a two-step approach:

year := atoi2(strdate[0:2])*100 + atoi2(strdate[2:4])

This gives additional improvement:

BenchmarkParseDate4 50000000            61 ns/op

Note that the inlined version proposed by @peterSO is only slightly faster (54 ns/op in my case), but the solution above lets you check for errors, while the inlined version blindly converts whatever characters it is given into a date.
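
The full function behind BenchmarkParseDate4 is not shown. A minimal sketch of how atoi2 might be used with error checking follows; the function body, the length guard, and the error messages are assumptions. One point worth spelling out: because atoi2 signals failure with -1, each half of the year has to be checked before the two are combined, since -1*100 + 14 is not -1.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// atoi2 converts a 2-character string to a non-negative int, or -1 on error.
func atoi2(s string) int {
	x := uint(s[0]) - uint('0')
	y := uint(s[1]) - uint('0')
	if x > 9 || y > 9 {
		return -1
	}
	return int(x*10 + y)
}

// ParseDate4 is a hypothetical reconstruction built on atoi2.
func ParseDate4(s string) (time.Time, error) {
	// Assumed guard: slicing a shorter string would panic.
	if len(s) != 19 {
		return time.Time{}, errors.New("invalid length")
	}
	hi, lo := atoi2(s[0:2]), atoi2(s[2:4])
	month, day := atoi2(s[5:7]), atoi2(s[8:10])
	hour, minute, second := atoi2(s[11:13]), atoi2(s[14:16]), atoi2(s[17:19])
	// Valid fields are >= 0 and a failed field is -1, so the bitwise OR
	// of all fields is negative exactly when at least one field failed.
	if hi|lo|month|day|hour|minute|second < 0 {
		return time.Time{}, errors.New("invalid digit")
	}
	return time.Date(hi*100+lo, time.Month(month), day, hour, minute, second, 0, time.UTC), nil
}

func main() {
	t, err := ParseDate4("2014/12/01 02:09:13")
	fmt.Println(t, err)
	_, err = ParseDate4("20x4/12/01 02:09:13")
	fmt.Println(err)
}
```

Checking all fields with a single comparison keeps the validation to one branch on the success path, which fits the goal of removing work from the hot loop.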

Answer 2

Score: 5

I would expect this to make your entire program much faster. For example, ParseDate3:

func ParseDate3(date []byte) (time.Time, error) {
	year := (((int(date[0])-'0')*10+int(date[1])-'0')*10+int(date[2])-'0')*10 + int(date[3]) - '0'
	month := time.Month((int(date[5])-'0')*10 + int(date[6]) - '0')
	day := (int(date[8])-'0')*10 + int(date[9]) - '0'
	hour := (int(date[11])-'0')*10 + int(date[12]) - '0'
	minute := (int(date[14])-'0')*10 + int(date[15]) - '0'
	second := (int(date[17])-'0')*10 + int(date[18]) - '0'
	return time.Date(year, month, day, hour, minute, second, 0, time.UTC), nil
}

Benchmarks:

$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkParseDate1	 5000000	       308 ns/op
BenchmarkParseDate2	10000000	       225 ns/op
BenchmarkParseDate3	30000000	        44.9 ns/op
ok  	so/test	5.741s
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkParseDate1	 5000000	       308 ns/op
BenchmarkParseDate2	10000000	       226 ns/op
BenchmarkParseDate3	30000000	        45.4 ns/op
ok  	so/test	5.757s
$ go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkParseDate1	 5000000	       312 ns/op
BenchmarkParseDate2	10000000	       225 ns/op
BenchmarkParseDate3	30000000	        45.0 ns/op
ok  	so/test	5.761s
$ 

Reference:

Profiling Go Programs


If you insist on passing the date as a string, use ParseDate4:

func ParseDate4(date string) (time.Time, error) {
	year := (((int(date[0])-'0')*10+int(date[1])-'0')*10+int(date[2])-'0')*10 + int(date[3]) - '0'
	month := time.Month((int(date[5])-'0')*10 + int(date[6]) - '0')
	day := (int(date[8])-'0')*10 + int(date[9]) - '0'
	hour := (int(date[11])-'0')*10 + int(date[12]) - '0'
	minute := (int(date[14])-'0')*10 + int(date[15]) - '0'
	second := (int(date[17])-'0')*10 + int(date[18]) - '0'
	return time.Date(year, month, day, hour, minute, second, 0, time.UTC), nil
}
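
The inlined functions index up to date[18] and never return a non-nil error, so an input shorter than 19 bytes panics and non-digit characters are silently folded into the result. If the input is not validated elsewhere, a small guard can be put in front of the fast path. The sketch below is one way to do that; ParseDate4Checked and its error message are hypothetical names, not part of this answer.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ParseDate4 is the inlined parser: no validation of any kind.
func ParseDate4(date string) (time.Time, error) {
	year := (((int(date[0])-'0')*10+int(date[1])-'0')*10+int(date[2])-'0')*10 + int(date[3]) - '0'
	month := time.Month((int(date[5])-'0')*10 + int(date[6]) - '0')
	day := (int(date[8])-'0')*10 + int(date[9]) - '0'
	hour := (int(date[11])-'0')*10 + int(date[12]) - '0'
	minute := (int(date[14])-'0')*10 + int(date[15]) - '0'
	second := (int(date[17])-'0')*10 + int(date[18]) - '0'
	return time.Date(year, month, day, hour, minute, second, 0, time.UTC), nil
}

// ParseDate4Checked is a hypothetical wrapper that rejects inputs of the
// wrong length before the unchecked indexing in ParseDate4 can panic.
func ParseDate4Checked(date string) (time.Time, error) {
	if len(date) != len("2006/01/02 15:04:05") {
		return time.Time{}, errors.New("invalid length")
	}
	return ParseDate4(date)
}

func main() {
	fmt.Println(ParseDate4Checked("2014/12/01 02:09:13"))
	fmt.Println(ParseDate4Checked("2014/12/01"))
	// Non-digits still pass through: 'x' is treated as a (large) digit
	// value and time.Date normalizes the out-of-range month.
	fmt.Println(ParseDate4Checked("2014/1x/01 02:09:13"))
}
```

The length check catches only the panic case; rejecting non-digit characters needs per-character checks like the atoi2 approach in the other answer.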

Posted by huangapple on 2014-12-01 02:09:13
Original link: https://go.coder-hub.com/27216457.html