Parsing User Agents in Go (Golang) - Tobie/ua-parser

Question

I am trying to stream a large number of user agents through a Go (Golang) program to extract information about them, such as device type and OS.

The Go code in Tobie Langel's ua-parser repo looks very promising:

https://github.com/tobie/ua-parser/tree/master/go/uaparser

I created a simple program that basically adds streaming functionality to the example on the README page. To compare performance, I wrote the same kind of simple program with a Ruby gem that uses a similar approach and the same regexes.yaml file.

https://github.com/toolmantim/user_agent_parser

After compiling the Go program and testing both, I found that the Ruby version runs 2-3 times faster than the Go version.

As far as I can see, both programs load and process the user agents in a similar manner.

I am new to Go and am wondering whether anyone sees any major optimizations or fixes that could make programs using the Go portion of this repo run faster.

I would also like to know whether there are other Go libraries that parse user agents well.

---TESTING SIMPLE PROGRAMS TO COMPARE REGEX VS PCRE LIBS (as suggested in the comments below)

I have created the programs below, one using PCRE and one using the standard regex library. However, I don't seem to be getting a performance boost with PCRE. In fact, the PCRE library seems to be a little slower. Am I approaching this the wrong way?

--With standard regex library

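package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

func main() {
	// Compile the pattern once, outside the read loop.
	regex := regexp.MustCompile(`Mac`)
	scanner := bufio.NewScanner(os.Stdin)

	for scanner.Scan() {
		// Each input line is tab-separated; the user agent is the first field.
		fields := strings.Split(scanner.Text(), "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0])))
	}

	// Report read errors, which Scan otherwise hides.
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		os.Exit(1)
	}
}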

--With PCRE library

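package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"

	pcre "github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	// Compile the pattern once, outside the read loop.
	regex := pcre.MustCompile(`Mac`, 0)
	scanner := bufio.NewScanner(os.Stdin)

	for scanner.Scan() {
		// Each input line is tab-separated; the user agent is the first field.
		fields := strings.Split(scanner.Text(), "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0]), 0))
	}

	// Report read errors, which Scan otherwise hides.
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		os.Exit(1)
	}
}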

Answer 1

Score: 1

I would consider the rubex library. I changed ua-parser to use rubex instead of the standard regexp package and saw a 7x speed improvement. The library claims a 10x improvement, so it is worth trying with your particular application.

Posted on 2013-11-22 06:53:19