Parsing User Agents in GO (Golang) - Tobie/ua-parser
Question
I am trying to stream a large number of user agents through a Go (Golang) program to extract information about them, such as device type and OS.
The Go code in Tobie Langel's ua-parser repo looks very promising:
https://github.com/tobie/ua-parser/tree/master/go/uaparser
I created a simple program that basically adds streaming to the example on the README page. To compare performance, I wrote an equivalent program using a Ruby gem that takes a similar approach and uses the same regexes.yaml file:
https://github.com/toolmantim/user_agent_parser
After compiling the Go program and testing both, the Ruby version runs 2-3 times faster than the Go version.
As far as I can see, both programs load and process the user agents in a similar manner.
I am new to Go and am wondering whether anyone sees any major optimizations or fixes that could make programs using the Go portion of this repo run faster.
I would also like to know of any other Go libraries that parse user agents well.
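For readers unfamiliar with the approach, parsers driven by a regexes.yaml-style file generally try an ordered list of patterns against each user agent and stop at the first match. The sketch below illustrates that loop with the standard regexp package only. The `pattern` type, the `parseFamily` function, and the two sample patterns are hypothetical and are not taken from ua-parser or from regexes.yaml.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern pairs a compiled regex with the family name it reports.
// The type and the sample patterns below are illustrative only.
type pattern struct {
	re     *regexp.Regexp
	family string
}

// patterns is tried in order; the first match wins.
var patterns = []pattern{
	{regexp.MustCompile(`Firefox/(\d+)`), "Firefox"},
	{regexp.MustCompile(`Chrome/(\d+)`), "Chrome"},
}

// parseFamily returns the family and major version from the first
// matching pattern, or "Other" and an empty version if none match.
func parseFamily(ua string) (family, major string) {
	for _, p := range patterns {
		if m := p.re.FindStringSubmatch(ua); m != nil {
			return p.family, m[1]
		}
	}
	return "Other", ""
}

func main() {
	family, major := parseFamily("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36")
	fmt.Println(family, major)
}
```

With a loop of this shape, a user agent that matches late or not at all is run against most of the list, so the cost per line is largely the cost of the regex engine itself.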
---TESTING SIMPLE PROGRAMS TO COMPARE REGEX VS PCRE LIBS (as suggested in the comments below)
I have created the two programs below, one using PCRE and one using the standard regexp package. However, I don't seem to get a performance boost with PCRE; in fact, the PCRE library seems to be a little slower. Am I approaching this the wrong way?
--With standard regex library
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

func main() {
	// Compile the pattern once, outside the read loop.
	regex := regexp.MustCompile(`Mac`)
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		// The user agent is the first tab-separated field.
		fields := strings.Split(line, "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0])))
	}
}
--With PCRE library
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"

	pcre "github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	// Compile the pattern once, outside the read loop.
	regex := pcre.MustCompile(`Mac`, 0)
	for scanner.Scan() {
		line := scanner.Text()
		// The user agent is the first tab-separated field.
		fields := strings.Split(line, "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0]), 0))
	}
}
Answer 1
Score: 1
I would consider the rubex library. I changed ua-parser to use rubex instead, and I saw a 7x speed improvement. The library claims a 10x improvement, so I would give it a try with your particular application.
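If you want to try a different engine without rewriting the parsing code, one option is to hide the engine behind a small interface. The sketch below uses only the standard regexp package. The `submatcher` interface, `stdCompile`, and `firstGroup` are hypothetical names, and it is an assumption (to be checked against the rubex documentation) that a rubex regexp offers a method with the same signature.

```go
package main

import (
	"fmt"
	"regexp"
)

// submatcher is the small part of the regexp API that a parser needs.
// *regexp.Regexp satisfies it. Whether another engine's type does is an
// assumption that must be verified against that library.
type submatcher interface {
	FindStringSubmatch(s string) []string
}

// stdCompile builds a submatcher with the standard library engine.
// A second function with the same signature could wrap another engine.
func stdCompile(pattern string) (submatcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// firstGroup returns the first capture group of m in ua, or "" if there
// is no match or the pattern has no capture group.
func firstGroup(m submatcher, ua string) string {
	groups := m.FindStringSubmatch(ua)
	if len(groups) < 2 {
		return ""
	}
	return groups[1]
}

func main() {
	// Compile once and reuse the matcher for every input line.
	re, err := stdCompile(`Mac OS X (\d+)`)
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	fmt.Println(firstGroup(re, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"))
}
```

Keeping the parsing logic dependent only on the interface makes it straightforward to time the same input against each engine before committing to one.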