How a scanner can be implemented with a custom split
Question
I have a log file, and I need to parse each record in it using Go.
Each record begins with "#", and a record can span one or more lines:
# Line1
# Line2
Continued line2
Continued line2
# line3
.....
Some code (I'm a beginner :)):
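f, err := os.Open(mylog)
if err != nil {
	log.Fatal(err)
}
defer f.Close()

scanner := bufio.NewScanner(f)
var queryRec string
for scanner.Scan() {
	line := scanner.Text()
	if strings.HasPrefix(line, "# ") && len(queryRec) == 0 {
		// First record header: start a new record.
		queryRec = line
	} else if !strings.HasPrefix(line, "# ") && len(queryRec) == 0 {
		// Continuation line with no record started yet.
		fmt.Println("There is a big problem!!!")
	} else if !strings.HasPrefix(line, "# ") && len(queryRec) != 0 {
		// Continuation line: append it to the current record.
		queryRec += line
	} else if strings.HasPrefix(line, "# ") && len(queryRec) != 0 {
		// Next record header: the previous record is replaced here.
		queryRec = line
	}
}
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}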
Thanks,
Answer 1
Score: 17
The Scanner type has a method called Split which lets you pass a SplitFunc that determines how the scanner splits its input into tokens. The default SplitFunc is ScanLines, whose implementation you can read in the source. Starting from it, you can write your own SplitFunc to break the reader's content up according to your specific format (Split has to be called before the first call to Scan).
func crunchSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Return nothing if at end of file and no data passed
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Find the index of a newline followed by a pound sign. The token ends
	// before the newline; the pound sign starts the next token.
	if i := strings.Index(string(data), "\n#"); i >= 0 {
		return i + 1, data[0:i], nil
	}
	// If at end of file with data return the data
	if atEOF {
		return len(data), data, nil
	}
	// Otherwise, request more data.
	return
}
You can see the full implementation of the example at https://play.golang.org/p/ecCYkTzme4. The bufio package documentation provides all the insight needed to implement something like this.
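For a self-contained view of how crunchSplitFunc plugs into a Scanner, here is a minimal sketch. The parseRecords helper and the sample input are illustrative assumptions, not part of the original answer. Because the split function returns all remaining data at EOF, the last record keeps the file's trailing newline, so the helper trims it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// crunchSplitFunc, as in the answer above: each token runs up to (but not
// including) the next "\n#", so every token is one "#"-prefixed record.
func crunchSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := strings.Index(string(data), "\n#"); i >= 0 {
		// Skip only the newline; the '#' starts the next token.
		return i + 1, data[0:i], nil
	}
	if atEOF {
		return len(data), data, nil
	}
	// Need more data to find the next separator.
	return 0, nil, nil
}

// parseRecords is a hypothetical helper that returns every record in input.
func parseRecords(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(crunchSplitFunc) // must be set before the first Scan
	var records []string
	for scanner.Scan() {
		// The final record may still carry the input's trailing newline.
		records = append(records, strings.TrimSuffix(scanner.Text(), "\n"))
	}
	return records, scanner.Err()
}

func main() {
	input := "# Line1\n# Line2\nContinued line2\nContinued line2\n# line3\n"
	records, err := parseRecords(input)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for i, r := range records {
		fmt.Printf("record %d: %q\n", i+1, r)
	}
}
```

Continuation lines stay inside their record, separated by their original newlines, which avoids the manual state tracking from the question.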
Answer 2
Score: 10
A slightly optimized version of Ben Campbell's and sto-b-doo's solutions.
Converting a byte slice to a string appears to be quite a heavy operation: string(data) typically copies the buffered data on every call to the split function.
In my log-processing app it became a bottleneck.
Just keeping the data as bytes gave my app a ~1500% performance boost.
func SplitAt(substring string) func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	searchBytes := []byte(substring)
	searchLen := len(searchBytes)
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		dataLen := len(data)

		// Return nothing if at end of file and no data passed
		if atEOF && dataLen == 0 {
			return 0, nil, nil
		}

		// Find next separator and return token
		if i := bytes.Index(data, searchBytes); i >= 0 {
			return i + searchLen, data[0:i], nil
		}

		// If we're at EOF, we have a final, non-terminated line. Return it.
		if atEOF {
			return dataLen, data, nil
		}

		// Request more data.
		return 0, nil, nil
	}
}
Answer 3
Score: 2
Here is Ben Campbell's answer wrapped in a function that returns a SplitFunc for an arbitrary separator substring.
Improvement suggestions are welcome.
// SplitAt returns a bufio.SplitFunc closure, splitting at a substring
// scanner.Split(SplitAt("\n# "))
func SplitAt(substring string) func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		// Return nothing if at end of file and no data passed
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}

		// Find the index of the separator substring
		if i := strings.Index(string(data), substring); i >= 0 {
			return i + len(substring), data[0:i], nil
		}

		// If at end of file with data return the data
		if atEOF {
			return len(data), data, nil
		}

		// Otherwise, request more data.
		return
	}
}
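Because this closure advances by i + len(substring), the separator is consumed together with the token. With SplitAt("\n# "), only the first record keeps its leading "# "; every later record starts right after it. If you need the marker on every record, advancing only past the newline, as crunchSplitFunc in the first answer does, keeps it. A minimal sketch showing the behavior (the records helper and the sample input are illustrative):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// SplitAt, as in the answer above.
func SplitAt(substring string) func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		if i := strings.Index(string(data), substring); i >= 0 {
			// Advancing past the whole separator drops it from both tokens.
			return i + len(substring), data[0:i], nil
		}
		if atEOF {
			return len(data), data, nil
		}
		return 0, nil, nil
	}
}

// records is a hypothetical helper that splits input at sep.
func records(input, sep string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(SplitAt(sep))
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out, scanner.Err()
}

func main() {
	input := "# Line1\n# Line2\nContinued line2\n# line3"
	recs, err := records(input, "\n# ")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%q\n", r)
	}
}
```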
Answer 4
Score: 0
Hopefully an improvement (maybe in readability) over stu0292's improvements.
It also uses the final token signal, bufio.ErrFinalToken.
// SplitAt returns a bufio.SplitFunc closure, splitting at a substring
// scanner.Split(SplitAt("\n#"))
func SplitAt(substring string) func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		// Find the index of the separator substring
		if i := strings.Index(string(data), substring); i >= 0 {
			return i + len(substring), data[0:i], nil
		}
		// No separator yet: ask for more data. At EOF with nothing
		// left, stop without emitting an empty token.
		if !atEOF || len(data) == 0 {
			return 0, nil, nil
		}
		// At EOF with leftover data: return it as the last token and stop.
		return len(data), data, bufio.ErrFinalToken
	}
}
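When a SplitFunc returns bufio.ErrFinalToken, Scan stops after the token returned alongside it, and Err reports nil rather than an error. One edge case to watch: at EOF with no data left, returning data, bufio.ErrFinalToken still hands Scan a non-nil empty slice, which can surface as one extra empty record for an empty file or for input that ends exactly at a separator. The sketch below adds an explicit guard for that case and exercises the edge inputs; the guard, the scanAll helper, and the inputs are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// SplitAt follows the answer above, plus an explicit guard so that EOF with
// no remaining data ends the scan without an extra empty token.
func SplitAt(substring string) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		if i := strings.Index(string(data), substring); i >= 0 {
			return i + len(substring), data[0:i], nil
		}
		if !atEOF || len(data) == 0 {
			// Either wait for more data or, at EOF with nothing left, stop.
			return 0, nil, nil
		}
		// Last, unterminated record: deliver it and tell Scan to stop.
		return len(data), data, bufio.ErrFinalToken
	}
}

// scanAll is a hypothetical helper returning the records in input.
func scanAll(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(SplitAt("\n# "))
	var recs []string
	for scanner.Scan() {
		recs = append(recs, scanner.Text())
	}
	return recs, scanner.Err()
}

func main() {
	for _, in := range []string{"", "# a\n# b\nc", "# a\n# "} {
		recs, err := scanAll(in)
		fmt.Printf("%q -> %q (err=%v)\n", in, recs, err)
	}
}
```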