How to use bufio.ScanWords
Question
How do I use the bufio.ScanWords and bufio.ScanLines functions to count words and lines?
I tried:
fmt.Println(bufio.ScanWords([]byte("Good day everyone"), false))
This prints:
5 [103 111 111 100] <nil>
I'm not sure what this output means.
Answer 1
Score: 18
To count words:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Set the split function for the scanning operation.
	scanner.Split(bufio.ScanWords)
	// Count the words.
	count := 0
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Printf("%d\n", count)
}
To count lines:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Set the split function for the scanning operation.
	scanner.Split(bufio.ScanLines)
	// Count the lines.
	count := 0
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Printf("%d\n", count)
}
Answer 2
Score: 4
This is Exercise 7.1 from the book The Go Programming Language.
This is an extension of @repler's solution:
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
)

type byteCounter int
type wordCounter int
type lineCounter int

func main() {
	var c byteCounter
	c.Write([]byte("Hello This is a line"))
	fmt.Println("Byte Counter ", c)

	var w wordCounter
	w.Write([]byte("Hello This is a line"))
	fmt.Println("Word Counter ", w)

	var l lineCounter
	l.Write([]byte("Hello \nThis \n is \na line\n.\n.\n"))
	fmt.Println("Line Counter ", l)
}

// Write adds len(p) to the byte count.
func (c *byteCounter) Write(p []byte) (int, error) {
	*c += byteCounter(len(p))
	return len(p), nil
}

// Write adds the number of words in p to the count. It returns len(p),
// as the io.Writer contract requires, not the number of words.
func (w *wordCounter) Write(p []byte) (int, error) {
	*w += wordCounter(retCount(p, bufio.ScanWords))
	return len(p), nil
}

// Write adds the number of lines in p to the count. It returns len(p),
// as the io.Writer contract requires, not the number of lines.
func (l *lineCounter) Write(p []byte) (int, error) {
	*l += lineCounter(retCount(p, bufio.ScanLines))
	return len(p), nil
}

// retCount returns the number of tokens that the split function fn finds in p.
func retCount(p []byte, fn bufio.SplitFunc) (count int) {
	scanner := bufio.NewScanner(bytes.NewReader(p))
	scanner.Split(fn)
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	return count
}
Answer 3
Score: 1
This is Exercise 7.1 from the book The Go Programming Language.
This is my solution:
package main

import (
	"bufio"
	"fmt"
)

// WordCounter counts words.
type WordCounter int

// LineCounter counts lines.
type LineCounter int

// scanBytes applies the split function fn to p repeatedly and returns the
// number of tokens found. atEOF is always true because p is the whole input.
func scanBytes(p []byte, fn bufio.SplitFunc) (cnt int) {
	for {
		advance, token, err := fn(p, true)
		// A nil token means no further token is available. An empty but
		// non-nil token (a blank line from ScanLines) still counts.
		if err != nil || token == nil {
			break
		}
		p = p[advance:]
		cnt++
	}
	return cnt
}

// Write returns len(p), as the io.Writer contract requires.
func (c *WordCounter) Write(p []byte) (int, error) {
	*c += WordCounter(scanBytes(p, bufio.ScanWords))
	return len(p), nil
}

func (c WordCounter) String() string {
	return fmt.Sprintf("contains %d words", c)
}

// Write returns len(p), as the io.Writer contract requires.
func (c *LineCounter) Write(p []byte) (int, error) {
	*c += LineCounter(scanBytes(p, bufio.ScanLines))
	return len(p), nil
}

func (c LineCounter) String() string {
	return fmt.Sprintf("contains %d lines", c)
}

func main() {
	var c WordCounter
	fmt.Println(c)
	fmt.Fprintf(&c, "This is a sentence.")
	fmt.Println(c)
	c = 0
	fmt.Fprintf(&c, "This")
	fmt.Println(c)

	var l LineCounter
	fmt.Println(l)
	fmt.Fprintf(&l, `This is another
line`)
	fmt.Println(l)
	l = 0
	fmt.Fprintf(&l, "This is another\nline")
	fmt.Println(l)
	fmt.Fprintf(&l, "This is one line")
	fmt.Println(l)
}
Answer 4
Score: 0
bufio.ScanWords and bufio.ScanLines (as well as bufio.ScanBytes and bufio.ScanRunes) are split functions: they provide a bufio.Scanner with the strategy for tokenizing its input data, that is, how the scanning process should split the data. The split function for a bufio.Scanner is bufio.ScanLines by default, but it can be changed through the method bufio.Scanner.Split.
These split functions are of type SplitFunc:
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Usually, you won't need to call any of these functions directly; bufio.Scanner will call them for you. However, you might need to write your own split function to implement a custom tokenization strategy, so let's have a look at its parameters and return values:
- data: the remaining data not yet processed.
- atEOF: whether the caller has reached EOF and therefore has no more new data to provide in the next call.
- advance: the number of bytes by which the caller must advance the input data for the next call.
- token: the token to return to the caller as a result of the splitting performed.
To gain further understanding, let's look at the implementation of bufio.ScanBytes:
func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	return 1, data[0:1], nil
}
As long as data isn't empty, it returns a one-byte token to the caller (data[0:1]) and tells the caller to advance the input data by one byte.
Answer 5
Score: 0
To explain the output of bufio.ScanWords:
- The first return value is the number of bytes to advance past the current word (any leading spaces, the word itself, and the space that ends it), say num_bytes. It lets you move to the beginning of the next word by moving to index current_index + num_bytes.
- The second return value holds the bytes of the word, with any leading and trailing spaces removed.
- The third return value is the error.
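Applied to the call from the question, the three return values can be labeled as in this small sketch. The helper describe is a hypothetical name introduced only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
)

// describe calls bufio.ScanWords once on s and labels the three results.
func describe(s string, atEOF bool) string {
	advance, token, err := bufio.ScanWords([]byte(s), atEOF)
	return fmt.Sprintf("advance=%d token=%q err=%v", advance, token, err)
}

func main() {
	// Four bytes for "Good" plus the space that ends the word.
	fmt.Println(describe("Good day everyone", false)) // advance=5 token="Good" err=<nil>
	// Two leading spaces are skipped but still counted in advance.
	fmt.Println(describe("  Good day", false)) // advance=7 token="Good" err=<nil>
}
```

So the output 5 [103 111 111 100] <nil> in the question is an advance of 5, the bytes of "Good" printed as numbers, and a nil error.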
Here is a simple program that counts the words using this information:
package main

import (
	"bufio"
	"fmt"
)

func main() {
	ar := []byte("hello there, how are ya.. \n And bye")
	numWords := 0
	start := 0
	for start < len(ar) {
		// atEOF is true because ar already holds the whole input.
		num, token, err := bufio.ScanWords(ar[start:], true)
		// A nil token means only whitespace was left, so there is no word to count.
		if err != nil || token == nil {
			break
		}
		numWords++
		fmt.Println(string(token))
		start += num
	}
	fmt.Println("The number of words is", numWords)
}
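And here is the corresponding output:
[Image: output of the code above]
The second argument, atEOF, tells ScanWords whether the input has ended. When it is false, a final word with no space after it is not returned, because more data might still arrive and extend it. Here is the output with the second argument set to false:
[Image: output with the second argument set to false]
As you can see, the last word is never returned: the call reports num == 0 and a nil token, so the loop doesn't stop unless it checks for that, for example by using num > 0 as the condition in the for loop.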
I hope this was helpful.