How to use bufio.ScanWords

Question

How do I use bufio.ScanWords and bufio.ScanLines functions to count words and lines?

I tried:

fmt.Println(bufio.ScanWords([]byte("Good day everyone"), false))

Prints:

5 [103 111 111 100] <nil>

Not sure what that means?

Answer 1

Score: 18

To count words:

input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanWords)
// Count the words.
count := 0
for scanner.Scan() {
	count++
}
if err := scanner.Err(); err != nil {
	fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)

To count lines:

input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"

scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanLines)
// Count the lines.
count := 0
for scanner.Scan() {
	count++
}
if err := scanner.Err(); err != nil {
	fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)
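The two snippets differ only in the split function, so they can be folded into one complete program. This is a minimal sketch rather than a definitive implementation: the helper names countTokens, countWords and countLines are made up for illustration, and the sample input is borrowed from the question with a second line added.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countTokens is a hypothetical helper: it runs a bufio.Scanner over s
// with the given split function and counts the tokens it produces.
func countTokens(s string, split bufio.SplitFunc) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(split)
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// countWords counts whitespace-separated words in s.
func countWords(s string) (int, error) {
	return countTokens(s, bufio.ScanWords)
}

// countLines counts lines in s.
func countLines(s string) (int, error) {
	return countTokens(s, bufio.ScanLines)
}

func main() {
	input := "Good day everyone\nHow are you\n"

	words, err := countWords(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	lines, err := countLines(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Println(words, lines) // prints: 6 2
}
```

Note that a Scanner gives up on tokens longer than bufio.MaxScanTokenSize: Scan returns false and Err reports bufio.ErrTooLong, which is why the error check is worth keeping for real input.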

Answer 2

Score: 4

This is Exercise 7.1 in the book The Go Programming Language.

This is an extension of @repler's solution:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

type byteCounter int
type wordCounter int
type lineCounter int

func main() {
	var c byteCounter
	c.Write([]byte("Hello This is a line"))
	fmt.Println("Byte Counter ", c)

	var w wordCounter
	w.Write([]byte("Hello This is a line"))
	fmt.Println("Word Counter ", w)

	var l lineCounter
	l.Write([]byte("Hello \nThis \n is \na line\n.\n.\n"))
	fmt.Println("Line Counter ", l)
}

// Write adds the number of bytes in p to the counter.
func (c *byteCounter) Write(p []byte) (int, error) {
	*c += byteCounter(len(p))
	return len(p), nil
}

// Write adds the number of words in p to the counter.
// An io.Writer must report the number of bytes consumed, not the word count.
func (w *wordCounter) Write(p []byte) (int, error) {
	count := retCount(p, bufio.ScanWords)
	*w += wordCounter(count)
	return len(p), nil
}

// Write adds the number of lines in p to the counter.
func (l *lineCounter) Write(p []byte) (int, error) {
	count := retCount(p, bufio.ScanLines)
	*l += lineCounter(count)
	return len(p), nil
}

// retCount counts the tokens that the split function fn finds in p.
func retCount(p []byte, fn bufio.SplitFunc) (count int) {
	s := string(p)
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(fn)
	count = 0
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	return
}

Answer 3

Score: 1

This is Exercise 7.1 in the book The Go Programming Language.

This is my solution:

package main

import (
	"bufio"
	"fmt"
)

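// WordCounter counts words.
type WordCounter int

// LineCounter counts lines.
type LineCounter int

type scanFunc func(p []byte, EOF bool) (advance int, token []byte, err error)

// scanBytes counts the tokens that fn finds in p.
func scanBytes(p []byte, fn scanFunc) (cnt int) {
	for len(p) > 0 {
		advance, token, err := fn(p, true)
		if err != nil || advance == 0 {
			break
		}
		// A non-nil token may be empty (a blank line), so test for nil
		// rather than for zero length.
		if token != nil {
			cnt++
		}
		p = p[advance:]
	}
	return cnt
}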

func (c *WordCounter) Write(p []byte) (int, error) {
	cnt := scanBytes(p, bufio.ScanWords)
	*c += WordCounter(cnt)
	// An io.Writer reports the number of bytes consumed, not the word count.
	return len(p), nil
}

func (c WordCounter) String() string {
	return fmt.Sprintf("contains %d words", c)
}

func (c *LineCounter) Write(p []byte) (int, error) {
	cnt := scanBytes(p, bufio.ScanLines)
	*c += LineCounter(cnt)
	// An io.Writer reports the number of bytes consumed, not the line count.
	return len(p), nil
}

func (c LineCounter) String() string {
	return fmt.Sprintf("contains %d lines", c)
}

func main() {
	var c WordCounter
	fmt.Println(c)

	fmt.Fprintf(&c, "This is a sentence.")
	fmt.Println(c)

	c = 0
	fmt.Fprintf(&c, "This")
	fmt.Println(c)

	var l LineCounter
	fmt.Println(l)

	fmt.Fprintf(&l, `This is another
line`)
	fmt.Println(l)

	l = 0
	fmt.Fprintf(&l, "This is another\nline")
	fmt.Println(l)

	fmt.Fprintf(&l, "This is one line")
	fmt.Println(l)
}

Answer 4

Score: 0

bufio.ScanWords and bufio.ScanLines (as well as bufio.ScanBytes and bufio.ScanRunes) are split functions: they provide a bufio.Scanner with the strategy to tokenize its input data – how the process of scanning should split the data. The split function for a bufio.Scanner is bufio.ScanLines by default but can be changed through the method bufio.Scanner.Split.

These split functions are of type SplitFunc:

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)

Usually, you won't need to call any of these functions directly; bufio.Scanner calls them for you. However, you might need to write your own split function to implement a custom tokenization strategy, so let's look at its parameters and return values:

  • data: the remaining data not yet processed.
  • atEOF: whether the caller has reached EOF and therefore has no more data to provide in later calls.
  • advance: the number of bytes by which the caller must advance the input before the next call.
  • token: the token to return to the caller as a result of the split, or nil if no complete token is available yet.
  • err: a non-nil error stops the scan.
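As an illustration of a custom tokenization strategy, here is a minimal sketch of a split function that yields comma-separated fields. The names scanCommas and splitFields are hypothetical and not part of the bufio package; only bufio.NewScanner, Scanner.Split, Scanner.Scan, Scanner.Text and Scanner.Err come from the standard library.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCommas is a hypothetical split function with the SplitFunc signature.
// It returns each comma-separated field as a token, without the comma.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // no data left: scanning ends
	}
	if i := bytes.IndexByte(data, ','); i >= 0 {
		// Consume the field and its comma; return only the field.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Final field with no trailing comma.
		return len(data), data, nil
	}
	// No comma yet and more data may follow: ask the Scanner for more.
	return 0, nil, nil
}

// splitFields is a hypothetical helper that collects the tokens from s.
func splitFields(s string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(scanCommas)
	var fields []string
	for scanner.Scan() {
		fields = append(fields, scanner.Text())
	}
	return fields, scanner.Err()
}

func main() {
	fields, err := splitFields("a,b,,c")
	if err != nil {
		fmt.Println("reading input:", err)
		return
	}
	fmt.Printf("%q\n", fields) // prints: ["a" "b" "" "c"]
}
```

The empty field between the two adjacent commas is returned as an empty but non-nil token, so the Scanner still reports it; a nil token is what signals "nothing to deliver yet".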

To gain further understanding, let's look at the implementation of bufio.ScanBytes:

func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	return 1, data[0:1], nil
}

As long as data isn't empty, it returns a one-byte token to the caller (data[0:1]) and tells the caller to advance the input data by one byte.

Answer 5

Score: 0

To explain the output of bufio.ScanWords:

  • The first return value is the number of bytes consumed: any leading spaces, the word itself, and the space that ends it. Call it num_bytes; moving to index current_index + num_bytes gives the start of the input for the next call.
  • The second return value holds the bytes of the word, with any leading and trailing spaces removed.
  • The third one is the error.

In the question's output, 5 covers "Good" plus the space after it, and [103 111 111 100] are the bytes of "Good".
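The return values can be made easier to read by converting the token to a string. The sketch below calls bufio.ScanWords once on the question's input; firstWord is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
)

// firstWord is a hypothetical helper: it calls bufio.ScanWords once and
// returns the advance count and the first word as a string.
// The error is ignored here because ScanWords itself does not produce one.
func firstWord(s string) (int, string) {
	advance, token, _ := bufio.ScanWords([]byte(s), false)
	return advance, string(token)
}

func main() {
	input := "Good day everyone"
	advance, word := firstWord(input)
	fmt.Println(advance, word) // prints: 5 Good

	// Skipping advance bytes leaves the input for the next call.
	fmt.Printf("%q\n", input[advance:]) // prints: "day everyone"
}
```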

Here is a simple program to count the words, using this information:

package main

import (
	"bufio"
	"fmt"
)

func main() {
	data := []byte("hello there,       how are ya.. \n And bye")

	numWords := 0
	for start := 0; start < len(data); {
		// atEOF is true because data already holds the complete input.
		advance, token, err := bufio.ScanWords(data[start:], true)
		if err != nil || advance == 0 {
			break
		}
		// A nil token means that only spaces were left.
		if token != nil {
			numWords++
			fmt.Printf("%s\n", token)
		}
		// Move to the beginning of the next word.
		start += advance
	}

	fmt.Println("The number of words is", numWords)
}

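With the second argument set to true, this prints each of the seven words on its own line, followed by the word count.
The second argument, atEOF, tells ScanWords whether more data may follow. With it set to false, ScanWords does not return a final word that has no space after it, because more data could still extend the word; it returns a nil token and may report an advance of 0. A loop that calls ScanWords directly therefore has to stop when the advance is 0, or it never terminates.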
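The effect of the second argument can be checked with a single call on an input whose only word has no space after it. This is a small sketch; scanOnce is a hypothetical helper, not a bufio function.

```go
package main

import (
	"bufio"
	"fmt"
)

// scanOnce is a hypothetical helper: it calls bufio.ScanWords once and
// reports the advance, the word, and whether a token was returned at all.
// The error is ignored because ScanWords itself does not produce one.
func scanOnce(s string, atEOF bool) (int, string, bool) {
	advance, token, _ := bufio.ScanWords([]byte(s), atEOF)
	return advance, string(token), token != nil
}

func main() {
	for _, atEOF := range []bool{false, true} {
		advance, word, found := scanOnce("bye", atEOF)
		fmt.Printf("atEOF=%v advance=%d word=%q found=%v\n", atEOF, advance, word, found)
	}
	// prints:
	// atEOF=false advance=0 word="" found=false
	// atEOF=true advance=3 word="bye" found=true
}
```

Inside a bufio.Scanner this distinction is handled for you: the Scanner reads more data when the split function returns a nil token, and passes atEOF as true only once the reader is exhausted.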

I hope this was helpful.

huangapple
  • Posted on April 17, 2017 at 18:57:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/43450113.html