How to read and format a stream of text received through a bash pipe?

Question

Currently, I'm using the following to format data from my npm script.

npm run startWin | while IFS= read -r line; do printf '%b\n' "$line"; done | less

It works, but my colleagues do not use Linux. So, I would like to implement while IFS= read -r line; do printf '%b\n' "$line"; done in Go, and use the binary in the pipe.

npm run startWin | magical-go-formater

What I tried

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strings"
)

func main() {
	fi, _ := os.Stdin.Stat() // get the FileInfo struct

	// Only read when stdin is not a terminal (i.e. it is a pipe or a file).
	if (fi.Mode() & os.ModeCharDevice) == 0 {

		bytes, _ := ioutil.ReadAll(os.Stdin)
		str := string(bytes)
		arr := strings.Fields(str)

		for _, v := range arr {
			fmt.Println(v)
		}
	}
}

Currently, the program prints nothing from the text stream.

Answer 1

Score: 2

You want bufio.Scanner for tail-style reads, where you process input as it arrives. IMHO the checks you're doing on os.Stdin are unnecessary, but YMMV.

See this answer for an example. ioutil.ReadAll() (deprecated since Go 1.16; just use io.ReadAll()) reads until it hits an error or EOF and only then returns. It does not hand you input incrementally, which is why you want to loop on bufio.Scanner.Scan().

Also, printf's %b expands backslash escape sequences in its argument, e.g. any literal \n in a passed line is rendered as a newline. Do you need that? Go's fmt package has no equivalent format verb, AFAIK (in Go, %b prints an integer in base 2).

EDIT

So I think your ReadAll()-based approach would/could have worked... eventually. I'm guessing you expected the behavior you get with bufio.Scanner, where the receiving process handles bytes as they are written (under the hood, Scan() keeps calling Read until it has a complete line; see the standard library source for Scan() for the grimy details).
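To see that Scanner behavior in isolation, here is a small sketch that uses an in-memory io.Pipe as a stand-in for the shell pipe (an assumption made for the demo; firstLineBeforeEOF is a made-up name). The writer sends one line and never closes its end, yet Scan() still returns that line.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// firstLineBeforeEOF writes one line into a pipe, leaves the write end
// open, and returns the line the Scanner yields. If the Scanner needed
// EOF before returning anything, Scan would block forever here.
func firstLineBeforeEOF() string {
	pr, pw := io.Pipe()
	defer pr.Close()

	go func() {
		fmt.Fprintln(pw, "first line")
		// Deliberately no pw.Close(): the writer counts as still running.
	}()

	scanner := bufio.NewScanner(pr)
	scanner.Scan()
	return scanner.Text()
}

func main() {
	fmt.Printf("got %q while the writer is still open\n", firstLineBeforeEOF())
}
```

Calling io.ReadAll(pr) in place of the Scanner would block in this setup, because nothing ever closes the write end.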

But ReadAll() buffers everything it reads and doesn't return until it gets either an error or an EOF. I hacked up an instrumented version of ReadAll() (a copy of the standard library source with just a little additional instrumentation output). You can see that it reads as the bytes are written, but it doesn't return and yield the contents until the writing process finishes and closes its end of the pipe (its open filehandle), which generates the EOF:

package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

func main() {

	// os.Stdin.SetReadDeadline(time.Now().Add(2 * time.Second))

	b, err := readAll(os.Stdin)
	if err != nil {
		fmt.Println("ERROR: ", err.Error())
	}

	str := string(b)
	fmt.Println(str)
}

func readAll(r io.Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	i := 0
	for {
		if len(b) == cap(b) {
			// Add more capacity (let append pick how much).
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])

		//fmt.Fprintf(os.Stderr, "READ %d - RECEIVED: \n%s\n", i, string(b[len(b):cap(b)]))
		fmt.Fprintf(os.Stderr, "%s READ %d - RECEIVED %d BYTES\n", time.Now(), i, n)
		i++

		b = b[:len(b)+n]
		if err != nil {
			if err == io.EOF {
				fmt.Fprintln(os.Stderr, "RECEIVED EOF")
				err = nil
			}
			return b, err
		}
	}
}

I also hacked up a cheap script to generate the input. It simulates something long-running that writes only at periodic intervals, which is how I'd imagine npm behaves in your case:

#!/bin/sh

for x in 1 2 3 4 5 6 7 8 9 10
do
  cat ./main.go
  sleep 10
done

As a side note, I find reading the actual standard library code really helpful...or at least interesting in cases like this.

Answer 2

Score: 0

@Sandy Cash was helpful in suggesting bufio. I don't know why, if what @Jim said is true, but bufio worked and ReadAll() didn't.

Thanks for the help.

The code:

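package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	// Scan returns each line as soon as it has been read from the pipe.
	for scanner.Scan() {
		s := scanner.Text()
		// The raw string `\n` is a literal backslash followed by "n",
		// so this splits on escaped newlines embedded in the line.
		arr := strings.Split(s, `\n`)
		for _, v := range arr {
			fmt.Println(v)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}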

  • Posted by huangapple on January 21, 2023, 00:16:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75186654.html