How to edit a reader in Go

Question

I'm trying to work out the best practice for changing some data in a stream without using ioutil.ReadAll.

I need to remove lines that begin with a certain character and strip all instances of another character.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"

	"gopkg.in/pg.v3"
)

func main() {
	fieldSep := "\x01"
	badChar := "\x02"
	comment := "#"
	dbName := "foo"
	db := pg.Connect(&pg.Options{})

	file, err := os.Open("/path/to/file")
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}
	defer file.Close()

	// I need to iterate over my file Reader here,
	// find all lines that begin with comment, and remove them.
	// (This attempt does not compile: the bytes functions take a []byte, not an *os.File.)
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		file := bytes.TrimRight(file, comment)
	}
	// All instances of badChar should be dropped.
	file := bytes.Trim(file, badChar)

	_, err = db.CopyFrom(file, fmt.Sprintf("COPY %s FROM STDIN WITH DELIMITER e'%s'", dbName, fieldSep))
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}

	err = db.Close()
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}
	fmt.Println("Import Done")
}

Context:

I'm importing a large amount (>10GB) of data into a database; it's spread across several files.

My database interface accepts a reader to load the data.

The data has non-standard line endings and I need to strip comments (because PG's COPY FROM is no fun).

I know the code I've got for editing the stream is woeful; I just can't find a good reference. Thanks!

Answer 1

Score: 1

If I were in your position, I'd write my own Reader and insert it between the source and the destination. That's what consistent interfaces are for. Your reader can easily work on small chunks of data as they flow past.

Source (io.Reader)   ==>  Your filter (io.Reader) ==>  Destination (expects an io.Reader)
provides the data         does the transformations       rock'n'rolls

A library example of a reader designed to sit between a reader and its client is bufio.Reader. It speeds up many kinds of readers by making larger, buffered reads from the source while letting the client consume the data in small pieces if it prefers. You can check out its source: http://golang.org/src/bufio/bufio.go

Posted by huangapple on August 6, 2015 at 20:59:28