Split a large text file without simply reading in lines

Question

I have a large text file that I would like to split into an arbitrary number of smaller ones. The behavior I need is nearly identical to the split terminal command, except that I need the files to overlap on their last lines. That is, the last line of the first file is the first line of the second file, the last line of the second file is the first line of the third file, and so on.

The naive solution seems to be to read lines from the original text file and split when necessary. I'm wondering if there is a standard library function that will let me work with byte offsets rather than strings, so that I can more easily split the text file into pieces of uniform size.

Is there something analogous to fseek in Go that will let me do this?

Answer 1

Score: 5

For example,

> Package os
>
> func (*File) Seek
>
> func (f *File) Seek(offset int64, whence int) (ret int64, err error)
>
> Seek sets the offset for the next Read or Write on file to offset,
> interpreted according to whence: 0 means relative to the origin of the
> file, 1 means relative to the current offset, and 2 means relative to
> the end. It returns the new offset and an error, if any.

huangapple
  • Published on April 17, 2013, 07:03:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/16048678.html