How to split a large CSV file into multiple files in Go?
Question
I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, with each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Thank you.
Answer 1
Score: 3
Depending on your shell-fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or is it 5 GB?
If it's smallish, I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
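For the multi-gigabyte case, here is a minimal sketch of a streaming split built on encoding/csv, which reads one record at a time and also copes with quoted fields that contain newlines. The names splitCSV, splitToStrings, memFile, and the open callback are made up for illustration, and rowsPerFile is assumed to be at least 1.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// splitCSV streams records from r into successive outputs obtained from
// open, repeating the header at the top of each one. Every output holds at
// most rowsPerFile data rows (assumed >= 1). It returns the number of
// outputs created. Name, signature, and callback are illustrative.
func splitCSV(r io.Reader, rowsPerFile int, open func(part int) (io.WriteCloser, error)) (int, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return 0, fmt.Errorf("reading header: %w", err)
	}

	var (
		out         io.WriteCloser
		cw          *csv.Writer
		parts, rows int
	)
	// finish flushes buffered records and closes the current output, if any.
	finish := func() error {
		if out == nil {
			return nil
		}
		cw.Flush()
		err := cw.Error()
		if cerr := out.Close(); err == nil {
			err = cerr
		}
		out = nil
		return err
	}

	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return parts, finish()
		}
		if err != nil {
			finish()
			return parts, err
		}
		// Start a new output for the first row and whenever the current one is full.
		if out == nil || rows == rowsPerFile {
			if err := finish(); err != nil {
				return parts, err
			}
			parts++
			if out, err = open(parts); err != nil {
				return parts, err
			}
			cw = csv.NewWriter(out)
			if err := cw.Write(header); err != nil {
				finish()
				return parts, err
			}
			rows = 0
		}
		if err := cw.Write(rec); err != nil {
			finish()
			return parts, err
		}
		rows++
	}
}

// memFile is an in-memory io.WriteCloser used only for the demo.
type memFile struct{ strings.Builder }

func (*memFile) Close() error { return nil }

// splitToStrings runs splitCSV entirely in memory and returns each part's text.
func splitToStrings(input string, rowsPerFile int) ([]string, error) {
	var files []*memFile
	_, err := splitCSV(strings.NewReader(input), rowsPerFile, func(int) (io.WriteCloser, error) {
		f := &memFile{}
		files = append(files, f)
		return f, nil
	})
	texts := make([]string, len(files))
	for i, f := range files {
		texts[i] = f.String()
	}
	return texts, err
}

func main() {
	parts, err := splitToStrings("id,note\n1,\"two\nlines\"\n2,b\n3,c\n", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, p := range parts {
		fmt.Printf("--- part %d ---\n%s", i+1, p)
	}
}
```

To write real files, the open callback could wrap os.Create with a file name derived from the part number. Only one record is held in memory at a time, so the input size should not matter much.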
Regardless, let's return to the abstraction of the problem. You have a header (the first line) and then the rest of the document.
So what we probably want to do (ignoring CSV parsing for the moment) is read in our file.
Then we want to split the file body on the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
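As a rough sketch of that step, the function below separates the header from the remaining lines with strings.Split. The name headerAndRows is made up for illustration, and the approach assumes no quoted field contains a newline. For a real file, the input string would come from something like os.ReadFile, the current equivalent of ioutil.ReadFile.

```go
package main

import (
	"fmt"
	"strings"
)

// headerAndRows splits raw CSV text into its first line (the header) and
// the remaining non-empty lines. It assumes no quoted field contains a
// newline; the name and signature are illustrative.
func headerAndRows(data string) (string, []string) {
	// Normalize Windows line endings so a single separator is enough.
	data = strings.ReplaceAll(data, "\r\n", "\n")
	lines := strings.Split(data, "\n")

	header := lines[0]
	rows := make([]string, 0, len(lines))
	for _, line := range lines[1:] {
		if line != "" { // skip the empty piece left by a trailing newline
			rows = append(rows, line)
		}
	}
	return header, rows
}

func main() {
	header, rows := headerAndRows("id,name\n1,ann\n2,bob\n")
	fmt.Println(header)    // id,name
	fmt.Println(len(rows)) // 2
}
```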
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by the expected file count to get the number of lines per file.
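A small sketch of that division, assuming the rows are already in a slice. It rounds up so that no rows are left over, which means the last group may be shorter and fewer groups than requested can come back when there are few rows. The name chunkRows is hypothetical.

```go
package main

import "fmt"

// chunkRows divides rows into at most fileCount consecutive groups of
// near-equal size. The name and signature are illustrative.
func chunkRows(rows []string, fileCount int) [][]string {
	if fileCount < 1 || len(rows) == 0 {
		return nil
	}
	// Ceiling division: round up so every row lands in some group.
	perFile := (len(rows) + fileCount - 1) / fileCount

	var chunks [][]string
	for start := 0; start < len(rows); start += perFile {
		end := start + perFile
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[start:end])
	}
	return chunks
}

func main() {
	rows := []string{"1,ann", "2,bob", "3,cy", "4,dee", "5,eve"}
	for i, c := range chunkRows(rows, 2) {
		fmt.Println("file", i+1, "gets", len(c), "rows")
	}
}
```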
Now we can take slices of the appropriate size and write each one back out, with the header first, via:
http://golang.org/pkg/io/ioutil/#WriteFile
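The write-out step might look something like the sketch below. It uses os.WriteFile, the current standard-library equivalent of ioutil.WriteFile (Go 1.16 and later). The function names and the part_N.csv naming scheme are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// renderChunk builds the contents of one output file: the header first,
// then each row, every line terminated by a newline.
func renderChunk(header string, rows []string) []byte {
	var b strings.Builder
	b.WriteString(header)
	b.WriteByte('\n')
	for _, row := range rows {
		b.WriteString(row)
		b.WriteByte('\n')
	}
	return []byte(b.String())
}

// writeChunks writes each chunk into dir as part_1.csv, part_2.csv, ...
// (a hypothetical naming scheme).
func writeChunks(dir, header string, chunks [][]string) error {
	for i, rows := range chunks {
		name := filepath.Join(dir, fmt.Sprintf("part_%d.csv", i+1))
		if err := os.WriteFile(name, renderChunk(header, rows), 0o644); err != nil {
			return fmt.Errorf("writing %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	// Write into a throwaway directory so the demo leaves nothing behind.
	dir, err := os.MkdirTemp("", "csvsplit")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	chunks := [][]string{{"1,ann", "2,bob"}, {"3,cy"}}
	if err := writeChunks(dir, "id,name", chunks); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("wrote", len(chunks), "files to", dir)
}
```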
A trick I sometimes use to help me think through these things is to write down the mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking the divide-and-conquer approach: don't try to solve the entire problem in one go, just break it up to the point where you can think about it.
Also, make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short inline comment describing how you think the code should flow, then get it down to the smallest portion that you can code and work from there.
By the way, many of the golang.org packages have example links where you can run the example code right in your browser and copy and paste it into your own local environment.
Also, I know I'll catch some haters with this, but as for books, IMO you are going to learn a lot faster just by trying to get things working than by reading. Action trumps passivity, always. Don't be afraid to fail.
Answer 2
Score: 0
Here is a package that might help. You can set the desired chunk size in bytes, and the file will be split into the appropriate number of chunks.
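The package itself is not named here, so the following is not its API. It is only a rough sketch of the general idea of splitting by a byte budget, adapted so that chunks break on row boundaries and leave room for a repeated header. The name chunkByBytes and its parameters are made up for illustration.

```go
package main

import "fmt"

// chunkByBytes groups rows so that each rendered chunk (header plus rows,
// one newline per line) stays within maxBytes where possible. A row that is
// too large to fit next to the header still gets a chunk of its own.
func chunkByBytes(header string, rows []string, maxBytes int) [][]string {
	base := len(header) + 1 // header plus its newline
	var chunks [][]string
	var current []string
	size := base
	for _, row := range rows {
		need := len(row) + 1 // row plus its newline
		if len(current) > 0 && size+need > maxBytes {
			// Adding this row would exceed the budget: close the chunk.
			chunks = append(chunks, current)
			current, size = nil, base
		}
		current = append(current, row)
		size += need
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	rows := []string{"1,ann", "2,bob", "3,cy"}
	for i, c := range chunkByBytes("id,name", rows, 20) {
		fmt.Println("chunk", i+1, "has", len(c), "rows")
	}
}
```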