Golang如何修改CSV文件中的字符?

huangapple go评论79阅读模式
英文:

How does golang modify the characters in the csv file

问题

我有一个包含如下代码的 CSV 文件:

"test01","127.0.1","{""type"": 3,  ""content"": ""test01""}",1,1,"2021-12-30 16:00:00.490","2021-12-30 16:00:00.490"

我想要将 CSV 文件中的 , 分隔符替换为 #,移除 JSON 值花括号周围的引号,并将 JSON 中的 "" 替换为 \",但是 Golang 似乎无法直接访问 CSV 文件的内容进行修改。
修改后的数据如下:

"test01","127.0.1",{\"type\": 3,  \"content\": \"test01\"},1,1,"2021-12-30 16:00:00.490","2021-12-30 16:00:00.490"

我理解 Golang 只能读写 CSV 文件,但是我该如何修改 CSV 文件的数据呢?

英文:

I have a csv file with content like this code:

"test01","127.0.1","{""type"": 3,  ""content"": ""test01""}",1,1,"2021-12-30 16:00:00.490","2021-12-30 16:00:00.490"

I want to replace the , delimiters of csv with #,remove quotes around json value curly braces, and change the "" in json to \", but golang seems to be unable to directly access the contents of the csv file to modify.
The changed data is as follows:

"test01","127.0.1",{\"type\": 3,  \"content\": \"test01\"},1,1,"2021-12-30 16:00:00.490","2021-12-30 16:00:00.490"

I understand that golang can only read and write csv files, but how can I modify the data of the csv file?

答案1

得分: 1

这是关于当前流行操作系统和常见编程语言中文本文件操作的基础知识。

你需要读取文件,然后写入文件(用你要写入的新数据覆盖磁盘上的旧数据)。

有两种方法可以实现这一点。第一种是目前最常用的方法,因为个人电脑的内存已经足够多:

1. 先读取到内存,然后写入磁盘。

当你将一个字节写回文件时,操作系统基本上会删除(使数据不可访问,并将该块标记为空闲)读/写光标后面的剩余数据。如果你的光标(可以使用File.Seek()来改变)恰好在文件末尾,那么你写入的内容将会追加到文件末尾。如果你的光标恰好在文件开头(例如,当你打开文件并什么都不做时),即使你只写入一个字符,你也会删除整个文件。

但只要你的程序不崩溃,或者你的电脑不突然断电,这将通过将修改后的文件版本写入内存(以字符数组的形式存储)并销毁磁盘上的旧文件版本来“修改”文件。

基本思路如下:

// 伪代码

write(theFile, modify(read(theFile)))

2. 读取文件,在读取过程中进行修改(例如,逐字节、逐行等),然后写入另一个文件。完成后删除原始文件,并将另一个文件重命名为原始文件的名称。

这种方法在拥有兆字节甚至几十千字节内存的计算机时代更为常见。当你使用Unix工具时,仍然可以看到这种方法的残留。其主要优点是你可以处理比你的内存更大的文件。其次,如果你的程序或电脑崩溃,你仍然拥有原始文件的一个良好副本,因此不会丢失数据。

基本思路如下:

// 伪代码

while (data = readLine(theFile)) {
    write(tempFile, modify(data))
}

delete(theFile)

rename(tempFile, theFile)

显然,对于处理大文件的程序(如视频编辑器或图像编辑器),这种方法仍然很有用。对于绝对需要保护其数据的程序,如配置管理工具和源代码仓库,这种方法也很有用。

在磁盘上编辑

有一些访问模式允许你在磁盘上原地编辑文件而不删除旧内容。你需要使用**"b"**标志以二进制模式打开文件,或者你可以将文件映射到内存中。这通常由数据库程序(如MySQL或Oracle)使用。

然而,这些方法不允许你从文件中删除任何字节。你只能写入文件中的特定位置,例如第37个字节或第56642个字节。因此,你不能删除任何"字符。你只能将它们更改为其他字符,如k,。因此,人们通常在编辑基于文本的数据(如CSV文件)时不使用这种方法。

英文:

This is the basics of text file operations in all currently popular operating systems and all common programming languages.

You need to read the file and then write to the file (destroying the old data on disk with the new data you are writing).

There are two ways to do this. The first is the most used method these days because PCs have more than enough RAM:

1. Read to memory then write to disk.

The moment you write a single byte back to the file the OS will basically delete (make the data not accessible and the block marked as free) the remaining data behind your read/write cursor. If your cursor (which can be changed using File.Seek()) happens to be at the end of the file then what you are writing is appended to the file. If your cursor happens to be at the beginning of the file (eg. when you open the file and do nothing) you are basically deleting the entire file even if you only write a single character back to the file.

But as long as your program doesn't crash or your PC doesn't suddenly lose power this will "modify" the file by writing the modified version of the file in your memory (stored as character arrays) and destroying the old version of the file on disk.

The idea is basically:

// pseudocode

write(theFile, modify(read(theFile)))

2. Read the file, modify while reading (eg. each byte, each line etc.) and write to another file. When done delete original file and rename the other file with the name of the original file.

This method was more common back in the days of computers with Megabytes or even tens of kilobytes of RAM. You can still see remnants of this when you use unix tools. The main advantage of this is that you can work with files that are bigger than your RAM. The secondary advantage is if your program or PC crashes you still have a good copy of the original file so you don't lose data.

The idea is basically:

// pseudocode

while (data = readLine(theFile)) {
    write(tempFile, modify(data))
}

delete(theFile)

rename(tempFile, theFile)

Obviously for programs that handle large files like video editors or image editors this method is still useful. It is also useful for programs that absolutely need to protect their data like configuration management tools and source code repositories.

On-disk editing

There are access modes that allow you to edit files in-place on the disk without deleting the old content. You need to open the file in binary mode using the "b" flag or you can memory map the file. This is often used by database programs like MySQL or Oracle.

However, these methods do not allow you to remove any bytes from the file. You can only write to specific locations in the file for example the 37th byte or the 56642nd byte. So you cannot remove any " characters. You can only change them to other characters like k or ,. For this reason people usually don't do this when editing text based data like csv files.

huangapple
  • 本文由 发表于 2022年3月8日 16:37:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/71392103.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定