Correct way to import numeric csv data in go

Question

I want to read a file in CSV format containing only numeric values (with decimals) and store it in a matrix so I can perform operations on them. The file looks like this:

1.5, 2.3, 4.4
1.1, 5.3, 2.4
...

It may have thousands of lines and more than 3 columns.

I solved this using Go's encoding/csv package. ReadAll creates a [][]string, and then I use a for loop to parse it into a [][]float64.

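import (
	"encoding/csv"
	"os"
	"strconv"
)

func readCSV(filepath string) ([][]float64, error) {
	csvfile, err := os.Open(filepath)
	if err != nil {
		return nil, err
	}
	defer csvfile.Close()

	reader := csv.NewReader(csvfile)
	// The sample data has a space after each comma; without this, fields
	// such as " 2.3" make strconv.ParseFloat return an error.
	reader.TrimLeadingSpace = true

	stringMatrix, err := reader.ReadAll()
	if err != nil {
		return nil, err
	}

	matrix := make([][]float64, len(stringMatrix))

	// Parse string matrix into float64
	for i := range stringMatrix {
		matrix[i] = make([]float64, len(stringMatrix[i]))
		for y := range stringMatrix[i] {
			matrix[i][y], err = strconv.ParseFloat(stringMatrix[i][y], 64)
			if err != nil {
				return nil, err
			}
		}
	}

	return matrix, nil
}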

I was wondering if this is a correct and efficient way of doing it or if there is a better way.

For example, using reader.Read() instead and parsing each line as it's read. I'm not sure, but it feels like I'm doing a lot of duplicate work.

Answer 1

Score: 4

It all depends on how you want to use the data. Your code isn't memory-efficient because it reads the entire CSV content into memory (stringMatrix) and then creates another variable to hold the data converted to float64 (matrix), with both alive at the same time. So if your CSV file is 1 GB in size, your program would use roughly 1 GB of RAM for stringMatrix, plus additional memory for matrix.

You can optimize the code by either:

  • Reading from the reader one record at a time (with reader.Read()) and appending the converted data to matrix; you don't need the entire stringMatrix in memory at once;
  • Reading from the reader one record at a time and processing each row as it arrives. Maybe you don't need matrix in memory either; you may be able to process the data as you read it and never hold all of it in memory at once. It depends on the rest of your program and on how it needs to use the CSV data.

If you don't need to return the entire CSV data from that function, the second method lets your program use only enough memory for one record at a time (plus the reader's internal buffer) instead of gigabytes.

huangapple
  • Posted on September 15, 2017 01:48:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/46225426.html