Using Golang to read csv, reorder columns then write result to a new csv with Concurrency
Question
Here's my starting point.
It is a Golang script to read in a csv with 3 columns, re-order the columns and write the result to a new csv file.
package main

import (
    "encoding/csv"
    "fmt"
    "math/rand"
    "os"
    "time"
)

func main() {
    start_time := time.Now()

    // Loading csv file
    rFile, err := os.Open("data/small.csv") // 3 columns
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer rFile.Close()

    // Creating csv reader
    reader := csv.NewReader(rFile)
    lines, err := reader.ReadAll()
    // ReadAll reports end of file as a nil error, so check for any non-nil error.
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    // Creating csv writer
    wFile, err := os.Create("data/result.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer wFile.Close()
    writer := csv.NewWriter(wFile)

    // Read data, randomize columns and write new lines to result.csv
    rand.Seed(int64(time.Now().Nanosecond()))
    var col_index []int
    for i, line := range lines {
        if i == 0 {
            // Randomize column index based on the number of columns recorded in the 1st line
            col_index = rand.Perm(len(line))
        }
        writer.Write([]string{line[col_index[0]], line[col_index[1]], line[col_index[2]]}) // 3 columns
        writer.Flush()
    }

    // Print report
    fmt.Println("No. of lines: ", len(lines))
    fmt.Println("Time taken: ", time.Since(start_time))
}
Questions:
- Is my code idiomatic for Golang?
- How can I add concurrency to this code?
Answer 1
Score: 1
Your code is OK. There is not much of a case for concurrency here, but you can at least reduce memory consumption by reordering on the fly. Just use Read() instead of ReadAll() to avoid allocating a slice for the whole input file.
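// col_index must already be set here, e.g. from the width of the first record.
// The loop stops on io.EOF and also on any other read error.
for line, err := reader.Read(); err == nil; line, err = reader.Read() {
    if err = writer.Write([]string{line[col_index[0]], line[col_index[1]], line[col_index[2]]}); err != nil {
        fmt.Println("Error:", err)
        break
    }
    writer.Flush()
}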
Answer 2
Score: 0
Move the col_index initialisation outside the write loop:
if len(lines) > 0 {
    // Randomize column index based on the number of columns recorded in the 1st line
    col_index := rand.Perm(len(lines[0]))
    newLine := make([]string, len(col_index))
    // Range over all lines so the first row is written too, as in the question's code.
    for _, line := range lines {
        for from, to := range col_index {
            newLine[to] = line[from]
        }
        writer.Write(newLine)
        writer.Flush()
    }
}
To use concurrency, you cannot use reader.ReadAll. Instead, start a goroutine that calls reader.Read and sends each record on a channel, which replaces the lines slice. The main goroutine then reads from the channel and does the shuffle and the write.