How to filter elements of a [][]string slice in Golang?
Question
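First of all, your code looks good! To filter duplicates, you can use a map to record the rows that have already appeared. Here is an example function that filters them out: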
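// filterDuplicates returns records with repeated rows removed, keeping the
// first occurrence of each. It needs the "strings" package imported.
func filterDuplicates(records [][]string) [][]string {
	seen := make(map[string]bool)
	result := [][]string{}
	for _, record := range records {
		// Join on a NUL byte rather than ",", so that {"a,b", "c"} and
		// {"a", "b,c"} do not collapse into the same key.
		key := strings.Join(record, "\x00")
		if !seen[key] {
			seen[key] = true
			result = append(result, record)
		}
	}
	return result
}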
Call this function before the WriteAll operation and pass the filtered data to WriteAll. Your code would change as follows:
func main() {
	// Open the csv file
	recordFile, err := os.Open("vehicles.csv")
	if err != nil {
		fmt.Println("An error encountered:", err)
		return
	}
	defer recordFile.Close()
	// Read the data
	reader := csv.NewReader(recordFile)
	vehicles, err := reader.ReadAll()
	if err != nil {
		fmt.Println("An error encountered:", err)
		return
	}
	// Filter out duplicates
	filteredVehicles := filterDuplicates(vehicles)
	// Create the new csv file
	newRecordFile, err := os.Create("newCsvFile.csv")
	if err != nil {
		fmt.Println("An error encountered:", err)
		return
	}
	defer newRecordFile.Close()
	// Write the data to the new csv file
	writer := csv.NewWriter(newRecordFile)
	err = writer.WriteAll(filteredVehicles)
	if err != nil {
		fmt.Println("An error encountered:", err)
		return
	}
	writer.Flush()
}
This way, newCsvFile.csv contains the filtered data, with no duplicates. Hope this helps! Happy coding!
First of all, I'm new here and I'm trying to learn Golang. I would like to read my CSV file (which has 3 values per row: type, maker, model), create a new one, and, after a filter operation, write the filtered data to the newly created CSV file. Here is my code so you can understand me more clearly.
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	// opening my csv file, which is vehicles.csv
	recordFile, err := os.Open("vehicles.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
	}
	// reading it
	reader := csv.NewReader(recordFile)
	vehicles, _ := reader.ReadAll()
	// creating a new csv file
	newRecordFile, err := os.Create("newCsvFile.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
	}
	// writing vehicles.csv into the new csv
	writer := csv.NewWriter(newRecordFile)
	err = writer.WriteAll(vehicles)
	if err != nil {
		fmt.Println("An error encountered ::", err)
	}
}
After I build it, it works this way: it reads all the data and writes it to the newly created CSV file. The problem is that I want to filter duplicates out of the CSV data I read, which is vehicles. I am creating another function (outside of the main function) to filter duplicates, but I can't get it to work because the type of vehicles is [][]string. I searched the internet for ways to filter duplicates, but all I found was for int or string types. What I want to do is create a function and call it before the WriteAll operation, so that WriteAll writes the correct (duplicate-filtered) data into the new CSV file. Help me please!!
I appreciate any answer.
Happy coding!
Answer 1
Score: 3
This depends on how you define "uniqueness", but in general there are a few parts of this problem.
What is unique?
- All fields must be equal
- Only some fields must be equal
- Normalize some or all fields before comparing
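To make the second and third options concrete, here is a minimal sketch that compares only selected columns and normalizes them first. The helper names normalizedKey and dedupeBy are hypothetical, not part of the original answer, and the choice of trimming whitespace and lower-casing is just one possible normalization. It also assumes every record has at least as many columns as the largest index passed in.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizedKey builds a map key from the chosen columns of a record.
// Each field is trimmed and lower-cased first, so " ford" and "Ford"
// count as the same value. Hypothetical helper for illustration.
func normalizedKey(record []string, fields ...int) string {
	parts := make([]string, 0, len(fields))
	for _, i := range fields {
		parts = append(parts, strings.ToLower(strings.TrimSpace(record[i])))
	}
	// Join on a NUL byte, which is unlikely to appear in CSV text, so
	// that {"a,b", "c"} and {"a", "b,c"} produce different keys.
	return strings.Join(parts, "\x00")
}

// dedupeBy keeps the first record seen for each normalized key.
func dedupeBy(records [][]string, fields ...int) [][]string {
	seen := make(map[string]bool)
	var result [][]string
	for _, record := range records {
		key := normalizedKey(record, fields...)
		if seen[key] {
			continue
		}
		seen[key] = true
		result = append(result, record)
	}
	return result
}

func main() {
	records := [][]string{
		{"car", "Ford", "Focus"},
		{"car", " ford", "FOCUS"},
		{"car", "Ford", "Fiesta"},
	}
	// Compare on maker and model only (columns 1 and 2).
	for _, r := range dedupeBy(records, 1, 2) {
		fmt.Println(r)
	}
}
```

With this sample data the second row is treated as a duplicate of the first, so only the Focus and Fiesta rows are printed. Passing every column index gives the "all fields must be equal" behavior, still with normalization applied.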
You have a few approaches for applying your uniqueness, including:
- You can use a map keyed by the "pieces" of uniqueness, which requires O(N) state
- You can sort the records and compare each with the prior record as you iterate, which requires O(1) state but is more complicated
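The sort-and-compare option might look roughly like the sketch below. The function names less, equal and dedupeSorted are hypothetical, and the sketch treats two records as duplicates only when all fields match. It sorts a copy so the caller's slice is left alone; sorting in place would avoid that extra copy. Note that the output comes back in sorted order, not the original file order.

```go
package main

import (
	"fmt"
	"sort"
)

// less reports whether record a sorts before record b, comparing
// field by field and falling back to length.
func less(a, b []string) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return len(a) < len(b)
}

// equal reports whether two records have identical fields.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// dedupeSorted sorts a copy of records, then keeps each record that
// differs from the one immediately before it. After sorting, identical
// records are adjacent, so only the previous record has to be checked.
func dedupeSorted(records [][]string) [][]string {
	sorted := make([][]string, len(records))
	copy(sorted, records)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })

	var result [][]string
	for i, record := range sorted {
		if i > 0 && equal(record, sorted[i-1]) {
			continue
		}
		result = append(result, record)
	}
	return result
}

func main() {
	records := [][]string{
		{"truck", "Ford", "F-150"},
		{"car", "Ford", "Focus"},
		{"truck", "Ford", "F-150"},
	}
	for _, r := range dedupeSorted(records) {
		fmt.Println(r)
	}
}
```

The comparison step needs no map at all, which is where the O(1) state comes from, but the whole input has to be in memory to be sorted.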
You have two approaches for filtering and outputting:
- You can build a new slice from the old one using a loop and write it all at once, which requires O(N) space
- You can write the records out to the file as you go if you don't need to sort, which requires O(1) space
I think a reasonably simple and performant approach would be to pick the first option from the first list (all fields must be equal), the first from the second (a map), and the second from the third (write as you go), which together would look like:
package main

import (
	"encoding/csv"
	"errors"
	"io"
	"log"
	"os"
)

func main() {
	input, err := os.Open("vehicles.csv")
	if err != nil {
		log.Fatalf("opening input file: %s", err)
	}
	defer input.Close()
	output, err := os.Create("vehicles_filtered.csv")
	if err != nil {
		log.Fatalf("creating output file: %s", err)
	}
	defer func() {
		// Ensure the file is closed at the end of the program
		if err := output.Close(); err != nil {
			log.Fatalf("finalizing output file: %s", err)
		}
	}()
	reader := csv.NewReader(input)
	writer := csv.NewWriter(output)
	seen := make(map[[3]string]bool)
	for {
		// Read in one record
		record, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			log.Fatalf("reading record: %s", err)
		}
		if len(record) != 3 {
			log.Printf("bad record %q", record)
			continue
		}
		// Check if the record has been seen before, skipping if so
		key := [3]string{record[0], record[1], record[2]}
		if seen[key] {
			continue
		}
		seen[key] = true
		// Write the record
		if err := writer.Write(record); err != nil {
			log.Fatalf("writing record %d: %s", len(seen), err)
		}
	}
	// csv.Writer buffers its output, so flush it before the file is closed
	writer.Flush()
	if err := writer.Error(); err != nil {
		log.Fatalf("flushing output: %s", err)
	}
}