How to filter elements of a [][]string slice in Golang?

Question

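First of all, your code looks good! To filter out duplicates, you can use a map to keep track of the records that have already been seen. Here is an example function that filters out duplicates: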

func filterDuplicates(records [][]string) [][]string {
    seen := make(map[string]bool)
    result := [][]string{}

    for _, record := range records {
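        // Join the fields with a NUL separator so that fields containing commas
        // cannot collide (assumes no field contains a NUL byte); needs the "strings" import.
        key := strings.Join(record, "\x00")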
        if !seen[key] {
            seen[key] = true
            result = append(result, record)
        }
    }

    return result
}

Call this function before the WriteAll operation and pass the filtered data to WriteAll. Your code, modified accordingly:

func main() {
    // Open the CSV file
    recordFile, err := os.Open("vehicles.csv")
    if err != nil {
        fmt.Println("Error encountered:", err)
        return
    }
    defer recordFile.Close()

    // Read the data
    reader := csv.NewReader(recordFile)
    vehicles, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error encountered:", err)
        return
    }

    // Filter out duplicates
    filteredVehicles := filterDuplicates(vehicles)

    // Create the new CSV file
    newRecordFile, err := os.Create("newCsvFile.csv")
    if err != nil {
        fmt.Println("Error encountered:", err)
        return
    }
    defer newRecordFile.Close()

    // Write the data to the new CSV file.
    // WriteAll flushes the writer itself, so no separate Flush call is needed.
    writer := csv.NewWriter(newRecordFile)
    err = writer.WriteAll(filteredVehicles)
    if err != nil {
        fmt.Println("Error encountered:", err)
        return
    }
}

This way, newCsvFile.csv will contain the filtered data with no duplicates. Hope this helps, and happy coding!

Original (English):

First of all, I'm new here and I'm trying to learn Golang. I would like to read my CSV file (which has three columns: type, maker, model), filter its records, and write the filtered data to a newly created CSV file. Here is my code so you can understand me more clearly.

package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	// Open the input CSV file, vehicles.csv
	recordFile, err := os.Open("vehicles.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	defer recordFile.Close()
	// Read all of its records
	reader := csv.NewReader(recordFile)
	vehicles, err := reader.ReadAll()
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	// Create a new CSV file
	newRecordFile, err := os.Create("newCsvFile.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	defer newRecordFile.Close()
	// Write the records from vehicles.csv into the new CSV file
	writer := csv.NewWriter(newRecordFile)
	err = writer.WriteAll(vehicles)
	if err != nil {
		fmt.Println("An error encountered ::", err)
	}
}

After I build it, it works this way: it reads all the data and writes it to the newly created CSV file. The problem is that I want to filter duplicates out of the records read from the CSV (vehicles). I am trying to write another function (outside of main) to filter duplicates, but I can't get it to work because the type of vehicles is [][]string. I searched the internet for ways to filter duplicates, but everything I found was for int or string slices. What I want is a function that I can call before the WriteAll operation, so that WriteAll writes the correct (duplicate-free) data to the new CSV file. Please help!
I appreciate any answer.
Happy coding!

Answer 1

Score: 3


This depends on how you define "uniqueness", but in general there are a few parts of this problem.

What is unique?

  1. All fields must be equal
  2. Only some fields must be equal
  3. Normalize some or all fields before comparing
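
To make the three definitions concrete, here is a minimal sketch of one key function per definition. The function names, the NUL separator, and the choice of columns are assumptions made for illustration; they are not part of the original question or answer.

```go
package main

import (
	"fmt"
	"strings"
)

// keyAllFields treats two records as duplicates only if every field matches.
// Assumption: the NUL separator never occurs inside a field.
func keyAllFields(record []string) string {
	return strings.Join(record, "\x00")
}

// keyMakerModel ignores the first column (type) and compares maker and model only.
// Assumption: the record has at least three fields.
func keyMakerModel(record []string) string {
	return record[1] + "\x00" + record[2]
}

// keyNormalized trims surrounding whitespace and ignores case before comparing.
func keyNormalized(record []string) string {
	parts := make([]string, len(record))
	for i, field := range record {
		parts[i] = strings.ToLower(strings.TrimSpace(field))
	}
	return strings.Join(parts, "\x00")
}

func main() {
	a := []string{"car", "Toyota", "Corolla"}
	b := []string{"Car", " toyota ", "corolla"}
	c := []string{"sedan", "Toyota", "Corolla"}

	fmt.Println("all fields, a vs b:", keyAllFields(a) == keyAllFields(b))
	fmt.Println("maker+model, a vs c:", keyMakerModel(a) == keyMakerModel(c))
	fmt.Println("normalized, a vs b:", keyNormalized(a) == keyNormalized(b))
}
```

Whichever definition you choose, the resulting key is what goes into the map (or drives the sort) in the approaches that follow.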

You have a few approaches for applying your uniqueness, including:

  1. You can use a map keyed by the "pieces" that define uniqueness; this requires O(N) state
  2. You can sort the records and compare each one with the prior record as you iterate; this requires O(1) extra state but is more complicated, and it changes the order of the records
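
The map approach is the one used in the full program further down. For comparison, the sort-and-compare approach might look roughly like the sketch below. The helper names less, equal, and dedupeSorted are hypothetical, and "unique" is assumed to mean that all fields are equal.

```go
package main

import (
	"fmt"
	"sort"
)

// less orders records field by field so that identical records end up adjacent.
func less(a, b []string) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return len(a) < len(b)
}

// equal reports whether two records have the same fields in the same order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// dedupeSorted sorts records in place and returns the unique ones.
// It reuses the input's backing array, so the caller should only use
// the returned slice afterwards.
func dedupeSorted(records [][]string) [][]string {
	sort.Slice(records, func(i, j int) bool { return less(records[i], records[j]) })

	unique := records[:0]
	for _, record := range records {
		// After sorting, a duplicate can only be next to its twin.
		if len(unique) == 0 || !equal(record, unique[len(unique)-1]) {
			unique = append(unique, record)
		}
	}
	return unique
}

func main() {
	records := [][]string{
		{"car", "Toyota", "Corolla"},
		{"truck", "Ford", "F-150"},
		{"car", "Toyota", "Corolla"},
	}
	for _, record := range dedupeSorted(records) {
		fmt.Println(record)
	}
}
```

Note that the output comes back in sorted order rather than file order, which is part of why this approach is described as more complicated.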

You have two approaches for filtering and outputting:

  1. You can build a new slice from the old one using a loop and write everything at once; this requires O(N) space
  2. If you don't need to sort, you can write each record out to the file as you go; this requires O(1) space for the output

I think a reasonably simple and performant approach would be to pick (1) from the first, (1) from the second, and (2) from the third, which together would look like:

package main

import (
	"encoding/csv"
	"errors"
	"io"
	"log"
	"os"
)

func main() {
	input, err := os.Open("vehicles.csv")
	if err != nil {
		log.Fatalf("opening input file: %s", err)
	}
	defer input.Close()

	output, err := os.Create("vehicles_filtered.csv")
	if err != nil {
		log.Fatalf("creating output file: %s", err)
	}
	defer func() {
		// Ensure the file is closed at the end of the program
		if err := output.Close(); err != nil {
			log.Fatalf("finalizing output file: %s", err)
		}
	}()

	reader := csv.NewReader(input)
	writer := csv.NewWriter(output)

	seen := make(map[[3]string]bool)
	for {
		// Read in one record
		record, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			log.Fatalf("reading record: %s", err)
		}
		if len(record) != 3 {
			log.Printf("bad record %q", record)
			continue
		}

		// Check if the record has been seen before, skipping if so
		key := [3]string{record[0], record[1], record[2]}
		if seen[key] {
			continue
		}
		seen[key] = true

		// Write the record
		if err := writer.Write(record); err != nil {
			log.Fatalf("writing record %d: %s", len(seen), err)
		}
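	}

	// csv.Writer is buffered: flush it and check for any write error
	// before the output file is closed.
	writer.Flush()
	if err := writer.Error(); err != nil {
		log.Fatalf("flushing output file: %s", err)
	}
}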

huangapple
  • Posted on July 12, 2021 at 05:50:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68340145.html