How to reorder a CSV file to group by contents of a particular column

Question

I am very new to Go, and my question may not be entirely clear, but this is what I am trying to achieve.
I have a CSV file like the one below, and I want to rearrange it by the last column (status: passed, failed, or skipped):

test,test-cat,skipped
test,test-cat,failed
test,test-cat,passed
test,test-cat,skipped
test,test-cat,passed
test,test-cat,failed

I expect rows to be grouped together when they have the same status in the last column:

test,test-cat,skipped
test,test-cat,skipped
test,test-cat,failed
test,test-cat,failed
test,test-cat,passed
test,test-cat,passed

The code I wrote does not look good :-) but it works as I wanted:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	var FailStat, SkipStat, PassStat []string

	file, err := os.Open("test.csv")

	if err != nil {
		fmt.Println(err)
	} else {
		scanner := bufio.NewScanner(file)
		for scanner.Scan() {
			line := scanner.Text()
			if strings.Contains(line, "failed") {
				FailStat = append(FailStat, line)
			}
			if strings.Contains(line, "skipped") {
				SkipStat = append(SkipStat, line)
			}
			if strings.Contains(line, "passed") {
				PassStat = append(PassStat, line)
			}
		}
	}
	file.Close()

	var finalstat []string
	finalstat = append(SkipStat, FailStat...)
	finalstat = append(finalstat, PassStat...)

	for _, line := range finalstat {
		fmt.Println(line)
	}
}

Test run:

$ ./readfile 
test,test-cat,skipped
test,test-cat,skipped
test,test-cat,failed
test,test-cat,failed
test,test-cat,passed
test,test-cat,passed

There must be many better ways; please advise. Sorry for the newbie question!

Answer 1

Score: 2

Inian's solution will work if the order of the status groups doesn't matter. Go does not specify the iteration order of a map, so you should never expect the groups to come out in the same order from run to run.

If you need the groups in a consistent order, that is, actually sorted:

package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
	"sort"
	"strings"
)

type Row struct {
	Name, Category, Status string
}

func main() {
	in := `test,test-cat,skipped
test,test-cat,failed
test,test-cat,passed
test,test-cat,skipped
test,test-cat,passed
test,test-cat,failed
`
	r := csv.NewReader(strings.NewReader(in))

	rows := make([]Row, 0)
	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		row := Row{record[0], record[1], record[2]}
		rows = append(rows, row)
	}

	sort.Slice(rows, func(i, j int) bool { return rows[i].Status < rows[j].Status })

	w := csv.NewWriter(os.Stdout)

	for _, row := range rows {
		w.Write([]string{row.Name, row.Category, row.Status})
	}
	w.Flush()

	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}

and we get:

test,test-cat,failed
test,test-cat,failed
test,test-cat,passed
test,test-cat,passed
test,test-cat,skipped
test,test-cat,skipped

Change the < to > in the anonymous func for sort.Slice to reverse the order of the sort.
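
One caveat worth knowing: sort.Slice does not guarantee that rows with the same status keep their original relative order. If that matters, sort.SliceStable is probably the better fit. Here is a minimal sketch, assuming a hypothetical helper named sortByColumn and made-up row names (t1 to t4) so that the order inside each group is visible:

```go
package main

import (
	"fmt"
	"sort"
)

// sortByColumn is a hypothetical helper (not part of the answer above).
// It sorts rows in place by the value in column col and keeps rows with
// equal values in their original relative order.
func sortByColumn(rows [][]string, col int) {
	sort.SliceStable(rows, func(i, j int) bool {
		return rows[i][col] < rows[j][col]
	})
}

func main() {
	// Sample rows invented for illustration.
	rows := [][]string{
		{"t1", "test-cat", "skipped"},
		{"t2", "test-cat", "failed"},
		{"t3", "test-cat", "skipped"},
		{"t4", "test-cat", "failed"},
	}
	sortByColumn(rows, 2)
	for _, row := range rows {
		fmt.Println(row)
	}
}
```

With a stable sort, t2 stays ahead of t4 in the failed group and t1 stays ahead of t3 in the skipped group.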

If you don't want to mess with the Row struct and convert between []Row and [][]string:

// ...
rows := make([][]string, 0)
for {
	row, err := r.Read()
	// ...
	rows = append(rows, row)
}

sort.Slice(rows, func(i, j int) bool { return rows[i][2] < rows[j][2] })

w := csv.NewWriter(os.Stdout)

for _, row := range rows {
	w.Write(row)
}
// ...

In a comment you mentioned wanting a specific order of the groups, and now I can see in your original code what you were aiming for 🙂

In that case, Inian's solution is heading in the right direction:

	// ...

	recordGroups := make(map[string][][]string)
	for {
		records, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		groupName := records[2]
		recordGroups[groupName] = append(recordGroups[groupName], records)
	}
	w := csv.NewWriter(os.Stdout)

	// Control the order with this slice of group names
	groupNames := []string{"failed", "skipped", "passed", "Bogus group!"}

	for _, groupName := range groupNames {
		recordGroup, ok := recordGroups[groupName]
		if !ok {
			log.Printf("did not find expected group %q\n", groupName)
			continue
		}
		for _, record := range recordGroup {
			if err := w.Write(record); err != nil {
				log.Fatalln("error writing record to csv:", err)
			}
		}
	}

	// ...

and we get:

2009/11/10 23:00:00 did not find expected group "Bogus group!"
test,test-cat,failed
test,test-cat,failed
test,test-cat,skipped
test,test-cat,skipped
test,test-cat,passed
test,test-cat,passed
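
The same custom ordering could also be expressed as a sort instead of a map of groups. This is only a sketch of an alternative, not what the answer above does: orderByStatus is a hypothetical helper that ranks each status by its position in a caller-supplied slice and sorts stably on that rank. Placing unlisted statuses last is an assumption made here for illustration.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"sort"
	"strings"
)

// orderByStatus is a hypothetical helper. It sorts records in place so that
// the values in column col appear in the order given by order. Values not
// listed in order are placed last (an assumption made for this sketch).
// The sort is stable, so rows within a group keep their original order.
func orderByStatus(records [][]string, col int, order []string) {
	rank := make(map[string]int, len(order))
	for i, name := range order {
		rank[name] = i
	}
	rankOf := func(status string) int {
		if r, ok := rank[status]; ok {
			return r
		}
		return len(order) // unknown statuses sort after all known ones
	}
	sort.SliceStable(records, func(i, j int) bool {
		return rankOf(records[i][col]) < rankOf(records[j][col])
	})
}

func main() {
	in := `test,test-cat,skipped
test,test-cat,failed
test,test-cat,passed
test,test-cat,skipped
test,test-cat,passed
test,test-cat,failed
`
	records, err := csv.NewReader(strings.NewReader(in)).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	// The group order the question asked for.
	orderByStatus(records, 2, []string{"skipped", "failed", "passed"})

	w := csv.NewWriter(os.Stdout)
	if err := w.WriteAll(records); err != nil { // WriteAll flushes the writer
		log.Fatal(err)
	}
}
```

The trade-off is that sorting costs O(n log n) where the map approach makes a single pass, but rows with an unexpected status are still written instead of being silently left out.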

Answer 2

Score: 1

It is better to use the csv package provided in the standard library for this purpose. The logic involves creating a map from a string to a slice of records ([][]string), where the key is the value of the column you want to group on and the value is the list of rows that share it.

Once the map is populated, the next step is to print the result back in CSV format. The example below reads the input from a variable and prints to stdout. You can refer to the other methods in the package to do the same with a text file.

package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
	"strings"
)

func main() {
	in := `test,test-cat,skipped
test,test-cat,failed
test,test-cat,passed
test,test-cat,skipped
test,test-cat,passed
test,test-cat,failed
`
	r := csv.NewReader(strings.NewReader(in))
	dictMap := make(map[string][][]string)
	for {
		records, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		dictMap[records[2]] = append(dictMap[records[2]], records)
	}

	w := csv.NewWriter(os.Stdout)

	for _, records := range dictMap {
		for idx := range records {
			if err := w.Write(records[idx]); err != nil {
				log.Fatalln("error writing record to csv:", err)
			}
		}
	}

	w.Flush()

	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}

huangapple
  • Published on July 30, 2022, 03:10:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73170034.html