Can't read quoted field with gocsv

Question

I have a CSV response from an endpoint that I don't control, and I'm failing to parse it because its header contains a quoted field. It looks something like this:

name,id,quantity,"status (active, expired)"
John,14,4,active 
Bob,12,7,expired

To parse this response, I created the following struct:

type UserInfo struct {
	Name     string `csv:"name"`
	ID       string `csv:"id"`
	Quantity string `csv:"quantity"`
	Status   string `csv:"status (active, expired)"`
}

I have also tried these tags:

Status   string `csv:"\"status (active, expired)\""`
Status   string `csv:'"status (active, expired)"'`

but neither helps: I just can't get a value into the Status field when I use gocsv.Unmarshal.

var actualResult []UserInfo
err = gocsv.Unmarshal(in, &actualResult)

for _, elem := range actualResult {
	fmt.Println(elem.Status)
}

And nothing gets printed.

Here's an example: https://go.dev/play/p/lje1zNO9w6E

Answer 1

Score: 1

You don't need a third-party package like gocsv (unless you have a specific use case) when this can be done easily with Go's built-in encoding/csv package, which handles the quoted header field without any special configuration.

You just have to skip the first record, which is the CSV header in your endpoint's response, and map the remaining fields by position.

csvReader := csv.NewReader(strings.NewReader(csvString))

records, err := csvReader.ReadAll()
if err != nil {
	panic(err)
}

var users []UserInfo

// Iterate over all records excluding the first one, i.e., the header.
// The length check avoids a panic on records[1:] when the response is empty.
if len(records) > 0 {
	for _, record := range records[1:] {
		users = append(users, UserInfo{Name: record[0], ID: record[1], Quantity: record[2], Status: record[3]})
	}
}

fmt.Printf("%v", users)
// Output: [{John 14 4 active } {Bob 12 7 expired}]

Here is a working example on the Go Playground based on your use case and sample string.
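The snippet above maps fields by position. A minimal, self-contained sketch of the same encoding/csv approach is below, this time looking columns up by header name so that a reordered response would still map correctly. The column names are taken from the question's sample; `parseUsers` is a hypothetical helper name, and trimming the trailing space after `active` with `strings.TrimSpace` is an assumption about what you want.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// UserInfo mirrors the question's struct. No tags are needed because
// encoding/csv does not read struct tags.
type UserInfo struct {
	Name     string
	ID       string
	Quantity string
	Status   string
}

// parseUsers (hypothetical helper) maps each record by header name rather
// than by position. encoding/csv has already removed the CSV quoting, so the
// last column is named `status (active, expired)` with no quote characters.
func parseUsers(data string) ([]UserInfo, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	// Build a column-name -> index lookup from the header row.
	col := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		col[name] = i
	}
	for _, name := range []string{"name", "id", "quantity", "status (active, expired)"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	// csv.Reader rejects records whose field count differs from the header,
	// so indexing every record with the header's indexes is safe.
	users := make([]UserInfo, 0, len(records)-1)
	for _, rec := range records[1:] {
		users = append(users, UserInfo{
			Name:     rec[col["name"]],
			ID:       rec[col["id"]],
			Quantity: rec[col["quantity"]],
			Status:   strings.TrimSpace(rec[col["status (active, expired)"]]),
		})
	}
	return users, nil
}

func main() {
	data := "name,id,quantity,\"status (active, expired)\"\nJohn,14,4,active \nBob,12,7,expired\n"
	users, err := parseUsers(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", users) // [{John 14 4 active} {Bob 12 7 expired}]
}
```

Looking columns up by name costs one map lookup per field, but it keeps working if the endpoint ever reorders its columns, which positional indexing would silently get wrong.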

Answer 2

Score: 1

I simply don't think gocarina/gocsv can parse a header with a quoted comma. The documentation doesn't spell out that it can't, but after some digging I found clear examples of commas being used in the "CSV annotations" (the struct tags), and it looks like the author only intended commas in the annotations to serve the package's own API, not to appear as part of a column name.
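It can help to see what the header looks like after CSV parsing. The sketch below uses the standard encoding/csv reader (assuming gocsv's default reader behaves the same way, since it is built on encoding/csv); `headerFields` is a hypothetical helper. The double quotes are CSV-level quoting and are stripped during parsing, so the column name is `status (active, expired)`: it contains no quotes, which is why putting escaped quotes in the tag cannot match, but it does contain a comma.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// headerFields (hypothetical helper) returns the first record of a CSV
// document as encoding/csv parses it, following RFC 4180 quoting rules.
func headerFields(data string) ([]string, error) {
	return csv.NewReader(strings.NewReader(data)).Read()
}

func main() {
	data := "name,id,quantity,\"status (active, expired)\"\nJohn,14,4,active \n"
	fields, err := headerFields(data)
	if err != nil {
		panic(err)
	}
	// %q makes it visible that no quote characters survive parsing.
	for i, f := range fields {
		fmt.Printf("%d: %q\n", i, f)
	}
}
```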

If we look at sample_structs_test.go from the package, we can see commas being used in some of the following ways:

  • in metadata directives, like "omitempty":

    type Sample struct {
    	Foo  string  `csv:"foo"`
    	Bar  int     `csv:"BAR"`
    	Baz  string  `csv:"Baz"`
    	...
    	Omit *string `csv:"Omit,omitempty"`
    }
    
  • for declaring that a field in the struct can be populated from multiple, different headers:

    type MultiTagSample struct {
    	Foo string `csv:"Baz,foo"`
    	Bar int    `csv:"BAR"`
    }
    

    You can see this in action here.

FWIW, the official encoding/json package has the same limitation, and its documentation notes it (emphasis added):

> The encoding of each struct field can be customized by the format string stored under the "json" key in the struct field's tag. The format string gives the name of the field, possibly followed by a comma-separated list of options. The name may be empty in order to specify options without overriding the default field name.

and

> The key name will be used if it's a non-empty string consisting of only Unicode letters, digits, and ASCII punctuation except quotation marks, backslash, and comma.
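The same behavior can be observed directly with encoding/json. In this sketch (the `Item` type and `encode` helper are illustrative), the tag's name part is everything before the first comma, so the key becomes `status (active`, and ` expired)` is treated as an unrecognized option and ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item (illustrative) uses a json tag whose intended key contains a comma.
type Item struct {
	Status string `json:"status (active, expired)"`
}

// encode marshals v and returns the JSON as a string.
func encode(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	out, err := encode(Item{Status: "active"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"status (active":"active"}
}
```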

So you may not be able to get what you expect: sorry, this may just be a limitation that comes with annotating structs through tags. If you want, you could file an issue with gocarina/gocsv.

In the meantime, you can modify the header as it comes in. This example is pretty hacky, but it works: it replaces "status (active, expired)" with "status (active expired)", and you annotate the struct with the comma-less version (`csv:"status (active expired)"`).

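endpointReader := strings.NewReader(sCSV)

// Fix header: read the first line and drop the comma inside the quoted column name
var bTmp bytes.Buffer
fixer := bufio.NewReader(endpointReader)
header, err := fixer.ReadString('\n')
if err != nil && err != io.EOF {
	panic(err)
}
header = strings.ReplaceAll(header, "\"status (active, expired)\"", "status (active expired)")
bTmp.WriteString(header)
// Read rest of CSV
if _, err := bTmp.ReadFrom(fixer); err != nil {
	panic(err)
}

// Turn back into a reader
reader := bytes.NewReader(bTmp.Bytes())

// UserInfo.Status is now tagged `csv:"status (active expired)"`
var actualResult []UserInfo
...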

I can run that and now get:

active 
expired

huangapple
  • Published on January 10, 2023 01:16:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75060814.html