How to handle NaN values when writing to Parquet in Go?
Question
I am trying to write to a Parquet file in Go. While writing to this file, I may encounter NaN values. Since NaN is defined in neither the primitive types nor the logical types, how do I handle this value in Go? Does any existing schema work for it?
I am using the parquet Go library from here. You can find an example of code that uses a JSON schema to write Parquet with this library here.
Answer 1
Score: 0
The issue was discussed at length in xitongsys/parquet-go issue 281, where the recommendation was to:
> use the OPTIONAL type.
> Even if you don't assign a value (as in your code), a non-pointer value will be assigned a default value.
> So parquet-go doesn't know whether it's null or the default value.
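To make the quoted point concrete, here is a minimal sketch using only the Go standard library. The type names `ValueRow` and `PointerRow` and the helper `describe` are hypothetical and are not part of parquet-go; the sketch only shows why a plain value field cannot represent "missing" while a pointer field can.

```go
package main

import "fmt"

// ValueRow (hypothetical) uses a plain float64: an unset field silently becomes 0.
type ValueRow struct {
	Score float64
}

// PointerRow (hypothetical) uses *float64: nil means "no value", which is
// what an OPTIONAL column needs in order to record a null.
type PointerRow struct {
	Score *float64
}

// describe reports whether a pointer field holds a value or is unset.
func describe(p *float64) string {
	if p == nil {
		return "null"
	}
	return fmt.Sprintf("%g", *p)
}

func main() {
	unset := ValueRow{}
	zero := ValueRow{Score: 0}
	// Prints true: an unset field and an explicit 0 cannot be told apart.
	fmt.Println(unset == zero)

	z := 0.0
	fmt.Println(describe(PointerRow{}.Score))          // unset pointer field
	fmt.Println(describe(PointerRow{Score: &z}.Score)) // explicit zero
}
```

With the pointer field, the unset row and the explicit zero print differently, which is the distinction the OPTIONAL recommendation relies on.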
However:
> What it comes down to is that I cannot use the OPTIONAL type; in other words, I cannot convert my structure to use pointers.
> I have tried to use repetitiontype=OPTIONAL as a tag, but this leads to some weird behavior.
> I would expect that tag to behave the same way as the omitempty tag in the Golang standard library, i.e. if the value is not present then it is not put into the JSON.
>
> The reason this is important is that if the field is missing or not set, then once it is encoded to parquet there is no way of telling whether the value was 0 or just not set, in the case of int64.
This illustrates the issue:
package main

import (
	"encoding/json"
	"io/ioutil"
)

type Salary struct {
	Basic, HRA, TA float64 `json:",omitempty"`
}

type Employee struct {
	FirstName, LastName, Email string `json:",omitempty"`
	Age                        int
	MonthlySalary              []Salary `json:",omitempty"`
}

func main() {
	data := Employee{
		Email: "mark@gmail.com",
		MonthlySalary: []Salary{
			{
				Basic: 15000.00,
			},
		},
	}
	file, _ := json.MarshalIndent(data, "", " ")
	_ = ioutil.WriteFile("test.json", file, 0o644)
}
The JSON produced is:
{
 "Email": "mark@gmail.com",
 "Age": 0,
 "MonthlySalary": [
  {
   "Basic": 15000
  }
 ]
}
> As you can see, the items in the struct that have the omitempty tag and that are not assigned do not appear in the JSON, i.e. HRA and TA.
> On the other hand, Age does not have this tag and hence is still included in the JSON.
>
> This is problematic, as all fields in the struct are assigned memory when this Golang library writes to parquet, so if you have a big struct that is only sparsely populated it will still take the full amount of memory.
> It is a bigger problem when the file is read again, as there is no way of knowing whether the value that was put in the parquet file was the empty value or was just not assigned.
>
> I am happy to help implement an omitempty tag for this library if I can convince you of the value of having it.
That echoes issue 403, "No option to omitempty when not using pointers".