Using the data types defined in datatype.go of the Go Apache Arrow implementation to construct a schema

Question

I am learning Apache Arrow and want to understand how to create a schema and an Arrow record. The material I have found so far only uses primitive types to build a schema, like this:

schema := arrow.NewSchema(
	[]arrow.Field{
		{Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
		{Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
	},
	nil,
)

I want to work with some data types that are not present in PrimitiveTypes, for example bool or decimal128. While looking through the Go Arrow library I came across the file datatype.go, which lists all the data types I want to use. However, the values defined there are not of type DataType, which is what a schema field requires.

So, I have the following three questions:

  1. How can I use these datatypes from datatype.go, if possible, for constructing my schema?
  2. How can I specify a precision and scale if I want to use a decimal type?
  3. Is there an example of using an extension type?

Answer 1

Score: 0

The named constants defined in datatype.go are type IDs rather than data types. Each one is returned by the ID method of a struct with a similar name, such as BooleanType or Decimal128Type. Those structs already implement the DataType interface, so you can assign them to arrow.Field.Type, because that field's type is DataType.

For example, the BOOL constant defined in datatype.go is the return value of BooleanType's ID method in datatype_fixedwidth.go:

func (t *BooleanType) ID() Type { return BOOL }

The same holds for Decimal128Type:

func (*Decimal128Type) ID() Type { return DECIMAL128 }

The method set of one of these structs shows that it implements the DataType interface:

func (*Decimal128Type) BitWidth() int
func (t *Decimal128Type) Fingerprint() string
func (*Decimal128Type) ID() Type
func (*Decimal128Type) Name() string
func (t *Decimal128Type) String() string

Those are the methods of Decimal128Type. The DataType interface is defined as:

type DataType interface {
	ID() Type
	// Name is name of the data type.
	Name() string
	Fingerprint() string
}

BooleanType implements it as well.

You can therefore use either of them as the Type field of arrow.Field:

type Field struct {
	Name     string   // Field name
	Type     DataType // The field's data type
	Nullable bool     // Fields can be nullable
	Metadata Metadata // The field's metadata, if any
}
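
To make the interface argument concrete, here is a minimal, self-contained sketch that does not import Arrow at all. Type, DataType, BooleanType, Decimal128Type and Field below are simplified local stand-ins that mirror the declarations quoted above (the returned names and fingerprints are illustrative assumptions, not the library's exact strings). It only shows why a pointer to one of these structs can be assigned to a field of type DataType.

```go
package main

import "fmt"

// Type is a local stand-in for the type-ID enum declared in datatype.go.
type Type int

const (
	BOOL Type = iota
	DECIMAL128
)

// DataType mirrors the interface quoted in the answer.
type DataType interface {
	ID() Type
	Name() string
	Fingerprint() string
}

// BooleanType is a simplified stand-in for arrow.BooleanType.
type BooleanType struct{}

func (*BooleanType) ID() Type            { return BOOL }
func (*BooleanType) Name() string        { return "bool" }
func (*BooleanType) Fingerprint() string { return "bool" }

// Decimal128Type is a simplified stand-in for arrow.Decimal128Type.
type Decimal128Type struct {
	Precision int32
	Scale     int32
}

func (*Decimal128Type) ID() Type     { return DECIMAL128 }
func (*Decimal128Type) Name() string { return "decimal" }
func (t *Decimal128Type) Fingerprint() string {
	return fmt.Sprintf("decimal(%d,%d)", t.Precision, t.Scale)
}

// Compile-time checks: the build fails if either struct stops
// satisfying DataType.
var (
	_ DataType = (*BooleanType)(nil)
	_ DataType = (*Decimal128Type)(nil)
)

// Field is a reduced stand-in for arrow.Field.
type Field struct {
	Name string
	Type DataType
}

func main() {
	fields := []Field{
		{Name: "f1-bool", Type: &BooleanType{}},
		{Name: "f2-decimal128", Type: &Decimal128Type{Precision: 10, Scale: 2}},
	}
	for _, f := range fields {
		fmt.Printf("%s: id=%d name=%s\n", f.Name, f.Type.ID(), f.Type.Name())
	}
}
```

The `var _ DataType = (*T)(nil)` lines are a common Go idiom for asserting interface satisfaction at compile time; the same check could be written against the real arrow.DataType in your own code.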

A demonstrative example:

package main

import (
	"fmt"

	"github.com/apache/arrow/go/arrow"
)

func main() {
	booltype := &arrow.BooleanType{}
	decimal128type := &arrow.Decimal128Type{Precision: 1, Scale: 1}

	schema := arrow.NewSchema(
		[]arrow.Field{
			{Name: "f1-bool", Type: booltype},
			{Name: "f2-decimal128", Type: decimal128type},
		},
		nil,
	)

	fmt.Println(schema)
}

Output:

schema:
  fields: 2
    - f1-bool: type=bool
    - f2-decimal128: type=decimal(1, 1)

You can find these types in the package documentation.
The library also contains code related to extension types, but I am not familiar with them, so I cannot show an example. If you already know how extension types work, you should be able to apply the same approach.
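
As a rough orientation on the third question: in the Arrow format, an extension type is an ordinary storage type plus a name (and optional serialized parameters) carried in the field's custom metadata under the reserved keys ARROW:extension:name and ARROW:extension:metadata. The Go library models this with the arrow.ExtensionType interface, usually implemented by embedding arrow.ExtensionBase; its exact method set differs between releases, so the sketch below does not try to reproduce it. Field, ExtensionType, NewField, ExtensionName and the name "example.uuid" are all hypothetical stand-ins used only to show the idea.

```go
package main

import "fmt"

// Metadata keys the Arrow format reserves for extension types.
const (
	extNameKey     = "ARROW:extension:name"
	extMetadataKey = "ARROW:extension:metadata"
)

// Field is a simplified, hypothetical stand-in for arrow.Field.
type Field struct {
	Name     string
	Storage  string            // storage type, written as text for brevity
	Metadata map[string]string // field-level custom metadata
}

// ExtensionType is a hypothetical description of an extension type:
// a unique name, a storage type, and optional serialized parameters.
type ExtensionType struct {
	ExtName    string
	Storage    string
	Serialized string
}

// NewField builds a field that uses the storage type and records the
// extension name and parameters in the field metadata.
func (e ExtensionType) NewField(name string) Field {
	return Field{
		Name:    name,
		Storage: e.Storage,
		Metadata: map[string]string{
			extNameKey:     e.ExtName,
			extMetadataKey: e.Serialized,
		},
	}
}

// ExtensionName returns the extension name recorded on a field, if any.
func ExtensionName(f Field) (string, bool) {
	name, ok := f.Metadata[extNameKey]
	return name, ok
}

func main() {
	// A UUID stored as 16 raw bytes is a common illustration.
	uuid := ExtensionType{ExtName: "example.uuid", Storage: "fixed_size_binary[16]"}
	f := uuid.NewField("id")

	name, ok := ExtensionName(f)
	fmt.Println(f.Name, f.Storage, name, ok)
}
```

A reader that does not recognize the extension name can still process the column as its storage type, which is why the format keeps the extension information in metadata instead of in a separate type ID per extension.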

huangapple
  • Posted on June 2, 2023 at 16:31:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76388469.html