Why use arrays instead of slices?

Question

I have been reading up on Go, and got stumped thinking about this fundamental question.

In Go, it is quite clear that slices are more flexible, and can generally be used in place of arrays when you need a sequence of data.

Most of the documentation seems to encourage developers to just use slices instead of arrays. The impression I get is that the creators could have simply designed the language to have only resizable slices and no arrays. In fact, such a design would have made the language easier to understand, and perhaps even encouraged more idiomatic code.

So why did the creators allow arrays in the first place? When would arrays ever be used instead of slices? Is there ever a situation where the use of arrays over slices will be compelling?

When I consulted the official documentation (http://golang.org/doc/effective_go.html#arrays), the only useful part I found was:

> Arrays are useful when planning the detailed layout of memory and
> sometimes can help avoid allocation, but primarily they are a building block
> for slices.

They went on to talk about how arrays are expensive as values, and how to simulate C-style behavior with pointers. Even then, they ended the array section with a clear recommendation:

> But even this style isn't idiomatic Go. Use slices instead.

So, what are some real examples of "planning the detailed layout of memory" or "help avoid allocation" that slices would be unsuited for?

Answer 1

Score: 28

As Akavall said, arrays are hashable. That means they can be used as map keys.

They are also passed by value. Each time you pass an array to a function or assign it to another variable, a complete copy is made.
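
A minimal sketch of that copy behavior, next to a slice for contrast. The function names are made up for illustration only:

```go
package main

import "fmt"

// zeroFirstArray receives a copy of the array, so the caller's array is unchanged.
func zeroFirstArray(a [3]int) {
	a[0] = 0
}

// zeroFirstSlice receives a slice header that points at the caller's backing
// array, so the write is visible to the caller.
func zeroFirstSlice(s []int) {
	s[0] = 0
}

func main() {
	arr := [3]int{1, 2, 3}
	sl := []int{1, 2, 3}

	zeroFirstArray(arr)
	zeroFirstSlice(sl)
	fmt.Println(arr, sl) // [1 2 3] [0 2 3]

	// Assignment copies too: changing the copy leaves the original untouched.
	copied := arr
	copied[1] = 42
	fmt.Println(arr[1], copied[1]) // 2 42
}
```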

They can be serialized by encoding/binary.

They can also be used to control memory layout. Since an array is not a reference, placing it in a struct allocates that much memory as part of the struct, instead of putting the equivalent of a pointer there as a slice would.

Bottom line, don't use an array unless you know what you are doing.


> Hashable/serializable are all nice to have, but I'm just not sure if they are indeed that compelling to have

What would you do if you wanted a map keyed by md5 hashes? You can't use a byte slice as a key, so you would need to do something like this to get around the type system:

// 16 bytes
type hashableMd5 struct {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p byte}

You would then have to create a serialization function for it. Because arrays are hashable, you can just use [16]byte instead.

> Sounds like getting closer to C's malloc, sizeof

Nope, that has nothing to do with malloc or sizeof. Those are for allocating memory and getting the size of a variable.

However, CGo is another use case for this. The cgo command creates types that have the same memory layout as their corresponding C types. To do this, it sometimes needs to insert unnamed arrays for padding.

> If problems can be solved with ... nil/insignificant performance penalty using slices ...

Arrays also avoid indirection, which makes certain kinds of code faster. Of course, this is such a minor optimization that it is insignificant in nearly all cases.

Answer 2

Score: 13

To supplement Stephen Weinberg's answer:

> So, what are some real examples of "planning the detailed layout of memory" or "help avoid allocation" that slices would be unsuited for?

Here's an example of "planning the detailed layout of memory". There are many file formats, and a typical one looks like this: it starts with a "magic number", followed by an informational header whose structure is usually fixed. This header contains information about the content. For an image file, for example, it contains things like the image size (width, height), pixel format, compression used, header size, image data offset and the like (basically, it describes the rest of the file and how to interpret or process it).

If you want to implement a file format in Go, an easy and convenient way is to create a struct containing the header fields of the format. To read a file of that format, you can use the binary.Read() function to read the whole header struct into a variable; similarly, to write one, you can use binary.Write() to write the complete header in one step to the file (or wherever you send the data).

The header might contain tens or even a hundred fields, and you can still read or write it with just one function call.

As you can see, if you want to do it all in one step, the "memory layout" of the header struct must exactly match the byte layout as it is saved (or should be saved) in the file.

And where do arrays come into the picture?

Many file formats are complex because they aim to be general and so allow a wide range of uses and functionality. Often you don't want to implement or handle everything the format supports, either because you don't care (you just want to extract some info) or because you don't have to (you have guarantees that the input will only use a subset or a fixed variant of the many cases the format fully supports).

So what do you do if you have a header specification with many fields but only need a few of them? You can define a struct that contains the fields you need and, between them, arrays sized to cover the fields you don't care about or don't need. This ensures that you can still read the whole header with one function call, with the arrays acting as placeholders for the unused data in the file. If you won't use the data, you can also use the blank identifier as the field name in the header struct definition.

Theoretical example

For an easy example, let's implement a format where the magic is "TGI" (Theoretical Go Image) and the header contains the following fields: 2 reserved words (16 bits each), 1 dword for the image width, 1 dword for the image height, 15 "don't care" dwords, and then the image save time as an 8-byte value holding nanoseconds since January 1, 1970 UTC.

This can be modeled with a struct like this (magic number excluded):

type TGIHeader struct {
	_        uint16 // Reserved
	_        uint16 // Reserved
	Width    uint32
	Height   uint32
	_        [15]uint32 // 15 "don't care" dwords
	SaveTime int64
}

To read a TGI file and print useful info:

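import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"os"
	"time"
)

// ShowInfo opens the named file, checks the "TGI" magic and prints the header fields.
func ShowInfo(name string) error {
	f, err := os.Open(name)
	if err != nil {
		return err
	}
	defer f.Close()

	// io.ReadFull fails unless all 3 magic bytes are read; a bare Read may return fewer.
	magic := make([]byte, 3)
	if _, err = io.ReadFull(f, magic); err != nil {
		return err
	}
	if !bytes.Equal(magic, []byte("TGI")) {
		return errors.New("not a TGI file")
	}

	// One call fills the whole header; data for blank (_) fields is skipped.
	th := TGIHeader{}
	if err = binary.Read(f, binary.LittleEndian, &th); err != nil {
		return err
	}

	fmt.Printf("%s is a TGI file,\n\timage size: %dx%d\n\tsaved at: %v\n",
		name, th.Width, th.Height, time.Unix(0, th.SaveTime))

	return nil
}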
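
For the writing side, here is a small sketch that round-trips the same header through an in-memory buffer instead of a file. The encodeHeader and decodeHeader helpers are hypothetical names introduced for this example, and TGIHeader is repeated so the program stands on its own.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// TGIHeader repeats the struct from the answer so that this sketch is self-contained.
type TGIHeader struct {
	_        uint16 // Reserved
	_        uint16 // Reserved
	Width    uint32
	Height   uint32
	_        [15]uint32 // 15 "don't care" dwords
	SaveTime int64
}

// encodeHeader serializes h in little-endian order; blank fields are written as zeros.
func encodeHeader(h TGIHeader) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeHeader reads a header back from raw bytes with a single binary.Read call.
func decodeHeader(raw []byte) (TGIHeader, error) {
	var h TGIHeader
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &h)
	return h, err
}

func main() {
	raw, err := encodeHeader(TGIHeader{Width: 640, Height: 480, SaveTime: 1433716156000000000})
	if err != nil {
		panic(err)
	}
	h, err := decodeHeader(raw)
	if err != nil {
		panic(err)
	}
	// 2+2+4+4+15*4+8 = 80 bytes on the wire, regardless of in-memory padding.
	fmt.Println(len(raw), h.Width, h.Height)
}
```

The [15]uint32 field is what keeps the encoded size at 80 bytes: a slice in that position would make the struct unusable with binary.Read and binary.Write, since they only accept fixed-size data.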

Answer 3

Score: 6

One practical difference is that arrays are hashable, while slices are not.
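
A short illustration, using a hypothetical point type: arrays whose element type is comparable support == and can serve as map keys, while the slice equivalents are rejected at compile time.

```go
package main

import "fmt"

// point is a hypothetical fixed-size coordinate used only for illustration.
type point [2]int

func main() {
	a := point{1, 2}
	b := point{1, 2}
	fmt.Println(a == b) // true: arrays compare element by element

	visited := map[point]bool{a: true}
	fmt.Println(visited[b]) // true: b equals a, so it finds the same key

	// The slice equivalents do not compile:
	//   []int{1, 2} == []int{1, 2} // slices can only be compared to nil
	//   map[[]int]bool{}           // []int is not a valid map key type
}
```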

Answer 4

Score: 0

To expand on this:

> Arrays are useful when planning the detailed layout of memory and
> sometimes can help avoid allocation, but primarily they are a building
> block for slices.

Arrays can be more efficient when considering the overhead of heap allocation. Think about the garbage collector, heap management and fragmentation, etc.

For example, if you have a local array variable like var x [8]int that is not used after the function returns, it will most probably be allocated on the stack, and stack allocation is much cheaper than heap allocation.

Also, for nested structures like arrays of arrays or arrays inside structs, it is cheaper to allocate them in one blob than in several pieces.

So, use arrays for relatively short sequences of fixed size, e.g. an IP address.

huangapple
  • Posted on June 7, 2015, 22:29:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30694652.html