Is there an equivalent to Java's String intern function in Go?

Question


Is there an equivalent to Java's String intern function in Go?

I am parsing a lot of text input that has repeating patterns (tags). I would like to be memory efficient about it and store pointers to a single string for each tag, instead of multiple strings for each occurrence of a tag.

Answer 1

Score: 4


No such function exists that I know of. However, you can easily make your own using a map. A string value is just a header: a pointer to the underlying bytes (a uintptr) and a length. So a string assigned from another string takes up only two words and shares the other string's bytes. Therefore, all you need to do is ensure that no two strings with the same content each keep their own copy of the bytes.

Here is an example of what I mean.

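type Interner map[string]string

func NewInterner() Interner {
    return Interner(make(map[string]string))
}

// Intern returns the stored copy of s if an equal string was seen before;
// otherwise it records s as the canonical copy and returns it.
func (m Interner) Intern(s string) string {
    if ret, ok := m[s]; ok {
        return ret
    }
    m[s] = s
    return s
}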

This code will deduplicate redundant strings whenever you do the following:

str = interner.Intern(str)

EDIT: As jnml mentioned, my answer could pin memory depending on the string it is given. If s is a small slice of a much larger string, storing it in the map keeps the whole larger string alive. There are two ways to solve this problem. Both should be inserted before m[s] = s in my previous example. The first copies the string twice; the second uses unsafe. Neither is ideal.

Double copy:

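b := []byte(s) // first copy: the bytes go into a new slice
s = string(b)  // second copy: a new string independent of the original input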

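Unsafe (use at your own risk; it works with the gc compiler current at the time of writing):

b := []byte(s) // single copy of the bytes
s = *(*string)(unsafe.Pointer(&b)) // reinterpret the slice header as a string header sharing b's bytes (needs import "unsafe")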

Answer 2

Score: 1


I think that, for example, Pool and GoPool may fulfill your needs. That code solves one problem that Stephen's solution ignores. In Go, a string value may be a slice of a bigger string, so keeping the small string alive also keeps the bigger one in memory. In some scenarios that doesn't matter; in others it is a show stopper. The linked functions attempt to be on the safe side.

Posted by huangapple on 2012-10-23 02:30:43. Permalink: https://go.coder-hub.com/13017499.html