Proper way to release string for garbage collection after slicing
Question
According to this Go Data Structures article, the Strings section states that taking a slice of a string keeps the original string in memory:
> "(As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too. The alternative, which we tried and rejected, is to make string slicing so expensive—an allocation and a copy—that most programs avoid it.)"
So if we have a very long string:
s := "Some very long string..."
And we take a small slice:
newS := s[5:9]
The original `s` will not be released until we also release `newS`. Considering this, what is the proper approach if we need to keep `newS` long term but release `s` for garbage collection?
I thought maybe this:
newS := string([]byte(s[5:9]))
But I wasn't certain if that would actually work, or if there's a better way.
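One way to check this suspicion directly is to compare where the string data lives. The sketch below is not from the linked article; `sharesMemory`, `copyViaBytes`, and `copyViaClone` are hypothetical helper names, and it assumes Go 1.20 or later for `unsafe.StringData` (and Go 1.18 or later for `strings.Clone`, which is a standard-library alternative to the `[]byte` round trip).

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesMemory reports whether the bytes of sub lie inside the backing
// array of s. Hypothetical helper; assumes Go 1.20+ for unsafe.StringData.
func sharesMemory(s, sub string) bool {
	if len(s) == 0 || len(sub) == 0 {
		return false
	}
	start := uintptr(unsafe.Pointer(unsafe.StringData(s)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return p >= start && p < start+uintptr(len(s))
}

// copyViaBytes copies the substring by round-tripping through []byte,
// as proposed in the question.
func copyViaBytes(sub string) string {
	return string([]byte(sub))
}

// copyViaClone copies the substring with strings.Clone (Go 1.18+).
func copyViaClone(sub string) string {
	return strings.Clone(sub)
}

func main() {
	s := "Some very long string..."

	newS := s[5:9]                // plain slice: points into s
	viaBytes := copyViaBytes(newS) // fresh copy of the four bytes
	viaClone := copyViaClone(newS) // fresh copy of the four bytes

	fmt.Println(newS, sharesMemory(s, newS))
	fmt.Println(viaBytes, sharesMemory(s, viaBytes))
	fmt.Println(viaClone, sharesMemory(s, viaClone))
}
```

Only the plain slice should report that it points into the original; both copies own their bytes, so they do not keep `s` reachable.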
Answer 1
Score: 5
Yes, converting to a slice of bytes creates a copy of the string data, so the result no longer references the original string, which can then be garbage collected once nothing else refers to it.
As a "proof" of this (well, it proves that the slice of bytes doesn't share the same underlying data as the original string):
http://play.golang.org/p/pwGrlETibj
Edit: and proof that the slice of bytes only has the necessary length and capacity (in other words, its capacity is not tied to the size of the original string):
http://play.golang.org/p/3pwZtCgtWv
Edit 2: You can clearly see what happens with memory profiling. In reuseString(), memory use is very stable. In copyString(), it grows fast, showing the copies of the string made by the []byte conversion.
http://play.golang.org/p/kDRjePCkXq
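To see the effect on what the garbage collector can actually reclaim, here is a rough sketch that is separate from the playground links above and measures something different from them: how much heap stays live when small substrings of large strings are kept. The names `retainedBytes`, `sliceOnly`, and `sliceAndCopy`, as well as the sizes, are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

const (
	bigSize = 1 << 20 // size of each hypothetical "original" string (1 MiB)
	count   = 32      // how many originals are created
)

// retainedBytes builds count large strings, keeps only what keep returns
// for each of them, forces a collection, and reports how much the live
// heap grew while those small strings were still reachable.
func retainedBytes(keep func(big string) string) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	kept := make([]string, 0, count)
	for i := 0; i < count; i++ {
		big := strings.Repeat("x", bigSize) // freshly allocated on the heap
		kept = append(kept, keep(big))
	}

	runtime.GC()
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(kept) // the small strings stay reachable until here

	if after.HeapAlloc < before.HeapAlloc {
		return 0
	}
	return after.HeapAlloc - before.HeapAlloc
}

// sliceOnly keeps a 4-byte slice that still points into the big string.
func sliceOnly(big string) string { return big[5:9] }

// sliceAndCopy keeps an independent 4-byte copy instead.
func sliceAndCopy(big string) string { return string([]byte(big[5:9])) }

func main() {
	fmt.Println("live heap growth, slices kept:", retainedBytes(sliceOnly))
	fmt.Println("live heap growth, copies kept:", retainedBytes(sliceAndCopy))
}
```

With plain slices, every 1 MiB original should stay live because a 4-byte slice still points into it; with copies, the originals become garbage as soon as the loop moves on. Exact numbers depend on the runtime, so treat them as an indication only.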
Answer 2
Score: 3
The proper way to make sure a string can eventually become eligible for garbage collection after you slice it and keep the slice "live" is to create a copy of the slice and keep the copy "live" instead. But then you are buying better memory performance at the cost of worse time performance. That might be good in some places and evil in others. Sometimes only proper measurements, not guessing, will tell where the real gain is.
For example, I'm using StrPack when I prefer a bit of evilness ;-)
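As one cheap kind of measurement, the sketch below counts heap allocations per operation for slicing versus copying, using `testing.AllocsPerRun` from the standard library. It has nothing to do with StrPack; the function names and the package-level `sink` variable (used so the compiler cannot discard the result) are assumptions made for this illustration. For timings, a regular `go test -bench` benchmark would be the usual next step.

```go
package main

import (
	"fmt"
	"testing"
)

// original stands in for the long string from the question.
var original = "Some very long string..."

// sink is package-level so the results escape and are not optimized away.
var sink string

// allocsForSlice reports the average heap allocations for a plain slice.
func allocsForSlice() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = original[5:9]
	})
}

// allocsForCopy reports the average heap allocations when the slice is
// copied through a []byte conversion before being kept.
func allocsForCopy() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = string([]byte(original[5:9]))
	})
}

func main() {
	fmt.Println("allocs per slice:", allocsForSlice())
	fmt.Println("allocs per copy: ", allocsForCopy())
}
```

Slicing only adjusts a pointer and a length, so it should not allocate, while the copy has to allocate room for the new bytes. Whether that extra cost matters depends on how often the copy happens and how much memory it frees.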