Can I deduplicate records to save memory?

Question

I have lots of Java records that store equal values. Can I deduplicate them to save memory? My records are fully immutable.
Because I read and construct them from a stream-style data source (a JSON stream or a CSV stream), I assume they will be distinct objects in memory, even though most of them contain the same values.

For example, by using a HashMap? I know the JVM has an internal string pool, but it only works for String, and the strings in it live too long.

Answer 1

Score: 1

If you need to keep the count for each record, you can use a Map<MyRecord, Integer>:

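import java.util.HashMap;
import java.util.List;
import java.util.Map;

List<MyRecord> list = ...;
Map<MyRecord, Integer> map = new HashMap<>();
for (MyRecord myRecord : list) {
  // first occurrence -> 1, later occurrences -> previous count + 1
  map.compute(myRecord, (key, value) -> value == null ? 1 : value + 1);
}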

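This approach uses Map.compute(): the remapping function receives null for a record that is not yet in the map (so its count starts at 1) and the current count otherwise.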
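To try the counting approach end to end, here is a self-contained sketch. The `MyRecord(String name, int value)` shape and the `CountDemo` class are hypothetical stand-ins, not from the original answer. It relies on the fact that a Java record gets `equals()` and `hashCode()` generated from its components, which is why two separately constructed but equal records land on the same map key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type; equals()/hashCode() are generated from its components.
record MyRecord(String name, int value) {}

public class CountDemo {
    // Counts how many times each distinct record value occurs in the list.
    static Map<MyRecord, Integer> count(List<MyRecord> list) {
        Map<MyRecord, Integer> map = new HashMap<>();
        for (MyRecord myRecord : list) {
            // Absent key: value is null, so start at 1; otherwise increment.
            map.compute(myRecord, (key, value) -> value == null ? 1 : value + 1);
        }
        return map;
    }

    public static void main(String[] args) {
        List<MyRecord> list = List.of(
                new MyRecord("a", 1),
                new MyRecord("a", 1),   // distinct object, equal value
                new MyRecord("b", 2));
        Map<MyRecord, Integer> counts = count(list);
        System.out.println(counts.get(new MyRecord("a", 1))); // 2
        System.out.println(counts.size());                    // 2
    }
}
```

The map holds one key per distinct value, so once the original list is no longer referenced, only those keys (plus the counts) stay reachable.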

If you only need to get rid of the duplicates, adding the records to a Set<MyRecord> is enough:

List<MyRecord> list = ...;
// equal records (same component values) are stored only once
Set<MyRecord> set = new HashSet<>(list);
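A Set removes duplicates from a collection you already hold, but while parsing a stream it can help to swap each freshly parsed record for an equal instance you already keep, so the fresh copy becomes garbage right away. A minimal sketch of that interning idea with a HashMap (the approach the question asks about), assuming a hypothetical `Point` record and a hypothetical `Interner` helper; neither comes from the original answer:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record standing in for one parsed JSON/CSV row.
record Point(int x, int y) {}

// Hypothetical interner: maps each value to the first instance seen with that value.
final class Interner<T> {
    private final Map<T, T> pool = new HashMap<>();

    // Returns the canonical instance equal to `value`. If none exists yet,
    // `value` itself becomes canonical. Callers should keep the returned
    // object and drop their own reference so the duplicate can be collected.
    T intern(T value) {
        return pool.computeIfAbsent(value, k -> k);
    }

    int size() {
        return pool.size();
    }
}

public class InternDemo {
    public static void main(String[] args) {
        Interner<Point> interner = new Interner<>();
        // Simulate a stream producing distinct-but-equal objects.
        Point a = interner.intern(new Point(1, 2));
        Point b = interner.intern(new Point(1, 2));
        Point c = interner.intern(new Point(3, 4));
        System.out.println(a == b);          // true: same canonical object
        System.out.println(a == c);          // false
        System.out.println(interner.size()); // 2
    }
}
```

Because the pool is an ordinary object, it becomes eligible for garbage collection once you drop the reference to the Interner (for example, after parsing finishes), which addresses the lifetime concern raised in the question. The map itself costs one entry per distinct value, so this only pays off when duplicates are common.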

huangapple
  • Posted on June 16, 2023, 02:15:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484461.html