Can I deduplicate records to save memory?
Question

I have many Java records that store equal values. May I deduplicate them to save memory? My records are fully immutable. Because I read and construct them from a streaming data source (a JSON stream or CSV stream), I expect them to be distinct objects in memory, even though most of them contain the same values.

Could I use something like a HashMap? I know about the JVM's internal string pool, but it only works for String, and pooled strings live too long.
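The question's idea of a HashMap-based pool can indeed be sketched as a simple interner: a map from each record value to its canonical instance, so later duplicates are discarded in favor of the first one seen. This is a minimal illustration, not from the original post; the `Point` record and `Interner` class are hypothetical names chosen for the example. (Note that, like the string pool, a plain map keeps its entries alive indefinitely; a weak-valued map would be needed if pooled records should be collectable.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical record type used for illustration.
record Point(int x, int y) {}

class Interner {
    // Canonical instances, keyed by value equality
    // (records generate equals()/hashCode() from their components).
    private static final Map<Point, Point> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to p, storing p if it is the first.
    static Point intern(Point p) {
        return POOL.computeIfAbsent(p, k -> k);
    }
}

public class InternDemo {
    public static void main(String[] args) {
        Point a = Interner.intern(new Point(1, 2));
        Point b = Interner.intern(new Point(1, 2));
        // a and b are the same canonical object, so only one copy is retained.
        System.out.println(a == b);
    }
}
```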
Answer 1

Score: 1
If you need to keep a count for each record, you can use a Map<MyRecord, Integer>:

List<MyRecord> list = ...;
Map<MyRecord, Integer> map = new HashMap<>();
for (MyRecord myRecord : list) {
    map.compute(myRecord, (key, value) -> value == null ? 1 : value + 1);
}

This approach uses Map.compute() to calculate the counts.

If you only need to get rid of any duplicates, adding them to a Set<MyRecord> is enough:

List<MyRecord> list = ...;
Set<MyRecord> set = new HashSet<>(list);
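Both ideas from the answer can also be expressed with the Stream API. This is a small self-contained sketch; the `MyRecord` component names are hypothetical, since the original post never shows the record's definition:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical record type used for illustration.
record MyRecord(String name, int value) {}

public class DedupDemo {
    public static void main(String[] args) {
        List<MyRecord> list = List.of(
                new MyRecord("a", 1), new MyRecord("a", 1), new MyRecord("b", 2));

        // Count occurrences of each distinct record value.
        Map<MyRecord, Long> counts = list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Or simply drop the duplicates.
        Set<MyRecord> distinct = Set.copyOf(list);

        System.out.println(counts.get(new MyRecord("a", 1))); // 2
        System.out.println(distinct.size()); // 2
    }
}
```

Because records derive equals() and hashCode() from their components, they work as hash keys with no extra code, which is what makes both the counting map and the set deduplication work.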