Spark: Memory not released after unpersist
Question
Very simply: I use Spark 2.4.3 on a 17-node cluster and I have a Dataset which I persist. At the end, after some calculations/actions, I call unpersist(), but according to the Storage tab in the Spark UI the Dataset remains in memory. Even if I use unpersist(true), the Dataset is still there at the end. Why is this happening?
Answer 1
Score: 0
Fixed it! Eventually the problem turned out to be in the code. I was persisting a Dataset named df, then dropping columns or renaming columns and re-assigning the result to the same name (df). That means that at the end, when I called unpersist(), it was called on the new Dataset, which had never been persisted in the first place, so the originally cached data stayed in memory. I simply moved the persist() call to after the "dropping columns etc." step and the problem was solved.
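A minimal sketch of the pattern described above (the object name, schema, and column names are hypothetical, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object UnpersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unpersist-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // --- Buggy pattern: persist, then re-assign a NEW Dataset to the same name ---
    var df = Seq((1, "a"), (2, "b")).toDF("id", "tmp")
    df.persist()        // this plan is what gets cached
    df = df.drop("tmp") // df now points to a new, never-persisted Dataset
    df.count()          // action materializes the cache for the old plan
    df.unpersist()      // releases nothing: this df was never persisted

    // --- Fix: drop/rename columns first, then persist the final Dataset ---
    var df2 = Seq((1, "a"), (2, "b")).toDF("id", "tmp")
    df2 = df2.drop("tmp")
    df2.persist()
    df2.count()
    df2.unpersist()     // now frees the cached blocks shown in the Storage tab

    spark.stop()
  }
}
```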
Comments