Spark: Memory not released after unpersist

Question

Very simply: I use Spark 2.4.3 on a 17-node cluster and have a Dataset that I persist. At the end, after some calculations/actions, I call unpersist(), but according to the Storage tab in the Spark UI the Dataset remains in memory. Even if I call unpersist(true), the Dataset is still there at the end. Why is this happening?
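For reference, the pattern being described is roughly the following (a minimal, self-contained sketch; the sample data and app name are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object UnpersistDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("unpersist-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical data standing in for the real Dataset.
    val df = Seq((1, "a"), (2, "b")).toDS()

    df.persist()       // MEMORY_AND_DISK by default for Datasets in Spark 2.x
    df.count()         // an action, so the cache is actually materialized

    // ... calculations / actions ...

    df.unpersist(true) // blocking = true: wait until the cached blocks are dropped

    spark.stop()
  }
}
```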


Answer 1

Score: 0

Fixed it! The problem turned out to be in the code. I was persisting a Dataset named df, then dropping or renaming columns and reassigning the result to the same name (df). That means that at the end, unpersist() was being called on the new Dataset, which had never been persisted in the first place, while the originally cached Dataset stayed in memory. I simply moved the persist() call to after the "dropping columns etc." step, and the problem was solved.
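In other words, persist() is tied to the exact Dataset object (logical plan) it is called on; rebinding the variable name to a transformed Dataset leaves the cached original with no handle to release it. A sketch of the bug and the fix (the input path and column name are hypothetical, and `spark` is assumed to be an existing SparkSession):

```scala
// Buggy pattern: the cached Dataset and the unpersisted one are different objects.
var df = spark.read.parquet("/some/path")  // hypothetical input
df.persist()                               // caches the ORIGINAL Dataset
df = df.drop("unusedCol")                  // rebinds df to a NEW, never-persisted Dataset
df.count()
df.unpersist(true)                         // no-op for the cached data; the original
                                           // still shows up in the Storage tab

// Fix: finish the column drops/renames first, then persist.
var fixed = spark.read.parquet("/some/path")
fixed = fixed.drop("unusedCol")
fixed.persist()                            // caches the Dataset actually used below
fixed.count()
fixed.unpersist(true)                      // now the cached blocks are released
```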
