Spark: Memory not released after unpersist
Question
Very simply: I use Spark 2.4.3 on a 17-node cluster and I have a Dataset which I persist. At the end, after some calculations/actions, I call unpersist(), but according to the Storage tab in the Spark UI the Dataset remains in memory. Even if I use unpersist(true), the Dataset is still there at the end. Why is this happening?
Answer 1
Score: 0
Fixed it! Eventually the problem turned out to be in the code. I was persisting a Dataset named df, then dropping columns or renaming columns and re-assigning the result to the same name (df). That means that at the end, when I called unpersist(), it was called on the new Dataset, which had never been persisted in the first place, so the originally cached data stayed in memory. I simply moved the persist() call to after the "dropping columns etc." step and the problem was solved.
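A minimal sketch of the pattern described above (the object name, schema, and column names are hypothetical, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object UnpersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unpersist-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // --- Buggy pattern: persist, then re-assign a NEW Dataset to the same name ---
    var df = Seq((1, "a"), (2, "b")).toDF("id", "tmp")
    df.persist()        // this plan is what gets cached
    df = df.drop("tmp") // df now points to a new, never-persisted Dataset
    df.count()          // action materializes the cache for the old plan
    df.unpersist()      // releases nothing: this df was never persisted

    // --- Fix: drop/rename columns first, then persist the final Dataset ---
    var df2 = Seq((1, "a"), (2, "b")).toDF("id", "tmp")
    df2 = df2.drop("tmp")
    df2.persist()
    df2.count()
    df2.unpersist()     // now frees the cached blocks shown in the Storage tab

    spark.stop()
  }
}
```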
Comments