Using PySpark, can I write to an S3 path I don't have GetObject permission to?

Question

After Spark finishes writing the DataFrame to S3, it appears to check the validity of the files it wrote with getFileStatus, which is a HeadObject request behind the scenes.

What if I'm only granted write and list-objects permissions, but not GetObject? Is there a way to instruct PySpark on Databricks to skip this validity test after a successful write?
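
For context, here is a minimal sketch of the kind of write that triggers this check. The bucket and path are hypothetical; on Databricks the spark session already exists, but the block is shown self-contained:

```python
# Minimal sketch (hypothetical bucket/path): an ordinary DataFrame write
# to S3 through the S3A connector. After writing, the connector verifies
# the files with getFileStatus, which issues HeadObject requests, so this
# can fail with AccessDenied if the role lacks s3:GetObject.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-write-sketch").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("s3a://my-bucket/output/")
```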

Answer 1

Score: 1

If you are using the S3A connector, the answer is "no". It's not just about writing the file; it is also about creating directories and committing work safely.

"being able to read what you have just written" is an expected behaviour of filesystems and the Hadoop cloud connectors offer that semantics so code doesn't break.
