Using PySpark, can I write to an S3 path I don't have GetObject permission to?

Question

After Spark finishes writing the DataFrame to S3, it appears to check the validity of the files it wrote with getFileStatus, which is a HeadObject request behind the scenes.

What if I'm only granted write and list-objects permissions, but not GetObject? Is there a way to instruct PySpark on Databricks to skip this validity test after a successful write?
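
For context, here is a minimal sketch of the kind of write that triggers this check. The bucket and path are hypothetical; on Databricks the spark session already exists, but the block is shown self-contained:

```python
# Minimal sketch (hypothetical bucket/path): an ordinary DataFrame write
# to S3 through the S3A connector. After writing, the connector verifies
# the files with getFileStatus, which issues HeadObject requests, so this
# can fail with AccessDenied if the role lacks s3:GetObject.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-write-sketch").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("s3a://my-bucket/output/")
```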

Answer 1

Score: 1

If you are using the S3A connector, the answer is "no". It's not just about writing the file; it is also about creating directories and committing work safely.

"being able to read what you have just written" is an expected behaviour of filesystems and the Hadoop cloud connectors offer that semantics so code doesn't break.
