Synapse Spark: write to a different mount point or container
Question
I am trying to write a DataFrame from Synapse Spark as .delta into a container.
Here is the example code:
sqlstmt = 'select * from tableblah limit 5'
df = spark.sql(sqlstmt)
df.write.format("delta").mode("overwrite").save("/bronze/account")
This writes into Synapse's primary ADLS Gen2 file system (e.g., the container named 'synapsews'), so the file ends up in the synapsews container under /bronze/account.
How can I mount a different container and write into it?
I tried mounting the 'devbronze' container as below:
# Mount the 'devbronze' container through the workspace linked service
mssparkutils.fs.mount(
    "abfss://devbronze@dlsdataxxxxdirdev001.dfs.core.windows.net",
    "/devbronze",
    { "linkedService": "LS_synapsews" }
)
Now when I tried to write to this mount as below:
df.write.format("delta").mode("overwrite").save("/devbronze/account")
it still writes to the primary ADLS Gen2 file system (synapsews), creating the /devbronze/account folders there.
How can I change this to write to the mount point 'devbronze'?
Thanks
Answer 1
Score: 1
I tried the same way as yours in my environment and ended up with the same result. Here synapsedata is my primary file system for the Synapse workspace.
To store files in a mount point other than the primary file system, we can use a path of the form synfs:/<jobid>/<mountpoint>/<path>.
The below code works for me and achieves your requirement:
# Get the current job ID; synfs mount paths are scoped per job
jobid = mssparkutils.env.getJobId()
# Build the synfs path to the 'devbronze' mount point
path = 'synfs:/' + jobid + '/devbronze/account'
print(path)
df1.write.format("delta").mode("overwrite").save(path)
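As a quick sanity check, the same synfs path can be listed with mssparkutils (a minimal sketch, assuming the write above succeeded):

# Sketch: list the Delta output under the mount via the job-scoped synfs path
for f in mssparkutils.fs.ls(path):
    print(f.name, f.size)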
First get the job ID from mssparkutils and build the path with your mount location. Here devbronze is my mount point and the container name, which was created using the linked service.
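As a side note, mssparkutils can also resolve the job-scoped local path of a mount, which avoids concatenating the job ID by hand (a sketch, assuming the mount API's documented behavior; Spark reads and writes still go through the synfs:/ scheme shown above):

# Sketch: resolve the mount's job-scoped local path, e.g. /synfs/{jobid}/devbronze
local_path = mssparkutils.fs.getMountPath("/devbronze")
print(local_path)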
Code execution: (screenshot)
Result files in mount path: (screenshot)
Comments