Synapse Spark write to a different mount point [or] container


Question



I am trying to write a DataFrame from Synapse Spark in Delta format into a container.

Here is the example code:

sqlstmt = 'select * from tableblah limit 5'
df = spark.sql(sqlstmt)
df.write.format("delta").mode("overwrite").save("/bronze/account")

This writes into Synapse's primary ADLS Gen2 file system (for example, the container named 'synapsews'), so the files end up under /bronze/account in the synapsews container.

How can I mount a different container and write into it?

I tried mounting the 'devbronze' container as below:

mssparkutils.fs.mount(
    "abfss://devbronze@dlsdataxxxxdirdev001.dfs.core.windows.net",
    "/devbronze",
    { "linkedService": "LS_synapsews" }
)
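
For reference, one quick way to check that the mount was actually registered is to list the current mount points (a minimal sketch, assuming mssparkutils.fs.mounts() is available in the notebook session):

# Sketch: list the mounts registered in this Spark session; a /devbronze
# entry should appear here if the mount call above succeeded.
print(mssparkutils.fs.mounts())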

Now when I tried to write to this mount point as below:

df.write.format("delta").mode("overwrite").save("/devbronze/account")

it still writes to the primary ADLS Gen2 file system (synapsews), creating /devbronze/account folders there.

How do I change this so that the data is written to the 'devbronze' mount point?

Thanks

Answer 1

Score: 1

I tried the same approach in my environment and ended up with the same result. Here, synapsedata is the primary file system for my Synapse workspace.

(screenshot: files written to the primary file system)

To store files in a mount point other than the primary file system, use a path of the form synfs:/<jobid>/<mountpoint>/<path>.

The code below works for me and achieves your requirement.

# Get the current Spark job id and build the synfs path for the devbronze mount
jobid = mssparkutils.env.getJobId()
path = 'synfs:/' + jobid + '/devbronze/account'
print(path)

df1.write.format("delta").mode("overwrite").save(path)
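
To double-check that the write landed in the mounted container rather than the primary file system, the same synfs path can be read back (a small sketch, assuming the write above completed):

# Sketch: read the Delta table back from the mount path to verify the write.
check_df = spark.read.format("delta").load(path)
check_df.show(5)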

First get the job id from mssparkutils and build the path with your mount location. Here, devbronze is both my mount point and the container name, mounted using the linked service.

(screenshot: the devbronze mount point created via the linked service)

Code Execution:

(screenshot)

Result files in mount path:

(screenshot)
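
As an alternative to mounting, the target container can also be addressed directly through its abfss:// URI. The sketch below reuses the storage account and container names from the question and assumes the Spark pool's identity (or the linked service credential) has write access to that storage account:

# Hypothetical alternative: write straight to the container via its ABFS URI,
# assuming the workspace identity has sufficient access on the storage account.
direct_path = "abfss://devbronze@dlsdataxxxxdirdev001.dfs.core.windows.net/account"
df1.write.format("delta").mode("overwrite").save(direct_path)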

