How to Connect Databricks to SFTP Server with PySpark

Question

Is it possible to connect to an SFTP server from Databricks? I have looked at previous questions and answers, and according to an earlier SO question, it would seem that it isn't possible to connect using Spark (at least it wasn't possible over a year ago, according to @AlexOtt).

Is this still the case?


Answer 1

Score: 1

First, install the paramiko package on your Databricks cluster (for example with %pip install paramiko in a notebook cell), then follow the steps below.

Run the code below to connect to the SFTP server.

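import paramiko

# Public Rebex test server with demo credentials; replace with your own
host = "test.rebex.net"
port = 22
username = "demo"
password = "password"

client = paramiko.SSHClient()
client.load_system_host_keys()
# Accept unknown host keys automatically (convenient for a test, risky in production)
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, port=port, username=username, password=password)
# Open an SFTP session on top of the SSH connection
sftp = client.open_sftp()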
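
Hard-coding the password is fine for the public test server, but for a real server you would normally keep the credentials out of the notebook. A minimal sketch, assuming the values are stored in a Databricks secret scope: the scope name "sftp" and the key names "host", "port", "username" and "password" are hypothetical, and get_secret stands for any callable with the shape of dbutils.secrets.get(scope=..., key=...).

```python
def sftp_connect_kwargs(get_secret, scope="sftp"):
    """Build keyword arguments for paramiko's SSHClient.connect().

    get_secret is any callable accepting scope= and key= keyword
    arguments, such as dbutils.secrets.get in a Databricks notebook.
    The scope and key names used here are assumptions for illustration.
    """
    return {
        "hostname": get_secret(scope=scope, key="host"),
        "port": int(get_secret(scope=scope, key="port")),  # secrets are strings
        "username": get_secret(scope=scope, key="username"),
        "password": get_secret(scope=scope, key="password"),
    }


# In a notebook, this could replace the hard-coded connect() call:
# client.connect(**sftp_connect_kwargs(dbutils.secrets.get))
```

Passing the lookup function in as an argument keeps the helper usable outside Databricks, for instance with a plain dictionary lookup during local testing.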

Then use the get function to download the files you want by specifying the remote and local paths, as below.

local_path = "/dbfs/FileStore/tables/rd.txt"
remote_path = "/pub/example/readme.txt"
sftp.get(remote_path, local_path)
spark.read.text("/FileStore/tables/rd.txt").show()
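
The same get call can be repeated to pull several files at once. The following is only a sketch: download_matching is a hypothetical helper name, and sftp is assumed to be the open SFTP session from the connection step. It relies on listdir, which returns the entry names in a remote directory.

```python
import os
import posixpath


def download_matching(sftp, remote_dir, local_dir, suffix=".txt"):
    """Download every entry in remote_dir whose name ends with suffix.

    sftp is assumed to be an open paramiko SFTPClient. Entries are
    filtered by name only, so a remote directory whose name happens
    to end with the suffix would make get() fail.
    Returns the list of local paths that were written.
    """
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    for name in sorted(sftp.listdir(remote_dir)):
        if not name.endswith(suffix):
            continue
        remote_path = posixpath.join(remote_dir, name)  # SFTP paths use "/"
        local_path = os.path.join(local_dir, name)
        sftp.get(remote_path, local_path)
        downloaded.append(local_path)
    return downloaded
```

For example, download_matching(sftp, "/pub/example", "/dbfs/FileStore/tables/sftp_in") would copy the matching files into a DBFS folder (the folder name is made up here), after which Spark can read them from "/FileStore/tables/sftp_in".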

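Make sure you give the local path in the /dbfs/... form shown above, because paramiko writes the file through the driver's local filesystem. Don't use the DBFS URI form, like this:
dbfs:/FileStore/tables/rd.txt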
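
The two path styles refer to the same file: paramiko needs the /dbfs/... mount path, while Spark readers take the dbfs:/... form (or the bare /FileStore/... path used above). A small sketch of converting between them, assuming the standard /dbfs mount is available on the cluster; both function names are hypothetical.

```python
def to_local_path(path: str) -> str:
    """Turn 'dbfs:/a/b' into '/dbfs/a/b' for local file APIs such as paramiko."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path  # already a local-style path


def to_spark_path(path: str) -> str:
    """Turn '/dbfs/a/b' into 'dbfs:/a/b' for Spark readers."""
    prefix = "/dbfs/"
    if path.startswith(prefix):
        return "dbfs:/" + path[len(prefix):]
    return path  # already a Spark-style path
```

With these, one path can be kept in a single variable, for example sftp.get(remote_path, to_local_path(target)) followed by spark.read.text(to_spark_path(target)).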

Then close the connection.

sftp.close()
client.close()
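
If sftp.get raises an exception, the two close() calls above are never reached and the connection stays open. One possible way to guard against that is sketched below, assuming client is an already connected SSHClient; fetch_file is a hypothetical helper name.

```python
from contextlib import closing


def fetch_file(client, remote_path, local_path):
    """Download one file, then close the SFTP session and the SSH client.

    client is assumed to be an already connected paramiko SSHClient
    (any object with open_sftp() and close() behaves the same way).
    Both close() calls run even if the download fails.
    """
    try:
        with closing(client.open_sftp()) as sftp:
            sftp.get(remote_path, local_path)
    finally:
        client.close()
```

It would be called as fetch_file(client, remote_path, local_path) in place of the separate get and close calls.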

Posted by huangapple on May 25, 2023 at 17:46:32
Original link: https://go.coder-hub.com/76330945.html