update table from Pyspark using JDBC

Question

I have a small log dataframe which holds metadata about the ETL performed within a given notebook; the notebook is part of a larger ETL pipeline managed in Azure Data Factory.

Unfortunately, it seems that Databricks cannot invoke stored procedures, so I'm manually appending a row with the correct data to my log table. However, I cannot figure out the correct syntax to update a table given a set of conditions:
The statement I use to append a single row is as follows:
spark_log.write.jdbc(sql_url, 'internal.Job', mode='append')
This works swimmingly. However, as my Data Factory is invoking a stored procedure, I need to work with a query like:
query = f"""
UPDATE [internal].[Job] SET
[MaxIngestionDate] date {date}
, [DataLakeMetadataRaw] varchar(MAX) NULL
, [DataLakeMetadataCurated] varchar(MAX) NULL
WHERE [IsRunning] = 1
AND [FinishDateTime] IS NULL"""
Is this possible? If so, can someone show me how? Looking at the documentation, it only seems to mention using select statements with the query parameter:

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

The target database is an Azure SQL Database. Just to add, this is a tiny operation, so performance is a non-issue.
Answer 1

Score: 3
You can't do single-record updates using JDBC in Spark with dataframes; you can only append or replace the entire table.

You can do updates using pyodbc, which requires installing the MSSQL ODBC driver (https://stackoverflow.com/questions/54132249/how-to-install-pyodbc-in-databricks), or you can use JDBC via JayDeBeApi (https://pypi.org/project/JayDeBeApi/).
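As a minimal sketch of the pyodbc route: the conditional UPDATE from the question can be run with a parameterized statement over an ODBC connection. The server, database, and credential values below are placeholders, and the sketch assumes the MSSQL ODBC driver and pyodbc are installed on the cluster.

```python
# Parameterized UPDATE matching the question's conditions.
# Note: an UPDATE assigns values with "=" and carries no column types --
# the types in the question's attempt belong in DDL, not in SET.
UPDATE_SQL = """
UPDATE [internal].[Job]
SET [MaxIngestionDate] = ?,
    [DataLakeMetadataRaw] = NULL,
    [DataLakeMetadataCurated] = NULL
WHERE [IsRunning] = 1
  AND [FinishDateTime] IS NULL
"""

def finish_running_jobs(conn, max_ingestion_date):
    """Execute the UPDATE on an open pyodbc connection and commit.

    Returns the number of rows affected (cursor.rowcount).
    """
    cursor = conn.cursor()
    cursor.execute(UPDATE_SQL, max_ingestion_date)
    conn.commit()
    return cursor.rowcount

def example_usage():
    # Deferred import so the sketch parses even where pyodbc is absent.
    import pyodbc

    # Connection string values are placeholders for your Azure SQL Database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"  # placeholder
        "DATABASE=mydb;UID=myuser;PWD=mypassword"  # placeholders
    )
    try:
        return finish_running_jobs(conn, "2020-01-01")
    finally:
        conn.close()
```

Passing the date as a `?` parameter (rather than interpolating it with an f-string, as in the question) avoids quoting issues and SQL injection.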
Comments