Update table from PySpark using JDBC


Question

I have a small log dataframe which holds metadata about the ETL performed within a given notebook; the notebook is part of a bigger ETL pipeline managed in Azure Data Factory.

Unfortunately, it seems that Databricks cannot invoke stored procedures, so I'm manually appending a row with the correct data to my log table.

However, I cannot figure out the correct syntax to update a table given a set of conditions.

The statement I use to append a single row is as follows:

spark_log.write.jdbc(sql_url, 'internal.Job', mode='append')

This works swimmingly; however, as my Data Factory is invoking a stored procedure, I need to work in a query like:

query  = f"""
UPDATE [internal].[Job] SET
     [MaxIngestionDate]                date                   {date}
,    [DataLakeMetadataRaw]            varchar(MAX)            NULL
,    [DataLakeMetadataCurated]        varchar(MAX)            NULL
WHERE [IsRunning] = 1
AND [FinishDateTime] IS NULL"""

Is this possible? If so, can someone show me how?

Looking at the documentation, it only seems to mention using SELECT statements with the query parameter:

The target database is an Azure SQL Database.

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

Just to add: this is a tiny operation, so performance is a non-issue.

Answer 1

Score: 3


You can't do single-record updates using JDBC in Spark with DataFrames. You can only append or replace the entire table.

You can do updates using pyodbc (this requires installing the MSSQL ODBC driver: https://stackoverflow.com/questions/54132249/how-to-install-pyodbc-in-databricks), or you can use JDBC via JayDeBeApi (https://pypi.org/project/JayDeBeApi/).
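
For completeness, here is a minimal sketch of the pyodbc route, assuming pyodbc and the Microsoft ODBC Driver 17 for SQL Server are installed on the cluster; the server, database, and credential values are placeholders:

import pyodbc

# Placeholder connection details for the Azure SQL Database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword"
)

# Parameterised UPDATE; `date` is the value the question interpolates via the f-string.
update_query = """
UPDATE [internal].[Job] SET
     [MaxIngestionDate]        = ?
,    [DataLakeMetadataRaw]     = NULL
,    [DataLakeMetadataCurated] = NULL
WHERE [IsRunning] = 1
AND [FinishDateTime] IS NULL"""

cursor = conn.cursor()
cursor.execute(update_query, date)
conn.commit()
cursor.close()
conn.close()

Unlike Spark's JDBC writer, this executes the statement directly against the database, which is fine here since the operation is tiny.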
