Pass parameters to a query with spark.read.format("jdbc")

Question


I am trying to execute the following sample query through spark.read.format("jdbc") in Redshift:

query = "select * from tableA a join tableB b on a.id = b.id where a.date > ? and b.date > ?"

I am trying to execute this query as follows:

> I read query_parameters from a table, which gives me a dataframe; I turn it into a list with df.collect() and then convert that to a tuple to pass as parameters.

query_parameters = ("2022-01-01", "2023-01-01")

df = spark.read.format("jdbc") \
  .option("url", url) \
  .option("query", query) \
  .option("user", "user") \
  .option("password", password) \
  .option("queryParameters", ",".join(query_parameters)) \
  .option("driver", driver) \
  .load()

I am getting the following error:

com.amazon.redshift.util.RedshiftException: ERROR: invalid input syntax for type date: "?"

How do we pass parameters to the query when executing it through spark.read in PySpark?

Answer 1

Score: 1


According to the official spark-redshift implementation, there seems to be no option named queryParameters. Not sure where you found it, but I wasn't able to find it in the official GitHub code.

The only way to pass parameters to your query is through Python string concatenation or interpolation and setting the query option of the driver, as shown in the corresponding test suite RedshiftReadSuite.scala.
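Because values are inlined into the SQL string rather than bound as parameters, it is worth validating them before interpolation to guard against SQL injection. A minimal sketch (`to_sql_date` is a hypothetical helper, not part of Spark or the Redshift driver):

```python
from datetime import date

def to_sql_date(value: str) -> str:
    """Validate an ISO date string and return it as a quoted SQL literal."""
    # date.fromisoformat raises ValueError for anything that is not a valid
    # ISO date, which rejects malicious input before it reaches the query.
    return f"'{date.fromisoformat(value).isoformat()}'"

query_parameters = ("2022-01-01", "2023-01-01")
query = (
    "select * from tableA a join tableB b on a.id = b.id "
    f"where a.date > {to_sql_date(query_parameters[0])} "
    f"and b.date > {to_sql_date(query_parameters[1])}"
)
```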

For the above example this should work:

query_parameters = ("2022-01-01", "2023-01-01")

query = f"select * from tableA a join tableB b on a.id = b.id where a.date > '{query_parameters[0]}' and b.date > '{query_parameters[1]}'"

df = spark.read.format("jdbc") \
  .option("url", url) \
  .option("query", query) \
  .option("user", "user") \
  .option("password", password) \
  .option("driver", driver) \
  .load()
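The question mentions reading the parameters from a table with df.collect(); assuming a one-row result, the collected row can be turned into the tuple used above. The `rows` list below simulates what a collect() call would return (a real collect() yields Row objects, which also support positional indexing):

```python
# Simulated result of something like spark.table("param_table").collect()
# on a hypothetical one-row parameter table.
rows = [("2022-01-01", "2023-01-01")]

# Use the first (and only) row as the parameter tuple.
query_parameters = tuple(rows[0])

query = (
    "select * from tableA a join tableB b on a.id = b.id "
    f"where a.date > '{query_parameters[0]}' and b.date > '{query_parameters[1]}'"
)
```

After interpolation the string contains concrete date literals, so no `?` placeholder ever reaches Redshift.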

huangapple
  • Posted on 2023-07-23 15:57:40
  • Please retain this link when reposting: https://go.coder-hub.com/76747189.html