Pass parameters to a query with spark.read.format("jdbc")
Question
I am trying to execute the following sample query through spark.read.format("jdbc") in Redshift:
query = "select * from tableA join tableB on a.id = b.id where a.date > ? and b.date > ?"
I am trying to execute this query as follows:
> I read query_parameters from a table and get a dataframe; I then turn it into a list with a df.collect() statement and convert that to a tuple so I can pass it as parameters.
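The collect-then-tuple step described in the note above can be sketched as follows (a sketch only: the helper function and the stand-in row data are my assumptions, not the original code):

```python
def rows_to_param_tuple(rows):
    """Take the first collected row and return its values as a tuple of
    strings -- the list-then-tuple conversion described above."""
    first = rows[0]  # collect() returns a list; a Spark Row iterates like a tuple
    return tuple(str(v) for v in first)

# With Spark it would be used roughly as (the table name is an assumption):
#   query_parameters = rows_to_param_tuple(spark.table("query_params").collect())

# Stand-in demonstration without a SparkSession:
query_parameters = rows_to_param_tuple([("2022-01-01", "2023-01-01")])
# query_parameters == ("2022-01-01", "2023-01-01")
```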
query_parameters = ("2022-01-01", "2023-01-01")
df = spark.read.format("jdbc") \
    .option("url", url) \
    .option("query", query) \
    .option("user", "user") \
    .option("password", password) \
    .option("queryParameters", ",".join(query_parameters)) \
    .option("driver", driver) \
    .load()
I am getting the following error:
com.amazon.redshift.util.RedshiftException: ERROR: invalid input syntax for type date: "?"
How do we pass parameters to the query when we are executing it through spark.read in PySpark?
Answer 1
Score: 1
According to the official spark-redshift implementation, it seems that there is no option named queryParameters. Not sure where you found it, but I wasn't able to find it in the official GitHub code.
The only way to pass parameters to your query is through Python string concatenation or interpolation, setting the resulting string as the query option of the driver, as shown in the corresponding test suite RedshiftReadSuite.scala.
For the above example this should work:
query_parameters = ("2022-01-01", "2023-01-01")
query = f"select * from tableA join tableB on a.id = b.id where a.date > '{query_parameters[0]}' and b.date > '{query_parameters[1]}'"
df = spark.read.format("jdbc") \
    .option("url", url) \
    .option("query", query) \
    .option("user", "user") \
    .option("password", password) \
    .option("driver", driver) \
    .load()
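A side note on the interpolation approach above: because the values are spliced directly into the SQL string, it may be worth validating them before formatting. A minimal sketch, assuming the parameters are plain ISO dates (the helper name and the explicit a/b table aliases are mine, not part of the answer):

```python
from datetime import date

def to_sql_date(value: str) -> str:
    """Validate an ISO-format date string and return it as a quoted SQL
    literal; raises ValueError for anything that is not YYYY-MM-DD, which
    guards the f-string interpolation against malformed input."""
    return f"'{date.fromisoformat(value).isoformat()}'"

query_parameters = ("2022-01-01", "2023-01-01")
query = (
    "select * from tableA a join tableB b on a.id = b.id "
    f"where a.date > {to_sql_date(query_parameters[0])} "
    f"and b.date > {to_sql_date(query_parameters[1])}"
)
# query now contains the literal dates instead of ? placeholders
```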