How to drop a table in Teradata using the PySpark JDBC connector?


Question


I can select from the Teradata database, but I cannot drop a table using pyspark.

I have also used jaydebeapi to drop the table in the same Spark session, and that works (a sketch of that approach is included after the error below).
I was hoping someone may have encountered the same issue.

drop_sql = """ (DROP TABLE <DB_NAME>.<TABLENAME>) """

conn = spark.read \
    .format("jdbc") \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("url", "jdbc:teradata://<IP_ADDRESS>/DATABASE=<DB_NAME>,TMODE=ANSI,CHARSET=UTF8,TYPE=FASTLOAD,LOGMECH=LDAP") \
    .option("query", drop_sql) \
    .option("user", user) \
    .option("password", password) \
    .option("fetchsize", 10000) \
    .load()

ERROR:

Py4JJavaError: An error occurred while calling o265.load.
: java.sql.SQLException: [Teradata Database] [TeraJDBC 17.20.00.15] [Error 3707] [SQLState 42000] Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword or a 'SELECT' keyword or '(' between '(' and the 'DROP' keyword.
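
For reference, the jaydebeapi route that does work looks roughly like this (a minimal sketch; the jar path and connection options are assumptions, adjust to your environment):

import jaydebeapi

# Sketch of the working jaydebeapi approach; the jar path, URL options,
# and credentials are placeholders/assumptions.
conn = jaydebeapi.connect(
    "com.teradata.jdbc.TeraDriver",
    "jdbc:teradata://<IP_ADDRESS>/DATABASE=<DB_NAME>,TMODE=ANSI,CHARSET=UTF8,LOGMECH=LDAP",
    [user, password],
    "terajdbc4.jar",
)
curs = conn.cursor()
curs.execute("DROP TABLE <DB_NAME>.<TABLENAME>")  # plain DDL, no SELECT wrapper
curs.close()
conn.close()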

Answer 1

Score: 1


spark.read provides a higher-level language. It is not a Teradata driver for Python.

  • Queries you pass to spark.read.format('jdbc').option('query', '...') can only contain SELECT statements.
  • Whatever you provide there is in turn wrapped in an outer SELECT by Spark code before it's sent to the underlying driver for execution. E.g.

spark.read.format("jdbc") \
    .option("url", jdbcUrl) \
    .option("query", "SELECT c1, c2 FROM t1") \
    .option("partitionColumn", "partition_id") \
    .option("lowerBound", "1") \
    .option("upperBound", "300") \
    .option("numPartitions", "3") \
    .load()

Would translate to 3 queries like these being executed in parallel on the underlying DB. Please note the real ones would be slightly different; this is curated for academic purposes:

SELECT t.* FROM (SELECT c1, c2 FROM t1 WHERE partition_id BETWEEN 1 AND 100) t
SELECT t.* FROM (SELECT c1, c2 FROM t1 WHERE partition_id BETWEEN 100 AND 200) t
SELECT t.* FROM (SELECT c1, c2 FROM t1 WHERE partition_id BETWEEN 200 AND 300) t

So in your case Teradata is unhappy because Spark is executing something along the lines of:

SELECT t.* FROM (DROP TABLE <DB_NAME>.<TABLENAME>) t


What you have is not a "pyspark jdbc connector to teradata". It's the "Teradata JDBC driver".
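
Since that driver is already on your cluster's classpath (spark.read uses it), one workaround is to call it directly through Spark's JVM gateway and run the DDL over a plain JDBC connection. A minimal sketch, assuming jdbc_url, user, and password hold the values from your question (you may want to drop TYPE=FASTLOAD from the URL for DDL):

# Sketch: run DDL through the Teradata JDBC driver via Spark's py4j gateway,
# bypassing spark.read (which wraps everything in a SELECT).
driver_manager = spark._jvm.java.sql.DriverManager
jconn = driver_manager.getConnection(jdbc_url, user, password)
stmt = jconn.createStatement()
stmt.executeUpdate("DROP TABLE <DB_NAME>.<TABLENAME>")  # executeUpdate returns 0 for DDL
stmt.close()
jconn.close()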

To run Teradata-specific SQL on Teradata, you need to write Python code that uses a Teradata-specific driver. Here is an example using the teradatasql package:

import teradatasql

# The teradatasql package (pip install teradatasql) talks to Teradata natively,
# so no JDBC jar is involved here.
with teradatasql.connect(host="whomooz", user="guest", password="please") as con:
    with con.cursor() as cur:
        try:
            sRequest = "DROP TABLE <DB_NAME>.<TABLENAME>"
            print(sRequest)
            cur.execute(sRequest)
        except Exception as ex:
            # Print only the first line of Teradata's multi-line error message.
            print("Ignoring", str(ex).split("\n")[0])

If you want to run this code on a Databricks/Spark cluster, then you'll have to add the JDBC driver library to the cluster in question, e.g. as a cluster library, and then run the code above on that cluster.

I assume you've already done this given the error you're getting.
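
Outside Databricks, a minimal sketch of attaching the Teradata JDBC jar when building the session yourself (the jar path is a placeholder):

from pyspark.sql import SparkSession

# Sketch: put the Teradata JDBC driver on Spark's classpath at session creation.
# The jar path is an assumption; on Databricks, attach it as a cluster library instead.
spark = (
    SparkSession.builder
    .appName("teradata-jdbc")
    .config("spark.jars", "/path/to/terajdbc4.jar")
    .getOrCreate()
)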
