PySpark JDBC returns column names instead of data in every row

Question

I'm querying a table in a Spark (Hive) database using Spark 3.2.1 and Python 3.7 with the code below.

These tables are fully accessible and can be manipulated from other systems such as DBeaver, Power BI, and SSRS. Even a similar script in R returns the data correctly. But when I run this Python script, every row returned over JDBC contains only the column names instead of the data.

This is the code:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # HiveServer2 endpoint reached over HTTP transport
    url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
    table = 'schema.table_name'
    username = 'username'
    password = '123456'

    # Read the remote table through Spark's generic JDBC data source
    remote_table = (
        spark.read
        .format("jdbc")
        .option("driver", "org.apache.hive.jdbc.HiveDriver")
        .option("url", url)
        .option("dbtable", table)
        .option("user", username)
        .option("password", password)
        .load()
        .limit(2)
    )

    remote_table.show()
    spark.stop()

What I need is for PySpark to return the actual data from my Hive tables.

Answer 1

Score: 0

This code resolved the problem:

    # -*- coding: utf-8 -*-
    """
    Created on Fri Jun 2 07:32:51 2023
    @author: yfdantas
    """
    import jaydebeapi


    def spark_connect():
        """Open a JDBC connection to the Spark (Hive) server via JayDeBeApi."""
        jdbc_url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
        jdbc_driver_class = "com.cloudera.hive.jdbc.HS2Driver"
        jdbc_user = 'username'
        jdbc_password = '123456'
        # Local path to the Cloudera Hive JDBC driver JAR
        jdbc_jar = "E:/scripts/libs/HiveJDBC42.jar"
        conn = jaydebeapi.connect(
            jclassname=jdbc_driver_class,
            url=jdbc_url,
            driver_args=[jdbc_user, jdbc_password],
            jars=jdbc_jar
        )
        return conn


    def oracle_close(conn):
        """Close the JDBC connection."""
        conn.close()
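
The answer stops at opening and closing the connection. As a rough sketch of how rows might then be read, the helper below relies only on the standard DB-API 2.0 cursor interface, which JayDeBeApi connections implement. The name `fetch_as_dicts` and its `max_rows` parameter are hypothetical and not part of the original answer.

```python
def fetch_as_dicts(conn, query, max_rows=None):
    """Run `query` on a DB-API 2.0 connection and return the rows as dicts.

    Hypothetical helper: `conn` is assumed to be any DB-API 2.0 connection,
    such as the one returned by spark_connect() in the answer.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one tuple per column; item 0 is the name
        columns = [col[0] for col in cursor.description]
        if max_rows is None:
            rows = cursor.fetchall()
        else:
            rows = cursor.fetchmany(max_rows)
        # Pair each value with its column name
        return [dict(zip(columns, row)) for row in rows]
    finally:
        cursor.close()


# Possible usage with the functions from the answer (needs a reachable server):
#   conn = spark_connect()
#   for row in fetch_as_dicts(conn, "SELECT * FROM schema.table_name", max_rows=2):
#       print(row)
#   oracle_close(conn)
```

Because the helper touches only `cursor()`, `execute()`, `description`, `fetchall()`, and `fetchmany()`, it can be tried against any DB-API driver before pointing it at the Hive endpoint.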

huangapple
  • Posted on May 11, 2023, 19:02:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76226916.html