PySpark JDBC returns all rows with column names
Question
I'm querying a Spark (Hive) database table using Spark 3.2.1 in Python 3.7 with the code below.
These tables are fully accessible and manipulable from other systems such as DBeaver, PowerBI, and SSRS. Even a similar script in R returns the data correctly. But when I try to use this Python script, every row returned over JDBC contains only the column names instead of the data.
This is the code:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
table = 'schema.table_name'
username = 'username'
password = '123456'

remote_table = spark.read \
    .format("jdbc") \
    .option("driver", "org.apache.hive.jdbc.HiveDriver") \
    .option("url", url) \
    .option("dbtable", table) \
    .option("user", username) \
    .option("password", password) \
    .load() \
    .limit(2)

remote_table.show()
spark.stop()
How can I get PySpark to return the actual data from my Hive tables?
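A likely cause, not stated in the post: Spark's JDBC reader quotes column names with double quotes, but HiveQL treats double-quoted text as string literals, so each fetched row ends up containing the literal column names. A minimal sketch of the difference (`spark_jdbc_select` is a hypothetical helper for illustration, not a Spark API):

```python
def spark_jdbc_select(columns, table, quote='"'):
    # Build the kind of SELECT a JDBC client issues after quoting identifiers.
    cols = ", ".join(f"{quote}{c}{quote}" for c in columns)
    return f"SELECT {cols} FROM {table}"

# Double quotes: HiveQL reads "id" as the string literal 'id', so the
# query returns the column names themselves on every row.
print(spark_jdbc_select(["id", "name"], "schema.table_name"))
# SELECT "id", "name" FROM schema.table_name

# Backticks are HiveQL's identifier quoting, which returns real data.
print(spark_jdbc_select(["id", "name"], "schema.table_name", quote="`"))
# SELECT `id`, `name` FROM schema.table_name
```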
Answer 1
Score: 0
This code resolves the problem:
# -*- coding: utf-8 -*-
"""
Created on Fri Jun 2 07:32:51 2023
@author: yfdantas
"""
import jaydebeapi


def spark_connect():
    jdbc_url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
    jdbc_driver_class = "com.cloudera.hive.jdbc.HS2Driver"
    jdbc_user = 'username'
    jdbc_password = '123456'
    jdbc_jar = "E:/scripts/libs/HiveJDBC42.jar"
    conn = jaydebeapi.connect(
        jclassname=jdbc_driver_class,
        url=jdbc_url,
        driver_args=[jdbc_user, jdbc_password],
        jars=jdbc_jar
    )
    return conn


def oracle_close(conn):
    conn.close()
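For completeness, a sketch of how the returned connection might be used. jaydebeapi connections follow Python's DB-API 2.0 cursor protocol, so the standard library's sqlite3 stands in for `spark_connect()` here to keep the example runnable; `fetch_rows` is a hypothetical helper, not part of the answer above:

```python
import sqlite3

def fetch_rows(conn, sql):
    # Works with any DB-API 2.0 connection, including jaydebeapi's:
    # execute, read column names from cursor.description, fetch rows.
    cur = conn.cursor()
    try:
        cur.execute(sql)
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]
    finally:
        cur.close()

# sqlite3 substitutes for spark_connect() so the sketch runs without Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
rows = fetch_rows(conn, "SELECT id, name FROM t LIMIT 2")
print(rows)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
conn.close()
```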
Comments