PySpark JDBC read returns the column names in every row instead of the data

Question

I'm querying a Spark (Hive) database table using Spark 3.2.1 on Python 3.7 with the code below.

These tables are fully accessible and can be manipulated from other systems such as DBeaver, Power BI and SSRS. Even a similar script in R returns the data correctly. But when I run this Python script, every row returned over JDBC contains only the column names instead of the data.

This is the code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
table = 'schema.table_name'
username = 'username'
password = '123456'

remote_table = spark.read\
                    .format("jdbc")\
                    .option("driver", "org.apache.hive.jdbc.HiveDriver")\
                    .option("url", url)\
                    .option("dbtable", table)\
                    .option("user", username)\
                    .option("password", password)\
                    .load()\
                    .limit(2)

remote_table.show()
spark.stop()

Expected result: PySpark returns the actual data from my Hive tables.
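A frequently reported cause of this exact symptom, offered here as an assumption rather than something confirmed in this thread: Spark ships no built-in JDBC dialect for Hive, so for `jdbc:hive2` URLs it falls back to the generic dialect, which wraps column names in double quotes. HiveQL treats double-quoted text as a string literal, so a generated query like `SELECT "id","name" FROM schema.table_name` returns the literal strings `id` and `name` on every row. The sketch below only mimics the SQL text; `build_select`, `quote_default`, `quote_backtick` and the column names are hypothetical, not Spark APIs.

```python
# Hypothetical sketch: mimics the SELECT text a JDBC source might build.
# None of these functions are Spark APIs.


def quote_default(name: str) -> str:
    """Double quotes, as in Spark's generic JdbcDialect.quoteIdentifier."""
    return f'"{name}"'


def quote_backtick(name: str) -> str:
    """Backticks, the quoted-identifier syntax HiveQL expects."""
    return f"`{name}`"


def build_select(columns, table, quote):
    """Join quoted column names into a SELECT statement (no spaces after commas)."""
    column_list = ",".join(quote(c) for c in columns)
    return f"SELECT {column_list} FROM {table}"


columns = ["id", "name"]  # hypothetical column names
print(build_select(columns, "schema.table_name", quote_default))
# SELECT "id","name" FROM schema.table_name  <- Hive reads these as string literals
print(build_select(columns, "schema.table_name", quote_backtick))
# SELECT `id`,`name` FROM schema.table_name  <- Hive reads these as column references
```

A fix often suggested for the Spark route is registering a custom `JdbcDialect` (written in Scala or Java and registered with `JdbcDialects.registerDialect`) that handles `jdbc:hive2` URLs and overrides `quoteIdentifier` to use backticks; this has not been verified against the setup in this question.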

Answer 1

Score: 0

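I solved the problem by skipping Spark's JDBC data source and connecting to Hive directly from Python with jaydebeapi and the Cloudera Hive JDBC driver: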

# -*- coding: utf-8 -*-
"""
Created on Fri Jun  2 07:32:51 2023

@author: yfdantas
"""


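import jaydebeapi


def spark_connect():
    """Open a JDBC connection to HiveServer2 using the Cloudera Hive driver."""
    jdbc_url = 'jdbc:hive2://1.1.1.1:10000/default;transportMode=http;httpPath=cliservice'
    jdbc_driver_class = "com.cloudera.hive.jdbc.HS2Driver"  # driver class inside HiveJDBC42.jar
    jdbc_user = 'username'
    jdbc_password = '123456'
    jdbc_jar = "E:/scripts/libs/HiveJDBC42.jar"  # local path to the Cloudera JDBC jar

    # jaydebeapi starts a JVM (through JPype) and loads the driver from the jar
    conn = jaydebeapi.connect(
        jclassname=jdbc_driver_class,
        url=jdbc_url,
        driver_args=[jdbc_user, jdbc_password],
        jars=jdbc_jar
    )

    return conn


def hive_close(conn):
    """Close the JDBC connection opened by spark_connect()."""
    conn.close()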
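`spark_connect()` only returns a connection. jaydebeapi connections follow the Python DB-API 2.0 interface, so rows are read through a cursor. The helper below is a hypothetical sketch (`fetch_dicts` is not part of jaydebeapi); it relies only on `cursor()`, `execute()`, `description` and `fetchall()`, so it should work with any DB-API 2.0 connection. The demo uses the standard-library `sqlite3` as a stand-in because it needs no JVM or Hive server; the table and rows are made up.

```python
import sqlite3
from contextlib import closing


def fetch_dicts(conn, query, params=()):
    """Run a query on a DB-API 2.0 connection and return rows as dicts.

    Hypothetical helper: uses only the standard cursor interface, so it is
    meant to work with jaydebeapi connections as well as sqlite3 ones.
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(query, params)
        # description has one 7-item sequence per column; item 0 is the name
        names = [col[0] for col in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]


# Stand-in database so the sketch runs without Hive or a JVM.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c")])
    rows = fetch_dicts(conn, "SELECT id, name FROM table_name ORDER BY id LIMIT 2")
    print(rows)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

Against Hive the calls would be `conn = spark_connect()`, then `fetch_dicts(conn, "SELECT * FROM schema.table_name LIMIT 2")`, then `conn.close()`. Because the query text is written by hand, no identifier quoting is added behind the scenes.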

Posted on huangapple on May 11, 2023 at 19:02:49. Source: https://go.coder-hub.com/76226916.html