英文:
Issue in Pyspark code when running Glue Script
问题
在填充 dnb_df
时出现的错误是因为查询字符串 var
包含了单引号('
或 "
)作为字符串定界符,而这可能会导致 Oracle 数据库解析错误。在这种情况下,你可以尝试使用双引号("
) 来定界查询字符串,因为 Oracle 通常使用双引号来定界对象标识符(表名、列名等)。下面是已修复的查询字符串:
var = "select DNB_RESULTS_DTL_ID, REQUEST_ID, source_id, dnb_level from (select max(request_id) over(partition by org_code) max_request_id, a.* from XXGMDMADM.MDM_DNB_RESULTS_DTL a where dnb_level = 'LVL1' and request_id in (131)) where MAX_REQUEST_ID = request_id"
使用双引号定界查询字符串应该能够解决这个问题。如果你使用单引号,Oracle 可能会将其中的内容解释为字符串,而不是标识符,从而导致语法错误。
英文:
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import regexp_replace, col
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
#sc.setLogLevel('DEBUG')
glueContext = GlueContext(sc)
spark = glueContext.spark_session
#logger = glueContext.get_logger()
#logger.DEBUG('Hello Glue')
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# ####connect to database
client = boto3.client('glue', region_name='XXXXXX')
response = client.get_connection(Name='XXXXXX')
connection_properties = response['Connection']['ConnectionProperties']
URL = connection_properties['JDBC_CONNECTION_URL']
url_list = URL.split("/")
host = "{}".format(url_list[-2][:-5])
new_host=host.split('@',1)[1]
port = url_list[-2][-4:]
database = "{}".format(url_list[-1])
Oracle_Username = "{}".format(connection_properties['USERNAME'])
Oracle_Password = "{}".format(connection_properties['PASSWORD'])
#print("Oracle_Username:",Oracle_Username)
#print("Oracle_Password:",Oracle_Password)
print("Host:",host)
print("New Host:",new_host)
print("Port:",port)
print("Database:",database)
Oracle_jdbc_url="jdbc:oracle:thin:@//"+new_host+":"+port+"/"+database
print("Oracle_jdbc_url:",Oracle_jdbc_url)
source_df = spark.read.format("jdbc").option("url", Oracle_jdbc_url).option("dbtable", "xxgmdmadm.vendor_data").option("user", Oracle_Username).option("password", Oracle_Password).load()
#store the value in array
qrys = source_df.select("SRC_QUERY").collect()
var='select DNB_RESULTS_DTL_ID,REQUEST_ID,source_id,dnb_level from (select max(request_id) over(partition by org_code ) max_request_id,a.* from XXGMDMADM.MDM_DNB_RESULTS_DTL a where dnb_level =''LVL1'' and request_id in (131) ) where MAX_REQUEST_ID=request_id'
dnb_df = spark.read.format("jdbc").option("url", Oracle_jdbc_url).option("query", var).option("user", Oracle_Username).option("password", Oracle_Password).load()
error I am getting while populating dnb_df
> An error occurred while calling o111.load. ORA-00911: invalid
> character
Not getting what is wrong with this part
> var='select DNB_RESULTS_DTL_ID,REQUEST_ID,source_id,dnb_level from
> (select max(request_id) over(partition by org_code )
> max_request_id,a.* from XXGMDMADM.MDM_DNB_RESULTS_DTL a where
> dnb_level =''LVL1'' and request_id in (131) ) where
> MAX_REQUEST_ID=request_id'
Even when I am running simple query like getting same error
> var="select DNB_RESULTS_DTL_ID,REQUEST_ID,source_id,dnb_level from
> XXGMDMADM.MDM_DNB_RESULTS_DTL"
Does query option doesn't work for Oracle ?
答案1
得分: 1
你可以在提出查询时使用别名,效果很好:
.option("dbtable", "(SELECT * FROM schema.tablename) tbl")
英文:
You can use alias while giving a query, works well:
.option("dbtable", "(SELECT * FROM schema.tablename) tbl")
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论