“Glue自定义可视脚本无限运行”

huangapple go评论61阅读模式
英文:

Glue Custom Visual Script Running indefinitely

问题

I am trying to create a Custom Visual Transform but unfortunately facing some issues.

Here my motive is to Truncate a MySQL Table before loading the data into it and I want to do it with the help of Visual Transforms not by changing the auto-generated script.

My job is running continuously with the same log:

23/05/14 04:25:00 INFO MultipartUploadOutputStream: close closed:false s3://aws-glue-assets-849950158560-ap-south-1/sparkHistoryLogs/spark-application-1684037765713.inprogress

However, removing all the code except this code is working:

from awsglue import DynamicFrame

def truncate_mysql_table(self, database_name, table_name, connection_name):
    return self.filter(lambda row: row['age'] == '21')

DynamicFrame.truncate_mysql_table = truncate_mysql_table

This is the code I am using:

import pymysql
import boto3
import json

from awsglue import DynamicFrame

def truncate_mysql_table(self, database_name, table_name, connection_name):
    client = boto3.client('glue')
    response = client.get_connection(Name=connection_name, HidePassword=False)
    connection_props = response.get("Connection").get("ConnectionProperties")
    host_name = connection_props.get("JDBC_CONNECTION_URL").rsplit(":", 1)[0].split("//")[1]
    port = int(connection_props.get("JDBC_CONNECTION_URL").rsplit(":", 1)[1].split("/", 1)[0])
    secret_id = connection_props.get("SECRET_ID")

    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_id)

    secret_data = json.loads(response.get("SecretString"))
    username = secret_data.get("username")
    password = secret_data.get("password")

    con = pymysql.connect(host=host_name,
                          user=username,
                          passwd=password,
                          db=database_name,
                          port=port,
                          connect_timeout=60)

    with con.cursor() as cur:
        cur.execute(f"TRUNCATE TABLE {database_name.strip()}.{table_name.strip()}")
        con.commit()
        con.close()

    # print("Table Truncated")
    return self

DynamicFrame.truncate_mysql_table = truncate_mysql_table

My Glue Connection and MySQL RDS are in the same VPC also I am having VPC endpoints for S3 and secret manager. This shouldn't be a problem because after changing (or simplifying) the code it is giving the expected output.

英文:

I am trying to create a Custom Visual Transform but unfortunately facing some issues.

Here my motive is to Truncate a MySQL Table before loading the data into it and I want to do it with the help of Visual Tranforms not by changing the auto generated script.

My job is running continueously with the same log:

23/05/14 04:25:00 INFO MultipartUploadOutputStream: close closed:false s3://aws-glue-assets-849950158560-ap-south-1/sparkHistoryLogs/spark-application-1684037765713.inprogress

However removing all the code except this code is working:

from awsglue import DynamicFrame

def truncate_mysql_table(self, database_name, table_name, connection_name):
    return self.filter(lambda row: row['age'] == '21')

DynamicFrame.truncate_mysql_table = truncate_mysql_table

This the code I am using:

import pymysql
import boto3
import json

from awsglue import DynamicFrame

def truncate_mysql_table(self, database_name, table_name, connection_name):
    client = boto3.client('glue')
    response = client.get_connection(Name=connection_name, HidePassword=False)
    connection_props = response.get("Connection").get("ConnectionProperties")
    host_name = connection_props.get("JDBC_CONNECTION_URL").rsplit(":", 1)[0].split("//")[1]
    port = int(connection_props.get("JDBC_CONNECTION_URL").rsplit(":", 1)[1].split("/", 1)[0])
    secret_id = connection_props.get("SECRET_ID")

    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_id)

    secret_data = json.loads(response.get("SecretString"))
    username = secret_data.get("username")
    password = secret_data.get("password")

    con = pymysql.connect(host=host_name,
                          user=username,
                          passwd=password,
                          db=database_name,
                          port=port,
                          connect_timeout=60)

    with con.cursor() as cur:
        cur.execute(f"TRUNCATE TABLE {database_name.strip()}.{table_name.strip()}")
        con.commit()
        con.close()

    # print("Table Truncated")
    return self

DynamicFrame.truncate_mysql_table = truncate_mysql_table

My Glue Connection and MySQL RDS is in the same VPC also I am having VPC endpoints for s3 and secret manager.
This shouldn't be a problem becuase after changing (or simplifying) the code it is giving the expected output.

答案1

得分: 0

我在进行这个更改后,使得这个脚本能够正常运行:

在我的情况下,Lambda 函数RDS 实例 都存在于 VPC 的私有子网中,而我的 VPC 中没有任何 Glue 服务端点。

我的 Lambda 函数 正试图使用 boto3 库向 Glue 发送请求以获取连接信息,但它无法通信。

我为我的 VPC 添加了一个 Glue 服务端点,然后,哇哦!它就像魔法般地运行了。

最终,我可以在将数据加载到表中之前截断 MySQL 表

英文:

I got this script working after this change:

In my case both Lambda Function and RDS Instance is present in the private subnet of a VPC and my VPC don't have any Glue service endpoint.

My Lambda Function is trying to make a request using boto3 library to Glue to get the connection information, but it wasn't able to communicate.

I added a Glue service endpoint to my VPC and voila!!! it works like a charm.

Finally I can Truncate a MySQL Table before loading data into it.

huangapple
  • 本文由 发表于 2023年5月14日 12:35:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76245824.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定