英文:
Refactoring PySpark to Snowflake Snowpark code
问题
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
def checkHiveTableExists(tableName, stageName="base"):
try:
session.execute(f"DESCRIBE TABLE {stageName}.{tableName}")
return True
except Exception as e:
print(f"An error was thrown, <<{e}>>")
return False
英文:
Most of code is written in PySpark to be executed on Databricks.
I am evaluating SnowFlake with it's ability to execute Python with Snowpark.
Can someone let me know how I might go about refactoring the following PySpark function to Snowpark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
def checkHiveTableExists(tableName,stageName="base"):
try:
spark.sql(f"describe {stageName}{tableName}")
return True
except Exception as e:
print(f"An error was thrown, <<{e}>>")
return False
My attempt is a follows:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
from snowflake.snowpark import session
def checkHiveTableExists(tableName,stageName="base"):
try:
spark.sql(f"describe {stageName}{tableName}")
return True
except Exception as e:
print(f"An error was thrown, <<{e}>>")
return False
def main(session: snowpark.Session):
return checkHiveTableExists(False)
But it failed.
Any thoughts?
答案1
得分: 1
Here's the translated code:
import snowflake.snowpark as snowpark
def checkTableExists(session: snowpark.Session, tableName, schemaName="BASE"):
try:
session.sql(f"DESC TABLE IDENTIFIER('{schemaName}.{tableName}')").collect()
return True
except Exception as e:
#print(f"An error was thrown, <<{e}>>")
return False
def main(session: snowpark.Session):
return checkTableExists(session, 'TEST')
Please note that I've removed the HTML entities like "
and '
for better readability in the translated code.
英文:
Original code:
def checkHiveTableExists(tableName,stageName="base"):
try:
spark.sql(f"describe {stageName}{tableName}")
return True
except Exception as e:
print(f"An error was thrown, <<{e}>>")
return False
Copy-paste approach into Snowflake will not work.
- There is no
spark
module - the direct SQL is also different
DESC ...
=>DESC TABLE ...
The original code itself is somehow tricky, because for object existence is uses query to describe the object
Second: describe {stageName}{tableName}
string interpolation makes it prone to SQL Injection - somebody could try to call it as checkHivetableExists("someName'; DROP TABLE ...")
When translating code the focus should be on behavior.
Personally I would perform a metadata check INFORMATION_SCHEMA.TABLES and see if row exists - user calling this code needs to have permission to access this table.
Anyway, trying to be as as close as possible to original code:
import snowflake.snowpark as snowpark
def checkTableExists(session: snowpark.Session, tableName, schemaName="BASE"):
try:
session.sql(f"DESC TABLE IDENTIFIER('{schemaName}.{tableName}')").collect()
return True
except Exception as e:
#print(f"An error was thrown, <<{e}>>")
return False
def main(session: snowpark.Session):
return checkTableExists(session, 'TEST')
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论