英文:
Refactoring PySpark to Snowflake Snowpark code
问题
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
def checkHiveTableExists(tableName, stageName="base"):
    try:
        session.execute(f"DESCRIBE TABLE {stageName}.{tableName}")
        return True
    except Exception as e:
        print(f"An error was thrown, <<{e}>>")
        return False
英文:
Most of code is written in PySpark to be executed on Databricks.
I am evaluating SnowFlake with it's ability to execute Python with Snowpark.
Can someone let me know how I might go about refactoring the following PySpark function to Snowpark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
def checkHiveTableExists(tableName,stageName="base"): 
  
  try:
    spark.sql(f"describe {stageName}{tableName}")
    return True
  except Exception as e:
    print(f"An error was thrown, <<{e}>>")
    return False
My attempt is a follows:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
from snowflake.snowpark import session
def checkHiveTableExists(tableName,stageName="base"): 
 
  try:
    spark.sql(f"describe {stageName}{tableName}")
    return True
  except Exception as e:
    print(f"An error was thrown, <<{e}>>")
    return False
  
def main(session: snowpark.Session):
    return checkHiveTableExists(False)
But it failed.
Any thoughts?
答案1
得分: 1
Here's the translated code:
import snowflake.snowpark as snowpark
def checkTableExists(session: snowpark.Session, tableName, schemaName="BASE"): 
  try:
    session.sql(f"DESC TABLE IDENTIFIER('{schemaName}.{tableName}')").collect()
    return True
  except Exception as e:
    #print(f"An error was thrown, <<{e}>>")
    return False
  
def main(session: snowpark.Session):
    return checkTableExists(session, 'TEST')
Please note that I've removed the HTML entities like " and ' for better readability in the translated code.
英文:
Original code:
def checkHiveTableExists(tableName,stageName="base"): 
  
  try:
    spark.sql(f"describe {stageName}{tableName}")
    return True
  except Exception as e:
    print(f"An error was thrown, <<{e}>>")
    return False
Copy-paste approach into Snowflake will not work.
- There is no 
sparkmodule - the direct SQL is also different 
DESC ...=>DESC TABLE ... 
The original code itself is somehow tricky, because for object existence is uses query to describe the object
Second: describe {stageName}{tableName} string interpolation makes it prone to SQL Injection - somebody could try to call it as checkHivetableExists("someName'; DROP TABLE ...")
When translating code the focus should be on behavior.
Personally I would perform a metadata check INFORMATION_SCHEMA.TABLES and see if row exists - user calling this code needs to have permission to access this table.
Anyway, trying to be as as close as possible to original code:
import snowflake.snowpark as snowpark
def checkTableExists(session: snowpark.Session, tableName, schemaName="BASE"): 
  try:
    session.sql(f"DESC TABLE IDENTIFIER('{schemaName}.{tableName}')").collect()
    return True
  except Exception as e:
    #print(f"An error was thrown, <<{e}>>")
    return False
  
def main(session: snowpark.Session):
    return checkTableExists(session, 'TEST')
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论