有哪些其他方式可以在Snowflake中的JavaScript存储过程中格式化SQL?

huangapple go评论71阅读模式
英文:

What are other ways of formatting the SQL in a JavaScript stored procedure in Snowflake?

问题

我在Snowflake中创建了一个存储过程,允许我识别给定表中的重复数据行并停止后续代码的运行。

此存储过程的参数包括数据库、模式、表以及自然键。因为在一个表中查找重复行可能需要多个自然键在ROW_NUMBER()函数中使用。

CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING, NATURAL_KEYS STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
    // 将自然键拆分为数组并去除空格
    var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });

    // 将自然键格式化为逗号分隔的字符串以供查询使用
    var NaturalkeyString = keysArray.join(', ');

    // 组装SQL查询以识别重复行
    var sql_query = 
        'WITH duplicates AS (' +
            'SELECT ' + NaturalkeyString +
            ', ROW_NUMBER() OVER (PARTITION BY ' + NaturalkeyString + ' ORDER BY ' + NaturalkeyString + ') AS RowNumber ' +
            'FROM "' + DB_NAME + '"."' + SCHEMA_NAME + '"."' + TABLE_NAME + '"' +
        ') ' +
        'SELECT * FROM duplicates WHERE RowNumber > 1';

    var stmt = snowflake.createStatement({sqlText: sql_query});

    var result = stmt.execute();

    // 检查是否存在重复行
    if(result.next()){
        throw 'ERROR: 在表 ' + TABLE_NAME + ' 中找到重复行';
    }
    else{
        return '在表 ' + TABLE_NAME + ' 中未找到重复行';
    }
$$;

-- 测试存储过程:
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID, EMP_NAME, ORG');

-- 如果存储过程 "FIND_DUPLICATE_ROWS" 正常工作,调用 "FIND_DUPLICATE_ROWS" 存储过程后,以下代码不应执行:
SELECT 7887;

我的问题是,我不喜欢如何进行SQL拼接以使其工作,是否有更清晰的方法可以做到这一点?

英文:

I created a stored procedure in Snowflake that lets me identify duplicate rows of data in a given table and stop the subsequent code from running.

The parameters for this stored procedure are the Database, Schema, Table, and one or more Natural keys from the respective table since finding duplicate rows in one table may require more than 1 natural key in the ROW_NUMBER() function.

CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING, NATURAL_KEYS STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
    // Split the Natural keys into an array and trim whitespaces
    var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });

    // Format the Natural keys into comma separated string for the query
    var NaturalkeyString = keysArray.join(', ');

    // Assemble the SQL query to identify duplicate rows
    var sql_query = 
        'WITH duplicates AS (' +
            'SELECT ' + NaturalkeyString +
            ', ROW_NUMBER() OVER (PARTITION BY ' + NaturalkeyString + ' ORDER BY ' + NaturalkeyString + ') AS RowNumber ' +
            'FROM "' + DB_NAME + '"."' + SCHEMA_NAME + '"."' + TABLE_NAME + '"' +
        ') ' +
        'SELECT * FROM duplicates WHERE RowNumber > 1';

    var stmt = snowflake.createStatement({sqlText: sql_query});

    var result = stmt.execute();

    // Check if there are duplicate rows
    if(result.next()){
        throw 'ERROR: Duplicate rows found in the table ' + TABLE_NAME;
    }
    else{
        return 'No duplicate rows found in the table ' + TABLE_NAME;
    }
$$;

-- Test the Stored Procedure:
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');

--If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;

My issue is that I don't like how I have to concatenate the SQL to make it work, is there a cleaner way to do this?

答案1

得分: 0

I found my mistake, instead of backticks (`) I was using single quotes ('), it was just hard for me to see. Then by using JavaScript Template Literals with the syntax:

`String Text ${Expression} string text`

I was able to format my stored procedure appropriately as shown below:

CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(
DB_NAME STRING, 
SCHEMA_NAME STRING, 
TABLE_NAME STRING, 
NATURAL_KEYS STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
    // Split the Natural keys into an array and trim whitespaces
    var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });

    // Format the Natural keys into a comma-separated string for the query
    var NaturalKeyString = keysArray.join(', ');

    // Assemble the SQL query to identify duplicate rows
    var sql_query = `
        SELECT 
            ${NaturalKeyString},
            ROW_NUMBER() OVER(PARTITION BY ${NaturalKeyString} ORDER BY ${NaturalKeyString}) AS RowNumber
        FROM ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}
        QUALIFY RowNumber > 1;`;

    var stmt = snowflake.createStatement({sqlText: sql_query});

    var result = stmt.execute();

    error_message = `"ERROR: Duplicate rows found in ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}"`;

    // Check if there are duplicate rows
    if(result.next()){
        throw error_message;
    }
    else{
        return 0;
    }
$$;

-- SUPA_DB.RAW.DIM_EMP_DETAILS
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');

-- If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;
英文:

I found my mistake, instead of backticks ( ` ) I was using single quotes ( ' ) it was just hard for me to see. Then by using JavaScript Template Literals with the syntax:

`String Text ${Expression} string text`

I was able to format my stored procedure appropriately as shown below:

CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(
DB_NAME STRING, 
SCHEMA_NAME STRING, 
TABLE_NAME STRING, 
NATURAL_KEYS STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
// Split the Natural keys into an array and trim whitespaces
var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
// Format the Natural keys into comma separated string for the query
var NaturalKeyString = keysArray.join(', ');
// Assemble the SQL query to identify duplicate rows
var sql_query = `
SELECT 
${NaturalKeyString},
ROW_NUMBER() OVER(PARTITION BY ${NaturalKeyString} ORDER BY ${NaturalKeyString}) AS RowNumber
FROM ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}
QUALIFY RowNumber > 1;`;
var stmt = snowflake.createStatement({sqlText: sql_query});
var result = stmt.execute();
error_message = `"ERROR: Duplicate rows found in ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}"`;
// Check if there are duplicate rows
if(result.next()){
throw error_message;
}
else{
return 0;
}
$$;
-- SUPA_DB.RAW.DIM_EMP_DETAILS
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');
--If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;

huangapple
  • 本文由 发表于 2023年6月22日 12:27:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76528601.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定