Redshift SQL query: two data sources with different characters, e.g. François vs Francois

Question

I am writing a Redshift SQL query that pulls data from two different tables holding similar information. One table stores person names with accents, umlauts, and eñes (ñ), while the other does not. For example, one table has François and the other has Francois for the same person.

I'm looking for a SQL function that would return all results with accents and similar diacritics removed (I'm not sure "encoding" is the right term here).

I know how to find and replace a specific value, but that's not what I'm after. I want to systematically remove all accents, umlauts, eñes, etc. I found that COLLATE can be used for this in SQL Server, but in Redshift COLLATE appears to deal only with case sensitivity.

Answer 1

Score: 1

I'm not aware of a dedicated Redshift function for this, but TRANSLATE can be used. See the documentation for TRANSLATE.

SELECT
    source_table.person_name
  , translated_table.name_without_accents
FROM source_table
JOIN (
    SELECT
        person_id
        -- the two strings must line up one-to-one: the Nth character of the
        -- first string is replaced by the Nth character of the second
      , TRANSLATE(person_name,
                'áàäâãåéèëêíìïîóòöôõúùüûçÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇ',
                'aaaaaaeeeeiiiiooooouuuucAAAAAAEEEEIIIIOOOOOUUUUC') AS name_without_accents
    FROM source_table
    /* regex to locate the names containing accented chars */
    WHERE person_name ~ '[áàäâãåéèëêíìïîóòöôõúùüûçÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇ]'
) AS translated_table ON source_table.person_id = translated_table.person_id;

Alternatively, create a view along these lines:

CREATE VIEW all_names AS
SELECT
    person_id
  , person_name
    -- the two strings must be the same length, one replacement per character
  , TRANSLATE(person_name,
            'áàäâãåéèëêíìïîóòöôõúùüûçÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇ',
            'aaaaaaeeeeiiiiooooouuuucAAAAAAEEEEIIIIOOOOOUUUUC') AS name_without_accents
FROM source_table;
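
Because TRANSLATE pairs characters by position, a single missing letter in the second string silently shifts every later replacement. One way to avoid hand-counting is to generate both arguments with a small script. The following is only a sketch in Python, run outside Redshift: the helper names `strip_accents` and `build_translate_args` are made up for illustration, and ñ/Ñ are added to the character list as an assumption based on the question, since the answer's list does not include them.

```python
import unicodedata

# Characters to handle. The ñ/Ñ at the end of each half are an illustrative
# addition; the rest is the list used in the TRANSLATE calls above.
ACCENTED = "áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ"


def strip_accents(text):
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_translate_args(chars):
    """Return (chars_to_replace, chars_to_use), aligned position by position."""
    chars = unicodedata.normalize("NFC", chars)  # one code point per letter
    plain = strip_accents(chars)
    if len(plain) != len(chars):
        raise ValueError("a character did not map to exactly one base letter")
    return chars, plain


source, target = build_translate_args(ACCENTED)

# Print a TRANSLATE expression that can be pasted into the queries above.
print("TRANSLATE(person_name,")
print("          '{}',".format(source))
print("          '{}')".format(target))
```

The length check is the point of the exercise: if a character has no single-letter decomposition (for example ß or æ), the script raises an error instead of producing misaligned strings, and such characters need separate handling.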

  • Published 2023-06-02 08:09:23
  • Link: https://go.coder-hub.com/76386431.html