June 2, 2023, 08:09:23

Redshift SQL Query - Two data sources with different characters, e.g. François vs Francois

Question

I am writing a Redshift SQL query that pulls data from two tables holding similar information. One table stores person names with accents, umlauts, and ñ, while the other does not. For example, one table has François and the other has Francois for the same person.

I'm looking for a SQL function that returns all results with accents and similar marks removed (I'm not sure "encoding" is the right term here).

I know how to find and replace a specific value, but that's not what I need. I want to systematically remove all accents, umlauts, ñ, and so on. I found that COLLATE can be used for this in SQL Server, but in Redshift COLLATE appears to deal only with case sensitivity.

Answer 1

Score: 1

I'm not aware of a dedicated function for this, but TRANSLATE can be used (see the Redshift documentation for TRANSLATE).

SELECT
    source_table.person_name
  , translated_table.name_without_accents
FROM source_table
JOIN (
    SELECT
        person_id
        /* each accented character maps to the plain character at the same
           position, so both strings must have the same length */
      , TRANSLATE(person_name,
                'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
                'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') AS name_without_accents
    FROM source_table
    /* regex to locate the names containing accented chars;
       names without any of them are not returned by this join */
    WHERE person_name ~ '[áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ]'
) AS translated_table ON source_table.person_id = translated_table.person_id;

Or create a view along these lines, which keeps every row and adds the unaccented name next to the original:

CREATE VIEW all_names AS
SELECT
    person_id
  , person_name
    /* same positional mapping as above: equal-length strings */
  , TRANSLATE(person_name,
            'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
            'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') AS name_without_accents
FROM source_table;
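
Because TRANSLATE pairs characters by position, a hand-typed list is easy to get out of step (one missing letter shifts every mapping after it). As a rough sketch, the two string arguments could be generated outside the database instead of typed by hand. The Python below is only an illustration under stated assumptions: the helper names strip_accents and build_translate_args are made up for this example, it covers only the Latin-1 range U+00C0 to U+00FF by default, and it runs locally rather than inside Redshift. Letters with no single-letter decomposition (such as ø, ß, æ) are skipped.

```python
import unicodedata
from typing import Tuple


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_translate_args(first: int = 0x00C0, last: int = 0x00FF) -> Tuple[str, str]:
    """Build the (characters_to_replace, characters_to_substitute) pair.

    Hypothetical helper: a character is kept only if stripping its accents
    yields exactly one different ASCII letter, so the two returned strings
    always have the same length.
    """
    sources, targets = [], []
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        base = strip_accents(ch)
        if len(base) == 1 and base != ch and base.isascii() and base.isalpha():
            sources.append(ch)
            targets.append(base)
    return "".join(sources), "".join(targets)


replace_chars, substitute_chars = build_translate_args()

# Local stand-in for the positional mapping that TRANSLATE applies.
mapping = str.maketrans(replace_chars, substitute_chars)

if __name__ == "__main__":
    # Paste these two literals into the TRANSLATE call.
    print(replace_chars)
    print(substitute_chars)
```

The printed strings can be pasted as the second and third arguments of TRANSLATE. Widening the range passed to build_translate_args would cover more alphabets, at the cost of longer literals.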
