September 25, 2020 00:27:39

Replace Unicode Characters in a String

Question

I need to replace characters that carry diacritics (e.g. ä, ó) with their 'base' character. For most characters, this solution works:

StringUtils.stripAccents(tmpStr);

but it misses four characters: æ, œ, ø, and ß.

I looked at the solutions at https://stackoverflow.com/questions/3322152/is-there-a-way-to-get-rid-of-accents-and-convert-a-whole-string-to-regular-lette. I expected the first one to work, but it does not.

How can I replace these characters with their 'base' character (e.g. replace æ with a)?

Answer 1

Score: 2

The source code of StringUtils.stripAccents (https://commons.apache.org/proper/commons-lang/apidocs/src-html/org/apache/commons/lang3/StringUtils.html) reads:

public static String stripAccents(final String input) {
    if (input == null) {
        return null;
    }
    final StringBuilder decomposed = new StringBuilder(Normalizer.normalize(input, Normalizer.Form.NFD));
    convertRemainingAccentCharacters(decomposed);

    // Note that this doesn't correctly remove ligatures...

    return STRIP_ACCENTS_PATTERN.matcher(decomposed).replaceAll(EMPTY);
}

It has a comment that says,
// Note that this doesn't correctly remove ligatures...
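
To see why normalization alone cannot help here, a small check like the one below can be run. It is only a sketch: the class name LigatureCheck and the method survivesNormalization are made up for illustration. It relies on the fact that æ, œ, ø, and ß have no Unicode decomposition, so java.text.Normalizer leaves them unchanged, whereas ä and ó split into a base letter plus a combining mark.

```java
import java.text.Normalizer;

class LigatureCheck {
    // Hypothetical helper: true if NFKD normalization leaves the string
    // unchanged, meaning it has no canonical or compatibility decomposition.
    static boolean survivesNormalization(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD).equals(s);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII:
        // a-umlaut, o-acute, ae, oe, o-with-stroke, sharp s
        String[] samples = {"\u00E4", "\u00F3", "\u00E6", "\u0153", "\u00F8", "\u00DF"};
        for (String s : samples) {
            System.out.println(s + " unchanged by NFKD: " + survivesNormalization(s));
        }
    }
}
```

The first two samples report false (they decompose, so their marks can be stripped); the last four report true, which is why they come through stripAccents untouched.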

So you may need to replace those characters manually, with something like:

    String string = Normalizer.normalize("Tĥïŝ ĩš â fůňķŷ ß æ œ ø Šťŕĭńġ", Normalizer.Form.NFKD);
    string = string.replaceAll("\\p{M}", "");

    string = string.replace("ß", "s");
    string = string.replace("ø", "o");
    string = string.replace("œ", "o");
    string = string.replace("æ", "a");
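
If this is needed in more than one place, the same idea could be wrapped in a small helper, roughly as sketched below. The class name AsciiFolder, the method fold, and the EXTRA table are hypothetical names, not part of Commons Lang or the JDK. The uppercase entries (Æ, Œ, Ø) are an added assumption, since the question only lists the lowercase forms, and ß is mapped to "s" to match the snippet above even though "ss" is the more common transliteration.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class AsciiFolder {
    // Hypothetical table for characters that have no Unicode decomposition.
    // Unicode escapes keep the source file pure ASCII.
    private static final Map<Character, String> EXTRA = new HashMap<>();
    static {
        EXTRA.put('\u00DF', "s"); // sharp s
        EXTRA.put('\u00F8', "o"); // o with stroke
        EXTRA.put('\u00D8', "O"); // O with stroke (assumed addition)
        EXTRA.put('\u0153', "o"); // oe ligature
        EXTRA.put('\u0152', "O"); // OE ligature (assumed addition)
        EXTRA.put('\u00E6', "a"); // ae ligature
        EXTRA.put('\u00C6', "A"); // AE ligature (assumed addition)
    }

    static String fold(String input) {
        if (input == null) {
            return null;
        }
        // Step 1: decompose, then drop all combining marks.
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "");
        // Step 2: substitute the characters normalization cannot handle.
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            String replacement = EXTRA.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, AsciiFolder.fold("Stra\u00DFe") returns "Strase". Keeping the special cases in one table means further characters can be added without touching the loop.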

Diacritical Character to ASCII Character Mapping
https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html
