October 8, 2020, 19:11:40

How to remove acute accents from a string in Java?

Question

I know about this approach:

import java.text.Normalizer;

public static String stripAccents(String s) {
    // Decompose, then remove every combining diacritical mark
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}

But it does not work the way I want: it strips every diacritic, which changes the meaning of the text.

stripAccents("йод,ëлка,wäre") // иод,елка,ware

I want to remove only acute accents:

stripAccents("café") // cafe
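The reason the snippet above changes й, ë and ä is that NFD splits each of them into a base letter plus a combining mark (breve U+0306 or diaeresis U+0308), and the regex removes every mark in the Combining Diacritical Marks block, not only the acute U+0301. A small sketch that prints the decomposed code points makes this visible; the class and method names here are illustrative and not part of the original post.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

final class DecompositionDemo {

    // Returns the NFD code points of s as space-separated "U+XXXX" tokens.
    static String describe(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Cyrillic short i, Latin e with diaeresis, Latin e with acute
        for (String s : new String[] {"\u0439", "\u00EB", "\u00E9"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Only the last of the three contains U+0301, so a fix that targets that single code point leaves the other two letters alone.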

Answer 1

Score: 2

To remove only the acute accents:

s = Normalizer.normalize(s, Normalizer.Form.NFD); // Decompose
s = s.replace("\u0301", ""); // Combining acute accent (´)
s = Normalizer.normalize(s, Normalizer.Form.NFC); // Compose again

Recomposing to NFC gives the shortest form, which fonts often render better.

Because the combining acute accent is a single, zero-width character, a plain String.replace is enough and no regex is needed.

For a grave accent, as in the Italian cafè, use \u0300 instead.
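The three statements above can be wrapped into a reusable method. The following is a minimal sketch, assuming you want to choose which combining marks to drop; the class name, the constants and the varargs parameter are hypothetical additions, not part of the answer.

```java
import java.text.Normalizer;

final class CombiningMarkStripper {

    static final String ACUTE = "\u0301"; // combining acute accent
    static final String GRAVE = "\u0300"; // combining grave accent

    // Removes only the given combining marks; every other diacritic
    // is recomposed with its base letter by the final NFC step.
    static String strip(String s, String... marks) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        String result = Normalizer.normalize(s, Normalizer.Form.NFD);
        for (String mark : marks) {
            result = result.replace(mark, "");
        }
        return Normalizer.normalize(result, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(strip("caf\u00E9", ACUTE));        // e with acute
        System.out.println(strip("caf\u00E8", ACUTE, GRAVE)); // e with grave
    }
}
```

Letters whose diacritic is not in the list, such as й, ë or ä from the question, pass through the NFD and NFC round trip unchanged.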

Answer 2

Score: 1

It seems better to simply remap the specific set of letters that carry an acute accent to their plain counterparts:

import java.util.stream.Collector;

public static String stripAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }

    // map[0] holds the accented letters, map[1] the replacement at the same index
    final String[] map = {
        "ÁÉÍÓÚÝáéíóúý",
        "AEIOUYaeiouy"
    };

    return s.chars()
            .mapToObj(c -> (char)(map[0].indexOf(c) > -1 ? map[1].charAt(map[0].indexOf(c)) : c))
            .collect(Collector.of(
                StringBuilder::new, StringBuilder::append,
                StringBuilder::append, StringBuilder::toString
            ));
}

// Or, using a switch expression (a preview feature in JDK 12, standard since JDK 14)
public static String stripAcuteAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    char[] raw = s.toCharArray();
    for (int i = 0; i < raw.length; i++) {
        raw[i] = switch(raw[i]) {
            case 'Á' -> 'A'; case 'É' -> 'E'; case 'Í' -> 'I';
            case 'Ó' -> 'O'; case 'Ú' -> 'U'; case 'Ý' -> 'Y';
            case 'á' -> 'a'; case 'é' -> 'e'; case 'í' -> 'i';
            case 'ó' -> 'o'; case 'ú' -> 'u'; case 'ý' -> 'y';
            default -> raw[i];
        };
    }
    return new String(raw);
}

Basic tests:

String[] tests = {"café", "Á Toi", "ÁÉÍÓÚÝáéíóúý - bcdef"};

Arrays.stream(tests)
      .forEach(s -> System.out.printf("%s -> %s%n", s, stripAccents(s)));

Output:

café -> cafe
Á Toi -> A Toi
ÁÉÍÓÚÝáéíóúý - bcdef -> AEIOUYaeiouy - bcdef
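One caveat with a fixed lookup table: it only matches precomposed characters, so input that is already decomposed (a plain 'e' followed by the combining mark U+0301) passes through unchanged. A possible workaround, sketched here as an assumption rather than as part of the answer, is to normalize to NFC before the lookup; the class and constant names are hypothetical.

```java
import java.text.Normalizer;

final class AcuteLookup {

    // Same twelve letters as the table in the answer, written as escapes.
    private static final String ACCENTED =
            "\u00C1\u00C9\u00CD\u00D3\u00DA\u00DD\u00E1\u00E9\u00ED\u00F3\u00FA\u00FD";
    private static final String PLAIN = "AEIOUYaeiouy";

    static String stripAcute(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Compose first so that 'e' + U+0301 becomes the single char U+00E9.
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            int idx = ACCENTED.indexOf(c);
            out.append(idx >= 0 ? PLAIN.charAt(idx) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAcute("cafe\u0301")); // decomposed input
        System.out.println(stripAcute("caf\u00E9"));  // precomposed input
    }
}
```

The table still covers only the twelve listed letters, so other acute-accented letters (for example ć or ń) are left as they are, and the NFC step also composes any other decomposed sequences in the input.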
