How to remove acute accents from a string in Java?

Question

I know about this approach:

import java.text.Normalizer;

public static String stripAccents(String s) {
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}

but it doesn't do what I want: it changes the meaning of the text.

stripAccents("йод,ëлка,wäre") //иод,елка,ware
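
The reason becomes clear once you look at the decomposed form. Below is a minimal sketch that prints the code points NFD produces for a few letters. The class and method names are illustrative. You should see й as U+0438 U+0306 (и plus a combining breve), ë and ä as a base letter plus U+0308 (combining diaeresis), and é as U+0065 U+0301. All of these marks sit in the Combining Diacritical Marks block (U+0300 to U+036F), so the regex strips every one of them, not just the acute accent.

```java
import java.text.Normalizer;

// Illustrative helper: shows which code points NFD produces for a string
class DecompositionDemo {
    static String describe(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Letters given as escapes so the result does not depend on source encoding
        for (String s : new String[] {"\u0439", "\u00EB", "\u00E4", "\u00E9"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```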

I only want to remove acute accents:

stripAccents("café") //cafe

Answer 1

Score: 2

Just for the acute accents:

s = Normalizer.normalize(s, Normalizer.Form.NFD); // Decompose
s = s.replace("\u0301", ""); // Combining acute accent (´)
s = Normalizer.normalize(s, Normalizer.Form.NFC); // Compose again
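
Here is one way to wrap the three steps above in a reusable method. This is a sketch: the class name `AcuteStripper` and the `main` demo are illustrative and do not come from the answer. The inputs use `\u` escapes so the result does not depend on the source file's encoding.

```java
import java.text.Normalizer;

class AcuteStripper {
    // U+0301 COMBINING ACUTE ACCENT, the mark NFD splits off from é, á, etc.
    private static final String COMBINING_ACUTE = "\u0301";

    static String stripAcuteAccents(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Decompose so that é becomes e followed by U+0301
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop only the acute mark; breve, diaeresis, etc. stay in place
        String withoutAcute = decomposed.replace(COMBINING_ACUTE, "");
        // Recompose so that, for example, и + U+0306 becomes й again
        return Normalizer.normalize(withoutAcute, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(stripAcuteAccents("caf\u00E9"));
        System.out.println(stripAcuteAccents("\u0439\u043E\u0434,\u00EB\u043B\u043A\u0430,w\u00E4re"));
    }
}
```

On the question's input, й, ë and ä should come back unchanged because the final NFC step recomposes them. A letter that carries both a diaeresis and an acute, such as ǘ, should lose only the acute and become ü.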

Recomposing gives the shortest form, and fonts often render it better.
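
The length difference is easy to check. The small sketch below measures `café` in both forms; the class `FormLengths` is hypothetical. NFD stores é as two chars (e plus U+0301), while NFC stores it as one.

```java
import java.text.Normalizer;

class FormLengths {
    // Length in UTF-16 chars of s after normalizing it to the given form
    static int lengthIn(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).length();
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // "café" with é as one precomposed code point
        System.out.println("NFD: " + lengthIn(word, Normalizer.Form.NFD)); // 5
        System.out.println("NFC: " + lengthIn(word, Normalizer.Form.NFC)); // 4
    }
}
```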

This removes only the zero-width combining acute accents, and it needs no regex.

For the grave accent, as in Italian cafè, use \u0300 instead.
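
If you want to remove both acute and grave accents, one option is to pass the marks to remove as a parameter. This is a minimal sketch built on a hypothetical helper, `stripMarks`. It decomposes with NFD, drops any char listed in `marks`, and then recomposes with NFC.

```java
import java.text.Normalizer;

class AccentStripper {
    static final String ACUTE = "\u0301"; // U+0301 COMBINING ACUTE ACCENT
    static final String GRAVE = "\u0300"; // U+0300 COMBINING GRAVE ACCENT

    // Hypothetical helper: after NFD, drops every char that appears in `marks`,
    // then recomposes with NFC so the remaining marks rejoin their letters
    static String stripMarks(String s, String marks) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder kept = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (marks.indexOf(c) < 0) { // keep anything not listed
                kept.append(c);
            }
        }
        return Normalizer.normalize(kept, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(stripMarks("caf\u00E8 / caf\u00E9", ACUTE + GRAVE)); // cafe / cafe
        System.out.println(stripMarks("caf\u00E8", ACUTE));                     // cafè
    }
}
```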

Answer 2

Score: 1

It may be simpler to remap the specific set of characters that carry an acute accent to their plain letters:

import java.util.stream.Collector;

public static String stripAccents(String s) {

    if (null == s || s.isEmpty()) {
        return s;
    }

    // each character in map[0] is replaced by the character at the same index in map[1]
    final String[] map = {
        "ÁÉÍÓÚÝáéíóúý",
        "AEIOUYaeiouy"
    };

    return s.chars()
            .mapToObj(c -> (char)(map[0].indexOf(c) > -1 ? map[1].charAt(map[0].indexOf(c)) : c))
            .collect(Collector.of(
                StringBuilder::new, StringBuilder::append,
                StringBuilder::append, StringBuilder::toString
            ));
}
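
You can do the same lookup without a custom `Collector`. The sketch below walks the string once and calls `indexOf` only once per character. The class name `TableStripper` is illustrative.

```java
class TableStripper {
    // Same table as above, as escapes: each char in FROM maps to the char at the same index in TO
    private static final String FROM =
        "\u00C1\u00C9\u00CD\u00D3\u00DA\u00DD\u00E1\u00E9\u00ED\u00F3\u00FA\u00FD";
    private static final String TO = "AEIOUYaeiouy";

    static String stripAccents(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = FROM.indexOf(c); // look up once instead of twice
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00E9"));  // cafe
        System.out.println(stripAccents("\u00C1 Toi")); // A Toi
    }
}
```

Like the original, this version only knows the twelve letters in the table. It also leaves an already-decomposed é (e followed by U+0301) unchanged.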

// or, using a switch expression (preview in JDK 12 and 13, standard since JDK 14)
public static String stripAcuteAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    char[] raw = s.toCharArray();
    for (int i = 0; i < raw.length; i++) {
        raw[i] = switch(raw[i]) {
            case 'Á' -> 'A'; case 'É' -> 'E'; case 'Í' -> 'I';
            case 'Ó' -> 'O'; case 'Ú' -> 'U'; case 'Ý' -> 'Y';
            case 'á' -> 'a'; case 'é' -> 'e'; case 'í' -> 'i';
            case 'ó' -> 'o'; case 'ú' -> 'u'; case 'ý' -> 'y';
            default -> raw[i];
        };
    }
    return new String(raw);
}

Basic tests:

String[] tests = {"café", "Á Toi", "ÁÉÍÓÚÝáéíóúý - bcdef"};

Arrays.stream(tests)
      .forEach(s -> System.out.printf("%s -> %s%n", s, stripAccents(s)));

Output:

café -> cafe
Á Toi -> A Toi
ÁÉÍÓÚÝáéíóúý - bcdef -> AEIOUYaeiouy - bcdef
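
The two answers behave differently on letters outside the twelve-entry table. The sketch below compares them on Polish ć, ń, ś and ź, each of which decomposes to a base letter plus U+0301. The class and method names are illustrative.

```java
import java.text.Normalizer;

class CompareApproaches {
    // Answer 1: NFD, drop U+0301, recompose with NFC
    static String viaNormalizer(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return Normalizer.normalize(decomposed.replace("\u0301", ""), Normalizer.Form.NFC);
    }

    // Answer 2: fixed table of twelve precomposed letters
    static String viaTable(String s) {
        String from = "\u00C1\u00C9\u00CD\u00D3\u00DA\u00DD\u00E1\u00E9\u00ED\u00F3\u00FA\u00FD";
        String to = "AEIOUYaeiouy";
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            int idx = from.indexOf(c);
            sb.append(idx >= 0 ? to.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String polish = "\u0107\u0144\u015B\u017A"; // ćńśź
        System.out.println(viaNormalizer(polish)); // cnsz
        System.out.println(viaTable(polish));      // ćńśź (unchanged)
    }
}
```

Which behavior you want depends on the text. The table is explicit and predictable. The normalization route covers every letter whose decomposition contains U+0301.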

huangapple
  • Posted by huangapple on October 8, 2020 at 19:11:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64261318.html