How to remove acute accents from a string in Java?
Question
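I know about this approach:
import java.text.Normalizer;

public static String stripAccents(String s) {
    // Decompose each character into a base letter plus combining marks
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove every combining diacritical mark (U+0300 to U+036F)
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}
But it does not work the way I want, because it changes the meaning of the text:
stripAccents("йод,ëлка,wäre") // иод,елка,ware
I want to remove only acute accents:
stripAccents("café") // cafe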
Answer 1
Score: 2
Just for the acute accents:
s = Normalizer.normalize(s, Normalizer.Form.NFD); // Decompose
s = s.replace("\u0301", ""); // Remove the combining acute accent (U+0301)
s = Normalizer.normalize(s, Normalizer.Form.NFC); // Compose again
Recomposing to NFC gives the shortest form, which is also often rendered better by fonts.
This removes the zero-width combining acute accents without needing a regex.
For a grave accent, as in Italian cafè, use \u0300 instead.
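The three steps above can be wrapped into a reusable method. The following is a minimal sketch, not a definitive implementation: the class name `AccentStripper`, the method `stripMark`, and the two constants are hypothetical names chosen for illustration. It assumes the input should come back in NFC, so text that was in another normalization form will be returned recomposed.

```java
import java.text.Normalizer;

// Hypothetical helper class; the names are illustrative, not from any library.
class AccentStripper {
    static final char COMBINING_ACUTE = '\u0301';
    static final char COMBINING_GRAVE = '\u0300';

    // Removes one specific combining mark and leaves all other diacritics alone.
    static String stripMark(String s, char mark) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Decompose: "é" becomes "e" followed by U+0301
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop only the requested combining mark
        String stripped = decomposed.replace(String.valueOf(mark), "");
        // Recompose what is left, so "й" and "ä" come back as single characters
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding
        System.out.println(stripMark("caf\u00e9", COMBINING_ACUTE)); // cafe
        System.out.println(stripMark("caf\u00e8", COMBINING_GRAVE)); // cafe
        // Breve and diaeresis are untouched, so this string is printed unchanged
        System.out.println(stripMark("\u0439\u043e\u0434,w\u00e4re", COMBINING_ACUTE));
    }
}
```

Note that any character whose canonical decomposition contains U+0301 is affected, which includes the Greek tonos (for example ά), so this may strip more than Latin acute accents.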
Answer 2
Score: 1
A simpler option may be to remap the specific set of acute-accented characters to plain letters:
import java.util.Arrays;
import java.util.stream.Collector;

public static String stripAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    // map[0] holds the accented characters, map[1] the replacements at the same index
    final String[] map = {
        "ÁÉÍÓÚÝáéíóúý",
        "AEIOUYaeiouy"
    };
    return s.chars()
        .mapToObj(c -> (char)(map[0].indexOf(c) > -1 ? map[1].charAt(map[0].indexOf(c)) : c))
        .collect(Collector.of(
            StringBuilder::new, StringBuilder::append,
            StringBuilder::append, StringBuilder::toString
        ));
}
// or using a switch expression (standard since JDK 14, preview in JDK 12 and 13)
public static String stripAcuteAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    char[] raw = s.toCharArray();
    for (int i = 0; i < raw.length; i++) {
        raw[i] = switch(raw[i]) {
            case 'Á' -> 'A'; case 'É' -> 'E'; case 'Í' -> 'I';
            case 'Ó' -> 'O'; case 'Ú' -> 'U'; case 'Ý' -> 'Y';
            case 'á' -> 'a'; case 'é' -> 'e'; case 'í' -> 'i';
            case 'ó' -> 'o'; case 'ú' -> 'u'; case 'ý' -> 'y';
            default -> raw[i];
        };
    }
    return new String(raw);
}
Basic tests:
String[] tests = {"café", "Á Toi", "ÁÉÍÓÚÝáéíóúý - bcdef"};
Arrays.stream(tests)
.forEach(s -> System.out.printf("%s -> %s%n", s, stripAccents(s)));
Output:
café -> cafe
Á Toi -> A Toi
ÁÉÍÓÚÝáéíóúý - bcdef -> AEIOUYaeiouy - bcdef
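One caveat with a lookup table is that it only matches precomposed characters, so an "e" followed by a separate combining acute accent (U+0301) would pass through unchanged. A possible workaround, sketched here under the assumption that composing the input to NFC first is acceptable, is shown below. The class name `AcuteTable` is hypothetical, and the table still covers only the twelve letters listed above, not other acute-accented letters such as ć or ń.

```java
import java.text.Normalizer;

// Hypothetical variant of the lookup-table approach; names are illustrative.
class AcuteTable {
    // Á É Í Ó Ú Ý á é í ó ú ý, written as escapes to avoid source-encoding issues
    private static final String ACCENTED =
        "\u00c1\u00c9\u00cd\u00d3\u00da\u00dd\u00e1\u00e9\u00ed\u00f3\u00fa\u00fd";
    private static final String PLAIN = "AEIOUYaeiouy";

    static String stripAcuteAccents(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Compose first so that "e" + U+0301 becomes the single character "é"
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            int idx = ACCENTED.indexOf(c);
            // Replace listed characters, copy everything else as-is
            out.append(idx >= 0 ? PLAIN.charAt(idx) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAcuteAccents("cafe\u0301")); // cafe
        System.out.println(stripAcuteAccents("\u00c1 Toi")); // A Toi
    }
}
```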