August 14, 2020 16:13:01

Convert accent characters to English using Java

Question

I have a requirement to search using accented characters, for users from Iceland and Japan. The code I wrote works for some accented characters but not all.
Examples:

À - returns a. Correct.
Â - returns a. Correct.
Ð - returns Ð. This is broken. It should return e.
Õ - returns Õ. This is broken. It should return o.

Below is my code:

String accentConvertStr = StringUtils.stripAccents(myKey);

I also tried this:

byte[] b = key.getBytes("Cp1252");
System.out.println("" + new String(b, StandardCharsets.UTF_8));

Please advise.

Answer 1

Score: 0

I would say it works as expected. Under the hood, StringUtils.stripAccents does essentially the following:

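// Requires: import java.text.Normalizer;
String[] chars = new String[]{"À", "Â", "Ð", "Õ"};

for (String c : chars) {
  // NFD splits a letter into base letter plus combining marks, where such a decomposition exists
  String normalized = Normalizer.normalize(c, Normalizer.Form.NFD);
  // Remove the combining marks, leaving only the base letter
  System.out.println(normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
}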

This will output:
A
A
Ð
O

The answer at https://stackoverflow.com/a/5697575/9671280 explains why Ð is left unchanged:

Be aware that that will not remove what you might think of as “accent” marks from all characters! There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.
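
The quoted advice about primary collation strength applies most directly to comparison rather than conversion. As a rough sketch, assuming the goal is accent-insensitive matching for search, java.text.Collator can be set to PRIMARY strength so that differences in accents and case are ignored. The class and method names below are hypothetical, and whether a particular letter such as Ð is treated as equal to a plain letter depends on the collation rules of the JDK and locale in use, so that part should be tested rather than assumed.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class PrimaryStrengthMatch {

    // Returns true when the two strings differ at most by accents or case.
    static boolean matches(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength ignores secondary (accent) and tertiary (case) differences.
        collator.setStrength(Collator.PRIMARY);
        // Treat precomposed and decomposed forms of a character alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "R\u00E9sum\u00E9" is "Résumé", written with escapes to avoid source-encoding issues.
        System.out.println(matches("resume", "R\u00E9sum\u00E9"));
        System.out.println(matches("resume", "result"));
    }
}
```

This compares strings without producing a converted string, so it fits a search that can call a comparison function but not one that needs a normalized key to store or index.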

If you still want to use StringUtils.stripAccents, you could handle characters such as Ð separately.
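
One way to handle those characters separately is sketched below, assuming Java 9 or later (for Map.of). It strips combining marks using the same Normalizer approach shown above, then replaces the leftover letters from a small lookup table. The class name, the method name, and the replacement table are all hypothetical; in particular, the ASCII targets (for example Ð to D) are an assumption and should be adjusted to whatever the search is expected to match.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class AccentFolder {

    // Letters with no canonical decomposition, so NFD leaves them untouched.
    // The replacement values are assumptions chosen for illustration.
    private static final Map<Character, String> SPECIAL = Map.of(
        '\u00D0', "D",  '\u00F0', "d",   // Ð, ð
        '\u00DE', "Th", '\u00FE', "th",  // Þ, þ
        '\u00D8', "O",  '\u00F8', "o",   // Ø, ø
        '\u00C6', "AE", '\u00E6', "ae"); // Æ, æ

    static String fold(String input) {
        // Step 1: decompose, then drop the combining diacritical marks.
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: replace the remaining special letters from the table.
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char ch = stripped.charAt(i);
            String replacement = SPECIAL.get(ch);
            out.append(replacement != null ? replacement : String.valueOf(ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four characters from the question: À, Â, Ð, Õ.
        System.out.println(fold("\u00C0\u00C2\u00D0\u00D5")); // prints AADO
    }
}
```

The table only covers the letters listed in it, so any other character without a decomposition passes through unchanged. If the search is meant to be case-insensitive, the result can additionally be lowercased with toLowerCase(Locale.ROOT).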

Alternatively, try https://github.com/xuender/unidecode, which seems to work for your case.

String normalized = Unidecode.decode(input);
