July 25, 2020, 00:07:10

Sorting a list of strings by ignoring (not replacing) non-alphanumeric characters, or by looking at the first alphanumeric character

Question

Basically, I need to sort a list of Strings based on a very specific criterion; however, it's not so specific that I believe it needs its own comparator.

Collections.sort gets me about 95% of the way there, since most of what I need is its natural sorting; however, for strings like:

"-&4" and "%B", it will prioritize "%B" over "-&4".

What I'd like is for it to sort on the first alphanumeric character, so it would be comparing:

"4" and "B", putting:

"-&4" first then "%B".

Doing a replaceAll on special characters can't really work because I have to retain the integrity of the string. I went down a rabbit hole of replacing all, sorting to generate a sort position, and then trying to re-sort the non-replaced list, to no avail (it also seems like overkill).

I've spent the past 4 hours googling this and I'm surprised it's such a novel situation. Most solutions come with a replaceAll on non-alphanumeric characters, but I'd need to retain the integrity of the original string.

Apologies if this is confusing verbiage as well.

Answer 1

Score: 1

> it's not so specific that I believe it needs its own comparator

If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since there is no built-in comparator doing exactly what you want, you do need to supply a custom comparator.
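
To see why natural order gives the result described in the question, here is a minimal sketch. The class and method names (`NaturalOrderDemo`, `naturalSort`) are made up for illustration. It relies only on the fact that `String` compares character codes, and `'%'` (U+0025) comes before `'-'` (U+002D).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NaturalOrderDemo {
    // Returns a sorted copy using String's natural (character-code) ordering.
    static List<String> naturalSort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '%' is U+0025 and '-' is U+002D, so "%B" sorts before "-&4".
        System.out.println(naturalSort(Arrays.asList("-&4", "%B"))); // prints [%B, -&4]
    }
}
```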

The code below creates a custom comparator using a helper method and a lambda expression or a method reference. Just because you don't create your own class implementing Comparator doesn't mean you're not creating your own comparator.

To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this:

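    List<String> list = ...
    // Matches runs of characters that are neither letters nor numbers.
    Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
    // Sort by the stripped key; the strings in the list are not modified.
    list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));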
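
For a version that can be run as-is, the same idea might be wrapped up as follows. This is only a sketch: the class name `AlnumSortDemo`, the helper `key`, and the sample strings are assumptions added for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class AlnumSortDemo {
    // Same pattern as above: one or more characters that are not letters or numbers.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Sort key only: the original string is never changed.
    static String key(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    // Returns a sorted copy, ordered by the stripped key.
    static List<String> sortByAlnum(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(AlnumSortDemo::key));
        return copy;
    }

    public static void main(String[] args) {
        // Keys are "B", "4", "a1", "A"; digits sort before uppercase, uppercase before lowercase.
        System.out.println(sortByAlnum(Arrays.asList("%B", "-&4", "#a1", "!!A")));
        // prints [-&4, !!A, %B, #a1]
    }
}
```

The keys are still compared in natural order, so uppercase letters sort before lowercase ones. If that is not wanted, passing `String.CASE_INSENSITIVE_ORDER` as the second argument to `Comparator.comparing` is one possible adjustment.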

If the list is large, you'd likely want to improve performance by caching the normalized string that the sort is using.

    List<String> list = ...
    Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
    // Compute each normalized key once; the merge function keeps the first key for duplicate strings.
    Map<String, String> normalized = list.stream()
            .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));
    // Look up the cached key instead of re-running the regex on every comparison.
    list.sort(Comparator.comparing(normalized::get));
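
One thing the cached version leaves open is what happens when two different strings normalize to the same key, for example "ab" and "a-b". `List.sort` is stable, so such strings keep their input order. The sketch below adds a natural-order tie-breaker as one possible way to make the result independent of input order. The tie-breaker and the names `CachedAlnumSort` and `sortCached` are assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CachedAlnumSort {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    static List<String> sortCached(List<String> input) {
        // Compute each key once, even if a string occurs several times.
        Map<String, String> keys = new HashMap<>();
        for (String s : input) {
            keys.computeIfAbsent(s, x -> NON_ALNUM.matcher(x).replaceAll(""));
        }
        List<String> copy = new ArrayList<>(input);
        Comparator<String> byKey = Comparator.comparing(keys::get);
        // Assumed tie-breaker: fall back to the full string when keys are equal.
        copy.sort(byKey.thenComparing(Comparator.naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        // "ab" and "a-b" share the key "ab"; the tie-breaker puts "a-b" first.
        System.out.println(sortCached(Arrays.asList("ab", "a-b", "%B", "-&4")));
        // prints [-&4, %B, a-b, ab]
    }
}
```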

Regex explained

\p{L} matches any character in the Unicode category "Letter".
\p{N} matches any character in the Unicode category "Number".
[^\p{L}\p{N}] matches any character that is neither a "Letter" nor a "Number".
"[^\\p{L}\\p{N}]+" is the Java string literal (with the backslashes escaped) for a pattern matching one or more of those characters.
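
A quick way to check what the pattern removes is to apply it to a few sample strings. The following is a small sketch. The class name `RegexStripDemo`, the helper `strip`, and the inputs are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexStripDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Removes every run of characters that are neither letters nor numbers.
    static String strip(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("-&4"));     // prints 4
        System.out.println(strip("%B"));      // prints B
        System.out.println(strip("a b-c_d")); // prints abcd (space, '-' and '_' are removed)
        // Non-ASCII letters are kept as well: the precomposed e-acute is in category "Letter".
        System.out.println(strip("caf\u00e9 #2").equals("caf\u00e92")); // prints true
    }
}
```

Because the classes are Unicode categories rather than ASCII ranges, accented letters and non-Latin digits count as alphanumeric here. A pattern such as `[^A-Za-z0-9]+` would be the ASCII-only alternative.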
