Sorting a list of strings by ignoring (not replacing) non-alphanumeric characters, or by looking at the first alphanumeric character
Question
Basically, I need to sort a list of Strings based on a very specific criterion; however, it's not so specific that I believe it needs its own comparator.
Collections.sort gets me about 95% of the way there, since most of what I need is its natural ordering. However, for strings like:
"-&4" and "%B", it puts "%B" before "-&4".
What I'd like is for it to sort on the first alphanumeric character, so it would be comparing:
"4" and "B", putting:
"-&4" first then "%B".
Doing a replaceAll on special characters can't really work because I have to retain the integrity of the string. I went down a rabbit hole of replacing all, sorting to generate a sort position, then trying to re-sort the non-replaced list, to no avail (it also seems like overkill).
I've spent the past 4 hours googling this and am surprised it's such a novel situation. Most solutions rely on a replaceAll on non-alphanumeric characters, but I need to retain the integrity of the original string.
Apologies if the wording is confusing.
Answer 1
Score: 1
> it's not so specific that I believe it needs its own comparator
If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since there is no built-in comparator doing exactly what you want, you do need to supply a custom comparator.
The code below creates a custom comparator using a helper method and a lambda expression or a method reference. Just because you don't create your own class implementing Comparator doesn't mean you're not creating your own comparator.
To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this. The replaceAll call only builds the sort key; the strings in the list are left unchanged:
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));
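For anyone who wants to try this on the strings from the question, here is a minimal runnable sketch of the same idea. The class name AlnumSortDemo, the helper sortIgnoringSpecials, and the sample data are made up for illustration and are not part of the snippet above; it also sorts a copy rather than the list itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AlnumSortDemo {
    // Matches runs of characters that are neither Unicode letters nor numbers.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Returns a sorted copy. The strings themselves are untouched;
    // only the key used for comparison has the special characters removed.
    static List<String> sortIgnoringSpecials(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(s -> NON_ALNUM.matcher(s).replaceAll("")));
        return copy;
    }

    public static void main(String[] args) {
        List<String> sorted = sortIgnoringSpecials(List.of("%B", "-&4", "#a1", "!!9z"));
        System.out.println(sorted); // prints [-&4, !!9z, %B, #a1]
    }
}
```

Note that the keys are compared with String's natural order, so uppercase letters sort before lowercase ones ("B" before "a1" here). If that is not wanted, the two-argument Comparator.comparing(keyExtractor, String.CASE_INSENSITIVE_ORDER) should work as a drop-in variation.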
If the list is large, you'd likely want to improve performance by caching the normalized string that the sort is using.
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
Map<String, String> normalized = list.stream()
        .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));
list.sort(Comparator.comparing(normalized::get));
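The cached variant can be exercised the same way. The following is a sketch under the same assumptions (CachedKeySortDemo and sortWithCachedKeys are hypothetical names, and the input is invented); it includes a duplicate string to show why the merge function is needed, and two strings with the same key to show that ties keep their input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class CachedKeySortDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    static List<String> sortWithCachedKeys(List<String> input) {
        // Compute the key once per distinct string. Without the merge
        // function, toMap would throw on a repeated string such as "b!".
        Map<String, String> keys = input.stream()
                .collect(Collectors.toMap(s -> s,
                        s -> NON_ALNUM.matcher(s).replaceAll(""),
                        (a, b) -> a));
        List<String> copy = new ArrayList<>(input);
        // The comparator now only does map lookups, no regex work.
        copy.sort(Comparator.comparing(keys::get));
        return copy;
    }

    public static void main(String[] args) {
        List<String> sorted = sortWithCachedKeys(List.of("b!", "-a", "a-", "b!"));
        System.out.println(sorted); // prints [-a, a-, b!, b!]
    }
}
```

Because the list sort is stable, "-a" and "a-" (both with key "a") stay in the order they had in the input.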
Regex explained
\p{L} matches all characters in Unicode category "Letter".
\p{N} matches all characters in Unicode category "Number".
[^\p{L}\p{N}] matches all characters that are not "Letter" or "Number".
"[^\\p{L}\\p{N}]+" is the Java string literal for a pattern matching one or more of those characters.
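To see what the pattern actually removes, a small sketch like the one below prints the sort key next to each original string. RegexKeyDemo and sortKey are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class RegexKeyDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Strips every character that is not a Unicode letter or number.
    static String sortKey(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-&4", "%B", "a b_c"}) {
            System.out.println(s + " -> " + sortKey(s));
        }
        // prints:
        // -&4 -> 4
        // %B -> B
        // a b_c -> abc
    }
}
```

Unlike \W, which treats the underscore as a word character, this pattern also removes underscores, since "_" is neither a letter nor a number.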