How to get the `sort` shell command to compare raw bytes?
Question
It seems the POSIX `sort` command-line utility does some fancy locale-based shenanigans when comparing strings. I scanned the man page but could not find a way to make it compare raw byte values instead.
Is there a way to get `sort` (I have the GNU coreutils version) to behave like `qsort(array_of_my_strings, N, strcmp)` would in C? Solutions using a tool other than `sort` would be fine too.
For demonstration, I currently get:
printf "\xC3\xBC\n\x76\n" | sort
ü
v
because the German umlaut ü seems to be compared as u, which comes before v, despite \xC3 being larger than \x76.
What I want is:
printf "\xC3\xBC\n\x76\n" | sort --raw-bytes-please
v
ü
Answer 1
Score: 6
Collation order and (multi-byte) character type are influenced by your locale. The locale name that disables multibyte and locale-aware behavior is `C`.
Thus:
LC_COLLATE=C LC_CTYPE=C sort
...will set only the character type and the collation order (assuming `LC_ALL` isn't set; if it is, it takes precedence and these two variables are ignored).
As a big hammer, you can also use:
LC_ALL=C sort
albeit with side effects, such as error messages being printed as the untranslated strings originally written by `sort`'s developers, with no translation tables in effect.
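To make this concrete, here is a minimal sketch that applies the `LC_ALL=C` approach to the example from the question. The function name `bytesort` is a hypothetical helper introduced only for illustration, and the input uses octal escapes rather than `\x` on the assumption that not every `printf` implementation supports hexadecimal escapes.

```shell
# Hypothetical helper: sort stdin by raw byte values.
# LC_ALL=C applies to this one sort invocation only and overrides
# LC_COLLATE, LC_CTYPE, and LANG for it.
bytesort() {
    LC_ALL=C sort "$@"
}

# \303\274 is the UTF-8 encoding of "ü" (bytes 0xC3 0xBC); \166 is "v" (0x76).
result=$(printf '\303\274\n\166\n' | bytesort)

# Prints "v" first and "ü" second, because 0x76 < 0xC3.
printf '%s\n' "$result"
```

Because the variable is set as a prefix to the command, the locale of the surrounding shell session is left untouched.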