Why is `tr` replacing one character with two?
Question
I am using tr (GNU coreutils v8.32) to transliterate non-basic-Latin characters into basic Latin, but it replaces them either with characters I did not ask for or with more than one copy of the desired character.
Example:
% echo é | tr é e
> ee
What's going on?
Answer 1
Score: 2
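é is two bytes in UTF-8 (0xC3 0xA9), and tr works on bytes rather than characters, which is most likely why it produces two e characters: each of the two bytes is mapped to e.
You can achieve the expected effect with:
echo 'é' | iconv -t ASCII//TRANSLIT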
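The byte-level explanation can be checked directly. This is a minimal sketch, assuming upstream GNU tr and a POSIX shell. The octal escapes \303\251 spell out the two UTF-8 bytes of é so the result does not depend on how the script itself is encoded, LC_ALL=C is added only to make the byte-wise handling explicit, and the variable names are illustrative.

```shell
# é encoded as UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
e_acute=$(printf '\303\251')

# wc -c counts bytes, not characters; strip any padding wc may add.
byte_count=$(printf '%s' "$e_acute" | wc -c | tr -d ' ')
echo "bytes in é: $byte_count"

# tr maps byte by byte. SET1 holds two bytes and SET2 only one, so GNU tr
# pads SET2 by repeating its last character: both bytes are mapped to "e".
tr_result=$(printf '%s\n' "$e_acute" | LC_ALL=C tr "$e_acute" 'e')
echo "tr result: $tr_result"
```

With GNU tr this should report 2 bytes and the same ee seen in the question.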
Answer 2
Score: 0
I think the issue is that tr transliterates single bytes, but if you look at your é you will see it is two bytes, plus a linefeed:
echo é | xxd
00000000: c3a9 0a ...
I think you need to look at sed, which works on patterns, however long they may be:
echo éàéà | sed -e 's/é/elephant/g' -e 's/à/antelope/g'
elephantantelopeelephantantelope
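Applied to the original goal of mapping accented letters to basic Latin, the same sed approach might look like the sketch below. The helper name to_basic_latin and the three letters it covers are only illustrative assumptions; a real mapping would need one s command per character you care about.

```shell
# Hypothetical helper: map a few accented letters to basic Latin with sed.
# Each s command matches the whole multi-byte sequence of one character,
# so one é becomes exactly one e.
to_basic_latin() {
    sed -e 's/é/e/g' -e 's/à/a/g' -e 's/ü/u/g'
}

sed_result=$(echo 'éàéà' | to_basic_latin)
echo "$sed_result"
```

Because sed substitutes whole patterns instead of individual bytes, this should print eaea for the sample input.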