Why is `tr` replacing one character with two?

Question

I am using tr (GNU coreutils v8.32) to transliterate non-basic-Latin characters into basic Latin, but it replaces them either with characters I didn't ask for or with more than one copy of the desired character.

Example:

% echo é | tr é e
> ee

What's going on?

Answer 1

Score: 2

é is two bytes long in UTF-8, and GNU tr works on bytes rather than characters, which is most likely why it produces two e's.

You can achieve the expected effect with:

echo 'é' | iconv -t ASCII//TRANSLIT
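
To make the byte-level explanation concrete, here is a minimal sketch that spells out the bytes of é with octal escapes. It assumes GNU tr and UTF-8 encoding (é is 0xC3 0xA9, octal 303 251); the function names show_bytes and tr_bytes are only illustrative.

```shell
#!/bin/sh
# Assumption: é is UTF-8 encoded, i.e. the two bytes 0xC3 0xA9 (octal 303 251).

# Dump the raw bytes of "é" followed by a newline: c3 a9 0a.
show_bytes() {
    printf '\303\251\n' | od -An -tx1
}

# Same as `echo é | tr é e`, but with the bytes of é written out.
# SET1 holds two bytes and SET2 holds one, so GNU tr pads SET2 with its
# last character and maps both bytes to "e".
tr_bytes() {
    printf '\303\251\n' | tr '\303\251' 'e'
}

show_bytes
tr_bytes
```

With GNU tr the second command prints ee, matching the output in the question: each of the two bytes is translated separately.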

Answer 2

Score: 0

I think the issue is that tr is oriented towards transliterating single bytes, but if you look at your é, you will see that it is two bytes, plus a linefeed:

echo é | xxd
00000000: c3a9 0a                                  ...

I think you need to look at sed, which is oriented towards patterns, however long they may be:

echo éàéà | sed -e 's/é/elephant/g' -e 's/à/antelope/g'
elephantantelopeelephantantelope
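
Applied to the original goal, the same sed approach could look like the sketch below, with one substitution per accented character. It assumes the script and its input are UTF-8 encoded; strip_accents is a hypothetical helper name, and the mapping covers only the characters listed.

```shell
#!/bin/sh
# Assumption: this script and its input are both UTF-8 encoded.
# strip_accents is a hypothetical helper; extend the list of -e
# expressions for any other characters you need to map.
strip_accents() {
    sed -e 's/é/e/g' -e 's/è/e/g' -e 's/à/a/g' -e 's/ç/c/g'
}

echo 'éàéà' | strip_accents
```

This prints eaea. Because each s command matches the whole byte sequence of a character as a pattern, it does not split é into two separate replacements the way tr does.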

huangapple
  • Posted on May 21, 2023 at 13:38:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76298443.html