Having a problem converting text in Bash with iconv

Question

I get this error:

iconv: illegal input sequence at position 0

I have a string that was meant to be Hebrew but was typed with the English keyboard layout active. I want the script to return the same keystrokes as they would appear in the Hebrew keyboard layout.

I'm new to Bash, so this may be a silly problem, but I really can't find the answer.

#!/bin/bash

read -p "Give me a word: " word

echo "$word" | iconv -t cp1255 | tr $(echo "[/'\קראטוןםפ\]\[שדגכעיחלךף,\זסבהנמצתץ./'\קראטוןםפ}{שדגכעיחלך:\"|זסבהנמצ><?@#$^&~\]" | iconv -t cp1255) "[qwertyuiop\[\]asdfghjkl;'\\zxcvbnm,./QWERTYUIOP{}ASDFGHJKL:\"|ZXCVBNM<>?@#$^&~\`]"

echo "$word" | tr "[qwertyuiop\[\]asdfghjkl;'\\zxcvbnm,./QWERTYUIOP{}ASDFGHJKL:\"|ZXCVBNM<>?@#$^&~\`]" $(echo "[/'\קראטוןםפ\]\[שדגכעיחלךף,\זסבהנמצתץ./'\קראטוןםפ}{שדגכעיחלך:\"|זסבהנמצ><?@#$^&~\]"| iconv -t cp1255) | iconv -t cp1255

Answer 1

Score: 0

As I understand your question, you need a simple character map, where every character from one set is replaced by the corresponding character from a second set. GNU tr (sadly) operates on single bytes and does not support multi-byte characters, so it cannot translate UTF-8 Hebrew letters directly.
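A quick way to see the multi-byte issue, as a minimal sketch that assumes the text is stored as UTF-8 (the variable names `letter` and `byte_count` are only illustrative): a single Hebrew letter occupies more than one byte, so a byte-oriented tool sees it as several separate "characters".

```shell
#!/bin/bash
# Assumption: this script is saved as UTF-8.
letter='ק'

# wc -c counts bytes, not characters.
byte_count=$(printf '%s' "$letter" | wc -c)

# Arithmetic expansion strips any padding that some wc versions emit.
echo "bytes in one Hebrew letter: $((byte_count))"
```

Because the Hebrew set is then longer in bytes than the Latin set, a bytewise translation cannot line the two sets up one to one.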

GNU sed supports multi-byte characters when run in a UTF-8 locale. With its y command, you can transliterate one set of characters into another:

$ echo 'קראטו' | LC_ALL=C.UTF-8 sed 'y/קראטו/qwert/'
qwert
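
The same idea can be stretched into a small layout converter. This is only a sketch under a few assumptions: GNU sed is installed, the C.UTF-8 locale exists on the system, and the script is saved as UTF-8. The function names `to_hebrew_layout` and `to_english_layout` and the variables `en` and `he` are made up for illustration, and the map covers only the lowercase letter keys plus `;`, `,` and `.` rather than the full key set from the question.

```shell
#!/bin/bash
# Keys on the English (QWERTY) layout and the Hebrew letters on the same keys.
# Both strings must contain the same number of characters, in the same order.
en="ertyuiopasdfghjkl;zxcvbnm,."
he="קראטוןםפשדגכעיחלךףזסבהנמצתץ"

# Text typed with the English layout -> what the Hebrew layout would produce.
to_hebrew_layout() {
    printf '%s\n' "$1" | LC_ALL=C.UTF-8 sed "y/$en/$he/"
}

# The reverse direction: Hebrew letters -> the English keys that produce them.
to_english_layout() {
    printf '%s\n' "$1" | LC_ALL=C.UTF-8 sed "y/$he/$en/"
}

# Example: the keys a, k, u, o sit on the letters of the word shalom.
to_hebrew_layout "akuo"    # prints שלום
```

To use it interactively, replace the last line with something like `read -p "Give me a word: " word` followed by `to_hebrew_layout "$word"`. No iconv step is needed in this variant, because the text stays in UTF-8 from start to finish.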

huangapple
  • Posted on March 3, 2023, 22:53:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75628586.html