Perl problem substituting a UTF-8 string on Windows
Question
I am trying to substitute substrings in a text file with Perl from the command line on Windows 10:
C:\Windows\System32\chcp 65001 & type test.txt | c:\Strawberry\perl\bin\perl -CSD -pe "use open ':std', ':encoding(UTF-8)'; binmode(STDOUT, ':utf8'); binmode(STDIN, ':encoding(utf8)'); s/__compare_loan__/So sánh sản phẩm cho vay/g"
File test.txt (saved as UTF-8):
Our benefit: __compare_loan__
Output:
> Active code page: 65001
> Our benefit: So sánh s?n ph?m cho vay
If I add `use utf8;` at the beginning of the Perl script, I get:
> Active code page: 65001
> Malformed UTF-8 character: \xe1\x6e\x68 (unexpected non-continuation byte 0x6e, immediately after start byte 0xe1; need 3 bytes, got 1) at -e line 1.
> Malformed UTF-8 character (fatal) at -e line 1.
How can I get rid of the question marks in the output?
Answer 1
Score: 4
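The error you get when you add `use utf8;` to your one-liner suggests that the arguments to perl are being passed in CP-1252 or a similar code page, not in UTF-8 (in CP-1252, 0xE1 is á, 0x6E is n and 0x68 is h). That would also explain the question marks: ả and ẩ have no CP-1252 equivalent, so they apparently reach perl as literal ? characters, while á survives because CP-1252 does contain it.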
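The byte analysis can be checked with a short sketch using the core Encode module (the variable names are illustrative). It decodes the three bytes from the error message as CP-1252, then encodes the two letters that became question marks into CP-1252. It assumes `cp1252` is the ANSI code page in play, which is typical for Western-European Windows locales but not guaranteed on every machine:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three bytes reported in the "Malformed UTF-8 character" error.
my $bytes = "\xe1\x6e\x68";

# Interpreted as CP-1252 they are U+00E1 (a with acute), "n" and "h".
my $chars = decode('cp1252', $bytes);
printf "CP-1252 decode: %s\n",
    join ' ', map { sprintf('U+%04X', ord($_)) } split //, $chars;

# The two letters that came out as "?" have no CP-1252 code point; with the
# default CHECK, encode() replaces them with the encoding's substitution character.
for my $char ("\x{1ea3}", "\x{1ea9}") {
    printf "U+%04X -> %s\n", ord($char), encode('cp1252', $char);
}
```

If the loop prints `?` for both letters, that matches the `s?n ph?m` in the question's output.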
One portable fix is to use character escapes instead of including the non-ASCII characters directly:
C:\Code\SO> chcp 65001 & type test.txt | perl -CSD -pe "s/__compare_loan__/So s\x{e1}nh s\x{1ea3}n ph\x{1ea9}m cho vay/g"
Active code page: 65001
Our benefit: So sánh sản phẩm cho vay
(Tested with Strawberry Perl 5.32.1 and the standard Windows 10 Command Prompt application)
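Typing the `\x{...}` escapes by hand is error-prone. The helper below is a hypothetical sketch (`to_perl_escapes` is not part of the original answer or of any module) that turns an already-decoded Perl string into that ASCII-only form:

```perl
use strict;
use warnings;

# Hypothetical helper: keep ASCII characters as they are and rewrite everything
# else as a \x{...} escape, in the same lowercase-hex form used in the one-liner.
sub to_perl_escapes {
    my ($text) = @_;    # expects a decoded character string, not raw bytes
    return join '',
        map { ord($_) < 128 ? $_ : sprintf('\x{%x}', ord($_)) }
        split //, $text;
}

# Demo input built from code points so this file itself stays ASCII-only.
my $vietnamese = "So s\x{e1}nh s\x{1ea3}n ph\x{1ea9}m cho vay";
print to_perl_escapes($vietnamese), "\n";
# prints: So s\x{e1}nh s\x{1ea3}n ph\x{1ea9}m cho vay
```

In practice you would read the text from a file saved as UTF-8 (for example with `open my $fh, '<:encoding(UTF-8)', $path`) and paste the printed result into the one-liner.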
Note that using `-CSD` means you don't need all that `use open` and `binmode` stuff; it's all implied by the arguments to `-C`.
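If the one-liner later moves into a script file, `use open qw(:std :utf8);` is roughly the in-script counterpart of `-CSD` (one difference: `use open` is lexically scoped, while `-C` is global). The sketch below assumes a hypothetical file name such as replace.pl, run as `type test.txt | perl replace.pl`; `replace_placeholder` is an illustrative name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Roughly what -CSD enables: a :utf8 layer on STDIN, STDOUT and STDERR (:std),
# plus :utf8 as the default layer for files opened in this lexical scope.
use open qw(:std :utf8);

# Escapes keep this source file pure ASCII, so no "use utf8;" is needed.
my $replacement = "So s\x{e1}nh s\x{1ea3}n ph\x{1ea9}m cho vay";

# Hypothetical helper; performs the same s///g as the one-liner.
sub replace_placeholder {
    my ($line) = @_;
    $line =~ s/__compare_loan__/$replacement/g;
    return $line;
}

# Filter STDIN to STDOUT line by line, like perl -pe.
print replace_placeholder($_) while <STDIN>;
```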