Non-ASCII characters in PowerShell scripts

Question

I need to perform some file operations in a folder structure that contains special non-ASCII characters.
PowerShell crashes when I try to use any of those characters.

For example, this doesn't work:

$TestPath = "C:\Examples\Folder_1ĀČ\"
$ExampleFileName = "Test.txt"

Copy-Item ($PSScriptRoot + "\" + $ExampleFileName)  -Destination ($TestPath) -Force

But this works:

$TestPath = "C:\Examples\Folder_1AC\"
$ExampleFileName = "Test.txt"

Copy-Item ($PSScriptRoot + "\" + $ExampleFileName)  -Destination ($TestPath) -Force

I tried debugging with

Write-Output $TestPath

The result returned in the console was:

C:\Examples\Folder_1Ä€Ä\

Is it possible to use PowerShell with paths containing these characters?
How can I do that?

Answer 1

Score: 1

It looks like you have a code page problem in PowerShell.
Check which code page is active and switch to UTF-8.

Have a look at this:
Stack Overflow: Changing PowerShell's default output encoding to UTF-8


Update

As you wrote:

> I found out that -encoding default works for my case

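Default is your system's ANSI code page.
The simplest way to check which code page is active is to run:

chcp

=> What is your code page? (Note that chcp reports the console's active code page, which can differ from the system's ANSI code page.)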


I suppose there is a small typo in your question:

You wrote:        C:\Examples\Folder_1Ä€Ä\
I would expect:   C:\Examples\Folder_1Ä€ÄŒ\

This is the typical character translation problem that appears when the UTF-8 bytes of a character are interpreted using a local ANSI code page: Ā => Ä€ and Č => ÄŒ
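
The mapping can be reproduced directly. The following is a minimal sketch, assuming Windows-1252 as the local ANSI code page (an assumption; yours may differ). The characters are built from their code points so that the demonstration does not depend on how the script file itself is saved.

```powershell
# Build "ĀČ" from code points (U+0100, U+010C) so the file's own encoding cannot interfere.
$original = -join ([char]0x0100, [char]0x010C)

# Encode the string as UTF-8: each of the two characters becomes two bytes.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Assumption: the local ANSI code page is Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# Misinterpret the UTF-8 bytes as ANSI text; this is what garbles the path.
$garbled = $ansi.GetString($utf8Bytes)

# Show the UTF-8 bytes in hex, followed by the garbled string.
($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$garbled
```

The UTF-8 bytes are C4 80 C4 8C, and under Windows-1252 they decode to the four characters Ä€ÄŒ. How the console renders them still depends on the console font and its own code page.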

Please note that the following questions are relevant:

  • Which code page does your environment use?
  • Which character encoding (code page) is the script's text saved in?
  • Which encoding does PowerShell use to read the script's text and to execute it?
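
These questions can be partly answered from within PowerShell. The sketch below is only an illustration: `Test-Utf8Bom` is a hypothetical helper name, and `mycmdlet.ps1` in the usage line stands for whatever script you want to inspect.

```powershell
# Hypothetical helper: reports whether a file starts with the UTF-8 byte order mark.
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)

    # Resolve to a full path, because .NET methods do not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Environment: the code page the console writes with, and the encoding used when piping to external programs.
"Console output code page : $([Console]::OutputEncoding.CodePage)"
"Pipeline output encoding : $($OutputEncoding.EncodingName)"
"PowerShell version       : $($PSVersionTable.PSVersion)"

# Script file (uncomment and adjust the path):
# Test-Utf8Bom -Path .\mycmdlet.ps1
```

Whether the file starts with a BOM matters because PowerShell uses it to pick the encoding when it reads a script.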

Given the information in your update, the reasoning is as follows:

  1. You can run the script successfully with -encoding Default
    => Your script was saved using your local code page.

  2. Omitting -encoding Default results in "extended" characters
    => PowerShell assumes UTF-8 as the encoding, and

    • converts the bytes read from the file to correct UTF-8 (turning the characters ĀČ into properly UTF-8-encoded characters),
    • but finally the binary representation of the converted characters is interpreted using the local ANSI code page.
      The result is Ä€ÄŒ, because each of the characters ĀČ is encoded as 2 bytes in UTF-8.

Consequently, you should make sure that all your environments (including your GUI editor) and PowerShell's default are set to the same code page.

Regarding this:

> PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.

(quoted from the link above)

I would suggest migrating everything to UTF-8, so that -encoding Default becomes the same as -encoding UTF8.
But be sure to briefly test your stored file and directory names and file contents, as they are currently all written using your local ANSI code page.
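
One way to migrate an individual script file could look like the sketch below. It is not a definitive procedure: `ConvertTo-Utf8WithBom` is a hypothetical name, Windows-1252 is assumed as the previous ANSI code page, and writing a BOM is a choice made here (so that the file is recognizable as UTF-8), not something the advice above prescribes. The function overwrites the file in place, so work on a copy first.

```powershell
# Hypothetical helper: re-saves a text file as UTF-8 with a BOM.
function ConvertTo-Utf8WithBom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        # Assumption: files that are not valid UTF-8 were saved in Windows-1252.
        [int] $AnsiCodePage = 1252
    )

    # Resolve to a full path, because .NET methods do not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Strict decoder: throws on byte sequences that are not valid UTF-8.
    $strictUtf8 = New-Object System.Text.UTF8Encoding($false, $true)
    try {
        # The file is already valid UTF-8 (pure ASCII also passes this check).
        $text = $strictUtf8.GetString($bytes)
    }
    catch {
        # Otherwise treat the bytes as text in the assumed ANSI code page.
        $text = [System.Text.Encoding]::GetEncoding($AnsiCodePage).GetString($bytes)
    }

    # Drop an existing BOM character so it is not written twice.
    $text = $text.TrimStart([char]0xFEFF)

    # Write the text back as UTF-8 with a BOM.
    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}

# Usage (path is illustrative):
# ConvertTo-Utf8WithBom -Path .\mycmdlet.ps1
```

The UTF-8 validity check is only a heuristic: it avoids double-encoding a file that is already UTF-8, but it cannot tell which ANSI code page a non-UTF-8 file was written in.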

In the meantime, you have to tell PowerShell, via -encoding Default, not to assume that your script is stored as UTF-8.


> How do I use this encoding for other functions like Copy-Item?

By using

mycmdlet.ps1 -encoding Default

you tell PowerShell to read everything using your current local ANSI code page, so everything handled by the commands will match it.
When something enters or leaves the script's processing (because it is read or written), the system's code page (local ANSI) is used, so that should work as well.

huangapple
  • Posted on July 20, 2023, 15:30:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76727597.html