How to handle Unicode in PowerShell

Question

I was trying to print information about a YouTube video using yt-dlp so that I could work with it in PowerShell.

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error

Output:

{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live', 
'title': '【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】'}


But when I convert it using ConvertFrom-Json, the Japanese characters and the 【】 brackets are gone.

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json

Output:

domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live  Minecraft  iofi / hololive

Since the output is fine when I don't use ConvertFrom-Json, I don't think it's a yt-dlp problem.
I also tried some alternative approaches, but none of them worked.

Then I found out that merely putting the previous command inside parentheses

(yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error)

already drops the Japanese characters and the 【】 brackets.

Output:

{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live',
'title': ' Minecraft  iofi / hololive '}

I have searched for many solutions to this, but nothing has solved the problem.

Things like chcp, $OutputEncoding, [System.Console]::OutputEncoding, [System.Console]::InputEncoding and [System.Text.Encoding]::UTF8 didn't solve my problem. Or maybe that's because I don't really understand these things well enough to apply them correctly.

I am almost certain this question is a duplicate, but I am not familiar with this area, so I don't know what to search for to solve this specific problem.

A few things I've tried:

$OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['*:Encoding'] = 'utf8'


$OutputEncoding = [Console]::OutputEncoding = [Console]::InputEncoding = (new-object System.Text.UTF8Encoding $false)


I also tried some other solutions, but I haven't included them here because they almost certainly won't fix this problem.

Answer 1

Score: 2

I found the solution.

On the yt-dlp side, I changed the title's format type to JSON (j) and removed the surrounding single quotes:

'title': %(title)j

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': %(title)j}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json

Output:

domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live 【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】
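Why the single quotes have to go, and why the characters come back: a minimal Python sketch (yt-dlp is itself written in Python). It assumes that %(title)j expands to the same text as Python's json.dumps(title), a complete JSON string literal with its own double quotes and \uXXXX escapes for non-ASCII characters; that is an assumption about yt-dlp internals, not something stated above. The template is simplified and uses double-quoted keys only so that Python's strict json module can parse it; json.loads stands in for ConvertFrom-Json.

```python
import json

# Title as yt-dlp printed it to the console in the question.
title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

# Assumption: %(title)j expands to the same text as json.dumps(title), i.e. a
# complete JSON string literal (quotes included) with non-ASCII characters
# written as \uXXXX escapes.
title_j = json.dumps(title)
print(title_j.isascii())  # True

# Simplified template: the j value is inserted WITHOUT surrounding quotes, as in
# the answer. Keys are double-quoted here only because Python's json module,
# unlike ConvertFrom-Json, rejects single-quoted keys.
line = '{"status": "was_live", "title": %s}' % title_j

# json.loads stands in for ConvertFrom-Json and turns the escapes back into text.
record = json.loads(line)
print(record["title"] == title)  # True
```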

But I think the question is still unanswered: why, if I pass %(title)s (a string with Japanese characters and characters like 【】) to anything in PowerShell, does it not handle it properly?

Answer 2

Score: 1

This isn't a complete answer either, but it sheds some more light on the situation:

It is yt-dlp.exe itself that is the problem, which is why configuring PowerShell to use UTF-8 doesn't help (calling from cmd.exe after running chcp 65001 there doesn't help either):

  • When printing output to the console, yt-dlp.exe does print non-ASCII-range Unicode characters such as 【 and the Japanese characters in the title.

  • When yt-dlp.exe's output is redirected (such as to ConvertFrom-Json via a pipeline), seemingly all such characters are simply removed.

    • The docs describe Unicode-related options such as +, but I couldn't get them to work, at least in version 2023.03.04.
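The symptom can be reproduced in Python under one hypothesis: when stdout is redirected, the text is encoded with a legacy Windows code page (cp1252 is assumed here; the real code page depends on the system) and characters that code page cannot represent are silently dropped rather than replaced. This is a sketch consistent with the output shown in the question, not a confirmed description of how yt-dlp writes its output; lossy_redirect is a hypothetical helper.

```python
# Title as yt-dlp prints it to the console.
title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

def lossy_redirect(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical model of redirected output: encode with a legacy code page,
    silently dropping every character it cannot represent."""
    return text.encode(codepage, errors="ignore").decode(codepage)

print(repr(lossy_redirect(title)))  # ' Minecraft  iofi / hololive '
```

The result matches the title the question got back when the output was piped or captured with (...), including the doubled space left where 】イオそら…【 used to be.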

Because your --print argument happens to be JSON, using format type j provides an effective solution: the resulting JSON representation of the title uses Unicode escape sequences such as \u3010 to represent non-ASCII-range characters, so the string consists of ASCII-range characters only, which bypasses the original problem.
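A minimal Python sketch of that bypass, under two assumptions made for illustration only: %(title)j is taken to be equivalent to json.dumps(title), and the redirected output is modeled by a hypothetical lossy_redirect helper that encodes with cp1252 and drops unencodable characters. json.loads stands in for ConvertFrom-Json.

```python
import json

title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

def lossy_redirect(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical model of redirected output: characters the legacy code page
    cannot represent are silently dropped."""
    return text.encode(codepage, errors="ignore").decode(codepage)

# Assumption: %(title)j is equivalent to json.dumps(title), which escapes every
# non-ASCII character as \uXXXX (e.g. 【 becomes \u3010).
escaped = json.dumps(title)

# The escaped text is pure ASCII, so the lossy channel has nothing to drop...
survived = lossy_redirect(escaped)
print(survived == escaped)  # True

# ...and the JSON parser (ConvertFrom-Json in PowerShell) restores the title.
print(json.loads(survived) == title)  # True
```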

huangapple
  • Posted on May 14, 2023 19:16:52
  • When reposting, please keep this article's link: https://go.coder-hub.com/76247192.html