How to handle Unicode in PowerShell
Question
I am trying to print information about a YouTube video using yt-dlp and then process it in PowerShell.
Command:
yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error
Output:
{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live',
'title': '【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】'}
But when I convert it using ConvertFrom-Json, the Japanese characters and the 【】 brackets are gone.
Command:
yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json
Output:
domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live Minecraft iofi / hololive
Because the output is fine when I don't use ConvertFrom-Json, I don't think it's a yt-dlp problem.
I also tried some alternative approaches, but none of them worked.
Then I found that wrapping the previous command in parentheses is enough to trigger the problem:
(yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error)
The Japanese characters and the 【】 brackets are already dropped.
Output:
{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live',
'title': ' Minecraft iofi / hololive '}
I have searched for many solutions to this, but nothing solved the problem.
Things like chcp, $OutputEncoding, [System.Console]::OutputEncoding, [System.Console]::InputEncoding, and [System.Text.Encoding]::UTF8 didn't solve my problem. Or maybe I just don't understand these things well enough to apply them correctly.
I'm almost certain this question is a duplicate, but I'm not familiar with this area, so I don't know what to search for to solve this specific problem.
A few things I've tried:
$OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
$OutputEncoding = [Console]::OutputEncoding = [Console]::InputEncoding = (new-object System.Text.UTF8Encoding $false)
I also tried some other solutions, but I haven't included them here because it's very unlikely that they would fix this problem.
Answer 1
Score: 2
I found the solution.
On the yt-dlp side, I changed the title's format type to JSON (j) and removed the surrounding single quotes:
'title': %(title)j
Command:
yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': %(title)j}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json
Output:
domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live 【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】
But I think the question is still unanswered: why is %(title)s (a string with Japanese characters and characters like 【】) not handled properly when I pass it to anything in PowerShell?
Answer 2
Score: 1
This isn't a complete answer either, but it sheds some more light on the situation:
It is yt-dlp.exe itself that is the problem, which is why configuring PowerShell to use UTF-8 doesn't help (calling from cmd.exe after running chcp 65001 there doesn't help either):
- When printing output to the console, yt-dlp.exe does print non-ASCII-range Unicode characters such as イ.
- When yt-dlp.exe's output is redirected (such as to ConvertFrom-Json via a pipeline), seemingly all such characters are simply removed.
  - The docs describe Unicode-related options such as +, but I couldn't get them to work, at least in version 2023.03.04.
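The "simply removed" symptom looks like what happens when text is encoded to a narrower character set and the characters that don't fit are ignored. The Python sketch below only reproduces that symptom. It is an assumption, not something confirmed from the yt-dlp source, that a similar step happens when the output is redirected. `drop_unencodable` is a hypothetical helper name used for illustration.

```python
def drop_unencodable(text: str, encoding: str = "ascii") -> str:
    """Encode text, silently discarding characters the codec cannot represent."""
    # errors="ignore" drops unencodable characters instead of raising an error.
    return text.encode(encoding, errors="ignore").decode(encoding)


title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"
stripped = drop_unencodable(title)

# Only the ASCII-range characters of the title are left.
print(repr(stripped))
```

The result keeps "Minecraft" and "iofi / hololive" but loses the brackets and the Japanese text, like the piped output in the question.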
Because your --print argument happens to be JSON, using format type j provides an effective solution: the resulting JSON representation of the title uses Unicode escape sequences such as \u3010 to represent non-ASCII-range Unicode characters, so the string is composed of ASCII-range characters only, which bypasses the original problem.
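To illustrate why the escaped form survives, here is a minimal Python sketch using the standard json module. It assumes that the j format type behaves like a JSON encoder that escapes non-ASCII characters; it is not taken from the yt-dlp source.

```python
import json

title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

# With ensure_ascii=True (the default), json.dumps writes every non-ASCII
# character as a \uXXXX escape, so the result contains ASCII characters only.
encoded = json.dumps(title)
print(encoded)

# A JSON parser turns the escapes back into the original characters.
decoded = json.loads(encoded)
print(decoded == title)
```

Because the encoded string is pure ASCII, a step that mishandles non-ASCII bytes has nothing to drop, and the JSON parser on the receiving side (ConvertFrom-Json in the question) restores the real characters.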