June 2, 2023 01:27:59

subprocess.run command with non-utf-8 characters (UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb)

Question

Encoding honestly continues to confuse me, so hopefully this isn't a totally daft question.

I have a Python script that calls metaflac to compare the FLAC fingerprints listed in a file against the fingerprints of the FLAC files themselves. Recently I came across files with » (https://bytetool.web.app/en/ascii/code/0xbb/) in the file name. This broke the way I was handling the file name strings, so I'm trying to work around that. My first thought was that I needed to handle the names as bytes objects. But when I do that and then call subprocess.run, I get a UnicodeDecodeError.

Here's the snippet of code that is giving me errors:

def test():
    directory = b'<redacted>'
    ffp_open = open(directory + b'<redacted>.ffp','rb')
    ffp_lines = ffp_open.readlines()
    print(ffp_lines)
    for line in ffp_lines:
        if not line.startswith(b';') and b':' in line:
            txt = line.split(b':')

            ffp_cmd = b'/usr/bin/metaflac --show-md5sum \'' + directory + b'/' + txt[0]+ b'\''
            print(ffp_cmd)
            get_ffp_process = subprocess.run(ffp_cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True,shell=True)

With that, I get the following output (shortened to make more sense):

[b'01 - Intro.flac:eee7ca01db887168ce8312e7a3bdf8d6\r\n', b'04 - Song title \xbb Other Song \xbb.flac:98d2d03f47790d234052c6c9a2ca5cfd\r\n']
b"/usr/bin/metaflac --show-md5sum '<redacted>/01 - Intro.flac'"
b"/usr/bin/metaflac --show-md5sum '<redacted>/04 - Song title \xbb Other Song \xbb.flac'"

    get_ffp_process = subprocess.run(ffp_cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True,shell=True)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 2021, in _communicate
    stderr = self._translate_newlines(stderr,
  File "<redacted>/python/lib/python3.9/subprocess.py", line 1011, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 85: invalid start byte

If I run this directly on the command line it works just fine (using tab completion to fill in the file name):

metaflac --show-md5sum 04\ -\ Song\ title\ »\ Other Song\ ».flac 
98d2d03f47790d234052c6c9a2ca5cfd

The FFP file, through nano, looks like this:

01 - Intro.flac:eee7ca01db887168ce8312e7a3bdf8d6
04 - Song title � Other Song �.flac:98d2d03f47790d234052c6c9a2ca5cfd

I have no control over the file names, so I'm trying to be as flexible as possible to handle them, which is why I thought a bytes object would be best. I'd appreciate any direction. Thanks!

Answer 1

Score: 1

I believe an encoding of "latin1" or "cp1252" will decode that successfully. Also, it is easier to deal with strings than with bytes, so here is my suggestion:

import pathlib
import subprocess

directory = pathlib.Path("/tmp")

with open(directory / "data.ffp", "r", encoding="latin1") as stream:
    for line in stream:
        # Skip comment lines and lines without a "name:md5" pair.
        if line.startswith(";"):
            continue
        if ":" not in line:
            continue

        # strip() removes the trailing "\r\n" before the line is split.
        name, expected_md5sum = line.strip().split(":")
        print(f"{name=}")
        print(f"{expected_md5sum=}")
        command = [
            "/usr/bin/metaflac",
            "--show-md5sum",
            str(directory / name)
        ]
        print(f"{command=}")

        # Now you can run the command. I assume that the command will return an MD5 sum.
        completed_process = subprocess.run(
            command,
            encoding="latin1",
            capture_output=True,
        )

        # Now, completed_process.stdout will hold the output
        # as a string, not bytes.

Here is a sample output:

name='01 - Intro.flac'
expected_md5sum='eee7ca01db887168ce8312e7a3bdf8d6'
command=['/usr/bin/metaflac', '--show-md5sum', '/tmp/01 - Intro.flac']
name='04 - Song title » Other Song ».flac'
expected_md5sum='98d2d03f47790d234052c6c9a2ca5cfd'
command=['/usr/bin/metaflac', '--show-md5sum', '/tmp/04 - Song title » Other Song ».flac']
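
The decoding claim can be checked on its own, without metaflac. The following is a minimal sketch using the file name bytes from the question; the variable names are only for illustration. It also shows one thing to watch for: text decoded as latin1 and then re-encoded as UTF-8 no longer has the same bytes as the original name, so if Python encodes command arguments as UTF-8 on your system (an assumption about the setup, not something the question confirms), the path handed to metaflac may not match the name on disk. The surrogateescape error handler is one way to keep the bytes recoverable.

```python
# Bytes of the file name as they appear in the .ffp file from the question.
raw_name = b"04 - Song title \xbb Other Song \xbb.flac"

# UTF-8 rejects a lone 0xbb byte; this is the same failure as in the traceback.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin1 maps every byte to exactly one code point, so decoding cannot fail,
# and encoding with latin1 again restores the original bytes.
as_latin1 = raw_name.decode("latin1")
round_trip_latin1 = as_latin1.encode("latin1")

# Re-encoding the decoded text as UTF-8 produces different bytes
# (each "»" becomes the two bytes 0xc2 0xbb).
as_utf8_bytes = as_latin1.encode("utf-8")

# surrogateescape smuggles undecodable bytes through a str and back.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
round_trip_escaped = escaped.encode("utf-8", errors="surrogateescape")

print(utf8_ok)
print(round_trip_latin1 == raw_name)
print(as_utf8_bytes == raw_name)
print(round_trip_escaped == raw_name)
```

If the last comparison matters for your case, it may be worth checking which bytes actually reach the command before relying on the string-based version.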

Since my system does not have the metaflac command, I cannot test it. Please forgive any errors that come up. If you find one, please post a comment and I will try to fix it.
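
For reference, the traceback in the question comes from subprocess decoding the captured stderr as UTF-8, not from building the command as bytes. Here is a rough sketch of capturing the output as raw bytes and decoding it explicitly afterwards. Since metaflac may not be installed, a child Python process stands in for it; `run_capture` and `fake_command` are hypothetical names, and the sketch assumes a POSIX system like the one in the question.

```python
import subprocess
import sys


def run_capture(command, output_encoding="latin1"):
    """Run command without a shell, capture raw bytes, then decode explicitly."""
    completed = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return (
        completed.returncode,
        completed.stdout.decode(output_encoding),
        completed.stderr.decode(output_encoding),
    )


# Stand-in for metaflac (hypothetical): a child Python process that writes
# a message containing the byte 0xbb to stderr.
fake_command = [
    sys.executable,
    "-c",
    r"import sys; sys.stderr.buffer.write(b'bad \xbb name\n')",
]

# Asking subprocess.run to decode the output as UTF-8 reproduces the error.
try:
    subprocess.run(
        fake_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
    )
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Capturing bytes and decoding afterwards avoids the exception.
returncode, out, err = run_capture(fake_command)
print(text_mode_failed, returncode, ascii(err))
```

On POSIX the same idea should carry over to the real command by passing a list of bytes arguments, for example `[b"/usr/bin/metaflac", b"--show-md5sum", directory + b"/" + txt[0]]`, which also removes the need for shell quoting. I have not run that against metaflac itself.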
