subprocess.run command with non-utf-8 characters (UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb)


Question

Encoding honestly continues to confuse me, so hopefully this isn't a totally daft question.

I have a Python script that calls metaflac to compare the FLAC fingerprints listed in an .ffp file with the actual fingerprints of the files. Recently I came across files with » (https://bytetool.web.app/en/ascii/code/0xbb/) in the file name, and my existing string-based handling of file names failed on them, so I'm trying to work around that. My first thought was that I needed to handle the names as bytes objects. But when I do that and then call subprocess.run, I get a UnicodeDecodeError.

Here's the snippet of code that is giving me errors:

import subprocess
from subprocess import PIPE

def test():
    directory = b'<redacted>'
    ffp_open = open(directory + b'<redacted>.ffp','rb')
    ffp_lines = ffp_open.readlines()
    print(ffp_lines)
    for line in ffp_lines:
        if not line.startswith(b';') and b':' in line:
            txt = line.split(b':')

            ffp_cmd = b'/usr/bin/metaflac --show-md5sum \'' + directory + b'/' + txt[0]+ b'\''
            print(ffp_cmd)
            get_ffp_process = subprocess.run(ffp_cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True,shell=True)

With that, I get the following output (shortened for readability):

[b'01 - Intro.flac:eee7ca01db887168ce8312e7a3bdf8d6\r\n', b'04 - Song title \xbb Other Song \xbb.flac:98d2d03f47790d234052c6c9a2ca5cfd\r\n']
b"/usr/bin/metaflac --show-md5sum '<redacted>/01 - Intro.flac'"
b"/usr/bin/metaflac --show-md5sum '<redacted>/04 - Song title \xbb Other Song \xbb.flac'"

    get_ffp_process = subprocess.run(ffp_cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True,shell=True)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "<redacted>/python/lib/python3.9/subprocess.py", line 2021, in _communicate
    stderr = self._translate_newlines(stderr,
  File "<redacted>/python/lib/python3.9/subprocess.py", line 1011, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 85: invalid start byte
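The traceback shows that the exception is raised inside `_translate_newlines`, i.e. while subprocess decodes the captured stderr, not while it launches the command: `universal_newlines=True` makes `run()` decode stdout and stderr as text, and a lone `0xbb` byte is not valid UTF-8. (Presumably metaflac echoed the file name, raw `0xbb` included, in an error message on stderr; that part is an inference, the traceback only shows that stderr contained the byte.) A minimal reproduction, assuming a POSIX system and using the Python interpreter itself as a stand-in for metaflac:

```python
import subprocess
import sys

# Stand-in child process (assumption: used instead of metaflac so the sketch
# runs anywhere). It writes the single raw byte 0xbb to its stderr.
CHILD = [sys.executable, "-c", "import sys; sys.stderr.buffer.write(b'\\xbb')"]


def run_as_text():
    # encoding="utf-8" switches on text mode, like universal_newlines=True on a
    # UTF-8 locale, so subprocess decodes stderr and raises UnicodeDecodeError.
    return subprocess.run(CHILD, capture_output=True, encoding="utf-8")


def run_as_bytes():
    # Without text mode, stdout/stderr are returned as raw bytes; nothing is decoded.
    return subprocess.run(CHILD, capture_output=True)


if __name__ == "__main__":
    try:
        run_as_text()
    except UnicodeDecodeError as exc:
        print("text mode failed:", exc)
    print("bytes mode stderr:", run_as_bytes().stderr)
```

The encoding is pinned to UTF-8 here only to make the reproduction independent of the machine's locale.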

If I run this directly on the command line, it works just fine (using tab completion to fill in the file name):

metaflac --show-md5sum 04\ -\ Song\ title\ »\ Other\ Song\ ».flac
98d2d03f47790d234052c6c9a2ca5cfd
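At the interactive prompt, tab completion escapes every space with a backslash before the shell splits the line. The script instead wraps the path in hand-written single quotes, which copes with » but would break on a file name containing an apostrophe. A small sketch, using only the standard `shlex` module, of the two usual ways around that: pass an argument list (no `shell=True`, so there is nothing to escape), or let `shlex.quote` do the quoting if a shell string is really needed. The `/music/` directory and the second file name are made-up examples.

```python
import shlex

names = [
    "04 - Song title » Other Song ».flac",
    "05 - Don't Stop.flac",  # hypothetical name containing an apostrophe
]

for name in names:
    path = "/music/" + name  # hypothetical directory
    # Option 1: an argv list. Each element reaches the program unchanged,
    # with no shell parsing involved.
    argv = ["/usr/bin/metaflac", "--show-md5sum", path]
    # Option 2: a shell string, quoted by shlex.quote (works on str, not bytes).
    shell_cmd = "/usr/bin/metaflac --show-md5sum " + shlex.quote(path)
    # Parsing the quoted string back yields exactly the same argument list.
    assert shlex.split(shell_cmd) == argv
    print(shell_cmd)
```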

Viewed in nano, the FFP file looks like this:

01 - Intro.flac:eee7ca01db887168ce8312e7a3bdf8d6
04 - Song title � Other Song �.flac:98d2d03f47790d234052c6c9a2ca5cfd
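nano shows � because, on its own, the byte 0xbb is not valid UTF-8: in UTF-8 it can only appear as a continuation byte, for example in the two-byte sequence C2 BB that encodes ». Decoding the line with `errors="replace"` reproduces the same replacement character. The raw line below is copied from the `print(ffp_lines)` output above.

```python
# Second line of the .ffp file, as printed by print(ffp_lines) above.
raw = b"04 - Song title \xbb Other Song \xbb.flac:98d2d03f47790d234052c6c9a2ca5cfd\r\n"

# errors="replace" substitutes U+FFFD (the � that nano shows) for each byte
# that is not valid UTF-8, instead of raising UnicodeDecodeError.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8.rstrip())

# The same character, properly UTF-8 encoded, takes two bytes instead of one.
print("»".encode("utf-8"))  # b'\xc2\xbb'
```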

I have no control over the file names, so I'm trying to be as flexible as possible to handle them, which is why I thought a bytes object would be best. I'd appreciate any direction. Thanks!
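Keeping everything as bytes can work, as long as nothing asks subprocess to decode. A minimal sketch, assuming a POSIX system: build the command as a list of bytes (no `shell=True`, so no quoting) and drop `universal_newlines=True`, so stdout and stderr come back as bytes. `build_command`, `show_md5sum` and the `/music` directory are hypothetical. One caveat, inferred rather than confirmed: because » displays correctly in the terminal but as � in the .ffp file, the names on disk may be UTF-8 (C2 BB) while the .ffp file holds a single 0xbb byte; in that case the raw bytes from the .ffp will not match the file name, and the .ffp has to be decoded with its actual encoding instead.

```python
import os
import subprocess
from typing import List


def build_command(directory: bytes, file_name: bytes) -> List[bytes]:
    # Hypothetical helper: an argv list of bytes. POSIX accepts bytes arguments,
    # and without shell=True no quoting or escaping is needed.
    return [b"/usr/bin/metaflac", b"--show-md5sum", os.path.join(directory, file_name)]


def show_md5sum(argv: List[bytes]) -> subprocess.CompletedProcess:
    # No universal_newlines/text/encoding: stdout and stderr stay raw bytes,
    # so an undecodable byte in an error message cannot raise UnicodeDecodeError.
    return subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)


if __name__ == "__main__":
    line = b"04 - Song title \xbb Other Song \xbb.flac:98d2d03f47790d234052c6c9a2ca5cfd\r\n"
    # Split on the last colon only; the digest is the part after it.
    file_name, expected = line.rstrip(b"\r\n").rsplit(b":", 1)
    argv = build_command(b"/music", file_name)  # hypothetical directory
    print(argv)
    # result = show_md5sum(argv)  # needs metaflac and the file to exist
```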

Answer 1

Score: 1

I believe an encoding of "latin1" or "cp1252" will decode that successfully. Also, it is easier to deal with strings than with bytes, so here is my suggestion:
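For reference, this is why those two encodings work: Latin-1 maps every byte 0x00 to 0xFF to the code point of the same value, so decoding with it can never fail, and in both Latin-1 and cp1252 the byte 0xbb is » (U+00BB). When the decoded name is later passed to subprocess as a `str`, Python encodes it with the file-system encoding (UTF-8 on a typical Linux setup), which turns » into the two bytes C2 BB; that matches a file whose name is stored as UTF-8 on disk.

```python
import os

raw_name = b"04 - Song title \xbb Other Song \xbb.flac"

# Strict UTF-8 decoding fails, just like the traceback in the question.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc)

# Latin-1 and cp1252 both map 0xbb to U+00BB (»).
latin1_name = raw_name.decode("latin1")
cp1252_name = raw_name.decode("cp1252")
print(latin1_name)
print(latin1_name == cp1252_name)  # True

# A str argument is encoded with the file-system encoding before it reaches
# the child process; on a UTF-8 system » becomes the two bytes C2 BB.
print(os.fsencode(latin1_name))
```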

import pathlib
import subprocess

directory = pathlib.Path("/tmp")

with open(directory / "data.ffp", "r", encoding="latin1") as stream:
    for line in stream:
        if line.startswith(";"):
            continue
        if ":" not in line:
            continue

        # Split on the last ":" only, in case a file name contains a colon.
        file_name, expected_md5sum = line.strip().rsplit(":", 1)
        print(f"{file_name=}")
        print(f"{expected_md5sum=}")
        command = [
            "/usr/bin/metaflac",
            "--show-md5sum",
            str(directory / file_name)
        ]
        print(f"{command=}")

        # Now you can run the command. I assume that the command will print an MD5 sum.
        completed_process = subprocess.run(
            command,
            encoding="latin1",
            capture_output=True,
        )

        # Now, completed_process.stdout will hold the output
        # as a string, not bytes.

Here is a sample output:

file_name='01 - Intro.flac'
expected_md5sum='eee7ca01db887168ce8312e7a3bdf8d6'
command=['/usr/bin/metaflac', '--show-md5sum', '/tmp/01 - Intro.flac']
file_name='04 - Song title » Other Song ».flac'
expected_md5sum='98d2d03f47790d234052c6c9a2ca5cfd'
command=['/usr/bin/metaflac', '--show-md5sum', '/tmp/04 - Song title » Other Song ».flac']
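The answer stops before the comparison itself. One way to finish it, as a sketch: strip metaflac's output and compare it with the value taken from the .ffp line. `parse_ffp_line` and `md5_matches` are hypothetical names, and the assumption (shared with the answer and the question's terminal session) is that `metaflac --show-md5sum` on a single file prints just the hex digest followed by a newline.

```python
from typing import Optional, Tuple


def parse_ffp_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one .ffp line into (file_name, expected_md5); None for comments or junk."""
    line = line.strip()  # also removes a trailing "\r\n" or "\n"
    if not line or line.startswith(";") or ":" not in line:
        return None
    # Split on the last colon only, so a colon inside the file name survives.
    file_name, expected_md5 = line.rsplit(":", 1)
    return file_name, expected_md5.lower()


def md5_matches(metaflac_stdout: str, expected_md5: str) -> bool:
    """Compare metaflac's printed digest (trailing newline removed) with the .ffp value."""
    return metaflac_stdout.strip().lower() == expected_md5.strip().lower()


if __name__ == "__main__":
    parsed = parse_ffp_line("04 - Song title » Other Song ».flac:98d2d03f47790d234052c6c9a2ca5cfd\n")
    print(parsed)
    # In the real loop the first argument would be completed_process.stdout.
    print(md5_matches("98d2d03f47790d234052c6c9a2ca5cfd\n", parsed[1]))  # True
```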

Since my system does not have the metaflac command, I cannot test it. Please forgive any errors. If you find one, please post it in the comments and I will try to fix it.

huangapple
  • Posted on 2023-06-02 01:27:59
  • Link to this article: https://go.coder-hub.com/76384330.html