Comparing strings passed through a socket (UTF-8)

Question

I have an interesting problem here.

First, I have a UI written in Java. At one point the UI connects to an rpi4 on the network via a socket. From there, data is sent over the socket using .writeUTF(string).

On the rpi4 side, I'm running a simple Python 3 script. Its sole purpose is to spit out anything that comes over the socket, and it does. Before printing, I use recv.decode('utf-8') to decode the string.

From Java I send "fillOpen".

In Python, after decoding, it prints "fillOpen".

The issue:

Performing a string comparison in the Python script on the decoded string always results in false. I have set it up like this:

Command = recv.decode('utf-8')
if Command == "fillOpen":
    pass  # Do work

I have also tried not decoding the string and comparing it to an encoded string instead, like this:

Command = recv
FillOpenCommand = "fillOpen".encode('utf-8')
if FillOpenCommand == Command:
    pass  # Do work

None of these comparisons result in true.

I have read that Java's writeUTF uses a UTF-8 encoding that is slightly "different"?

Can I adjust .writeUTF to work with the Python 3 decoder? Is there an alternative way to send data that Python can parse and then compare as a string?

Thank you guys.

Answer 1

Score: 1

Assuming you are using the writeUTF method as defined in the Java DataOutput interface:

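The output from writeUTF starts with two bytes of length information (an unsigned, big-endian count of the encoded bytes that follow). For "fillOpen" those two bytes are 0x00 0x08, which are non-printing control characters, so the printed string usually looks right even though it carries two extra characters at the front, and the comparison fails. You can skip the prefix, or you can use it to make sure you have received a complete message.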
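To see why the comparison fails, the bytes on the wire can be reproduced in Python. This is only a sketch: `simulate_write_utf` is a hypothetical helper (not part of any library) that mimics writeUTF for plain ASCII text by prepending the two-byte big-endian length.

```python
import struct


def simulate_write_utf(text: str) -> bytes:
    """Hypothetical helper: mimic writeUTF's framing for ASCII-only text."""
    payload = text.encode("utf-8")
    # Two-byte, big-endian, unsigned length followed by the encoded bytes.
    return struct.pack(">H", len(payload)) + payload


wire = simulate_write_utf("fillOpen")
print(repr(wire))                 # b'\x00\x08fillOpen'

decoded = wire.decode("utf-8")
print(decoded == "fillOpen")      # False: two control characters in front
print(decoded[2:] == "fillOpen")  # True once the prefix is dropped
```

Printing with repr() instead of a plain print() is a quick way to make such hidden bytes visible while debugging.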

The easiest thing to do is to skip it:

Command = recv[2:].decode('utf-8')
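If you would rather use the length prefix than skip it, something along these lines may work. It is a minimal sketch, assuming a connected blocking socket and ASCII-only commands; `recv_exact` and `read_java_utf` are illustrative names, not an existing API.

```python
import socket
import struct


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_java_utf(sock: socket.socket) -> str:
    """Read one writeUTF-style message: 2-byte big-endian length, then payload."""
    (length,) = struct.unpack(">H", recv_exact(sock, 2))
    # Assumption: the payload is plain ASCII, so strict UTF-8 decoding is safe.
    return recv_exact(sock, length).decode("utf-8")
```

Reading by length also handles the case where a single recv call returns only part of a message, or several messages at once.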

If your commands are plain ASCII and don't contain things like user input, emojis, or musical notation, this is good enough. Otherwise, you still have a problem. writeUTF produces Java's "modified UTF-8": it encodes each character outside the Basic Multilingual Plane as a "surrogate pair" whose two halves are encoded separately, and it encodes the null character (U+0000) as two bytes. Neither form is valid standard "utf-8", and decode('utf-8') will raise a UnicodeDecodeError. If I were you, in that case I would stop using writeUTF and start using methods that produce standard UTF-8 encoded data.
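If changing the Java side is not an option, the payload (after the two length bytes) can probably still be decoded in Python with a small workaround. The function below is a sketch with a made-up name, `decode_modified_utf8`, and it assumes the input really is well-formed output from writeUTF.

```python
def decode_modified_utf8(data: bytes) -> str:
    """Sketch: decode a writeUTF payload (length prefix already removed)."""
    # Java writes U+0000 as the two bytes C0 80; map it back to a single NUL.
    data = data.replace(b"\xc0\x80", b"\x00")
    # 'surrogatepass' lets the decoder accept surrogates encoded as 3 bytes each.
    text = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so each surrogate pair becomes one character.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


emoji_bytes = b"\xed\xa0\xbd\xed\xb8\x80"  # U+1F600 as writeUTF would send it
print(decode_modified_utf8(emoji_bytes) == "\U0001F600")  # True
```

Sending standard UTF-8 from Java remains the simpler option, since then the plain decode('utf-8') works without any of this.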

huangapple
  • Posted on April 4, 2020, 20:16:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61027943.html