Comparing strings passed through socket UTF8
Question
I have an interesting problem here.
First I have a UI in Java. The UI at one point connects to a rpi4 on the network via a socket. From there data is sent over the socket using .writeUTF(string).
On the rpi4 side, I'm running a simple Python 3 script. Its sole purpose is to spit out anything that comes over the socket and it does. But before it does I use recv.decode('utf-8') to decode the string.
From Java I send "fillOpen".
In Python, after decoding, it prints "fillOpen".
The issue:
Performing a string comparison in the Python script on the decoded string always results in false. I have set it up like this:
Command = recv.decode('utf-8')
if Command == "fillOpen":
    pass  # Do work
I have also tried to not decode the string and compare to an encoded string. As such:
Command = recv
FillOpenCommand = "fillOpen".encode('utf-8')
if FillOpenCommand == Command:
    pass  # Do work
None of these comparisons result in true.
I have read that Java's writeUTF uses a UTF-8 encoding that is slightly "different"?
Can I adjust .writeUTF to work with the Python 3 decoder? Is there an alternative way of sending the data so that it can be parsed in Python and a string comparison on it would work?
Thank you guys.
Answer 1
Score: 1
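Assuming you are using the writeUTF method as defined in the Java DataOutput interface:
The output from writeUTF starts with two bytes of length information: a big-endian unsigned 16-bit count of the encoded bytes that follow. For "fillOpen" those two bytes are 0x00 0x08 (NUL and backspace), which a terminal typically does not display, so the printed string looks right but never compares equal. You can skip the prefix, or you can use it to make sure you have received a complete message.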
The easiest thing to do is to skip it:
Command = recv[2:].decode('utf-8')
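If you would rather use the length prefix than skip it, here is a minimal sketch. It assumes the Java side sends each command with a single writeUTF call and that sock is an already connected socket; the helper names recv_exact and read_write_utf are made up for this example.

```python
import struct


def recv_exact(sock, n):
    """Read exactly n bytes; a single sock.recv() call may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def read_write_utf(sock):
    """Read one writeUTF frame: 2-byte big-endian length, then that many bytes."""
    (length,) = struct.unpack(">H", recv_exact(sock, 2))
    # Plain decode('utf-8') is fine for ASCII commands such as "fillOpen".
    return recv_exact(sock, length).decode("utf-8")


# Why the original comparison failed: the prefix is still part of the data.
raw = b"\x00\x08fillOpen"
print(raw.decode("utf-8") == "fillOpen")      # False
print(raw[2:].decode("utf-8") == "fillOpen")  # True
```

With this in place the comparison becomes `if read_write_utf(sock) == "fillOpen":`, and a command that arrives split across several packets is still read as a whole.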
If your commands are plain ASCII and don't contain things like user input, emojis, or musical notation, this is good enough. Otherwise, you still have a problem. The way writeUTF handles "surrogate pair" characters (it writes each half of the pair as its own three-byte sequence) is not valid UTF-8, and decode('utf-8') will raise a UnicodeDecodeError. If I were you, in this case I would stop using writeUTF and start using methods that produce standard UTF-8 encoded data.
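If the Java side cannot be changed, the payload can still be decoded in Python. The following is only a workaround sketch: decode_modified_utf8 is a hypothetical helper, not a standard library function, and it expects the payload with the two length bytes already removed. It relies on the 'surrogatepass' error handler, which is part of Python 3.

```python
def decode_modified_utf8(payload: bytes) -> str:
    """Decode the body of a writeUTF frame (length prefix already stripped)."""
    # writeUTF encodes U+0000 as the two bytes C0 80; map it back to one NUL.
    payload = payload.replace(b"\xc0\x80", b"\x00")
    # 'surrogatepass' accepts the three-byte surrogate halves that the
    # strict UTF-8 decoder rejects.
    text = payload.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 merges each surrogate pair into one character.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# U+1F600 as writeUTF would send it: two surrogate halves, three bytes each.
emoji_bytes = b"\xed\xa0\xbd\xed\xb8\x80"
print(decode_modified_utf8(emoji_bytes) == "\U0001F600")  # True
```

Changing the sender is probably still the cleaner fix. On the Java side, for example, String.getBytes(StandardCharsets.UTF_8) produces standard UTF-8 bytes, although you would then need your own way of marking where each message ends.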