October 9, 2020, 21:32:00

String comparison in UTF-8

Question

I have a PHP script that is supposed to return a UTF-8 encoded string. However, in Java I can't seem to compare it with Java's own string in any way.

If I print "OK" and response, they appear the same in the console. However, if I check equality

if ( "OK".equals(response) ) {

the result is false. I printed out both in binary: response is 11101111 10111011 10111111 01001111 01001011, whereas the Java String "OK" is 01001111 01001011, which is clearly ASCII. I tried to convert it to UTF-8 in a few ways, but to no avail:

String result2 = new String("OK".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

and

String result2 = new String("OK".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

Neither works; both still produce the ASCII codes for some reason.

byte[] result2 = "OK".getBytes(StandardCharsets.UTF_8);
System.out.print(new String(result2));

While this also prints the correct "OK", in binary it is still ASCII.

I've tried switching the communication to numbers instead, but 1 still does not equal 1: Integer.parseInt(response) fails with a '"1" is not a String' error message, although in every other respect response is recognised as a normal String.

I'm looking for a solution, preferably one where "OK" is converted to UTF-8 rather than response being converted to ASCII, since I need to communicate with a PHP script and two databases, all set to UTF-8. Java is started with the switch -Dfile.encoding=UTF8 to ensure national characters are not broken.

Answer 1

Score: 4

In UTF-8, all characters with code points of 127 or lower are encoded as a single byte. Therefore "OK" is the same two bytes in UTF-8 and in ASCII.
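As a quick check of that claim, here is a minimal sketch that encodes "OK" with both charsets and compares the resulting bytes. The class name AsciiUtf8Demo and the helper sameBytes are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiUtf8Demo {
    // Hypothetical helper: encodes the text with UTF-8 and US-ASCII
    // and reports whether both encodings yield identical bytes.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // 'O' is 0x4F (79) and 'K' is 0x4B (75) in both encodings.
        System.out.println(Arrays.toString("OK".getBytes(StandardCharsets.UTF_8))); // [79, 75]
        System.out.println(sameBytes("OK")); // true
    }
}
```

So there is nothing to "convert" on the Java side: the literal "OK" already has the UTF-8 byte representation, and the difference must come from the extra leading bytes in response.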

11101111 10111011 10111111 01001111 01001011 is not just a plain "OK"; it is

0xEF, 0xBB, 0xBF, "OK"

where 0xEF, 0xBB, 0xBF is the UTF-8 BOM (byte order mark).

These bytes are not displayed by editors but are used to identify the encoding.

Most likely these bytes ended up in your PHP script before the opening <?php tag.

You have to configure your editor to save the file without a BOM.
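To see the effect on the Java side, the sketch below decodes the five bytes quoted in the question as UTF-8. The class name BomDecodeDemo is made up for illustration, and the byte array stands in for whatever the HTTP client actually received. Java's UTF-8 decoder keeps the BOM, so it shows up as the invisible character U+FEFF at the start of the string.

```java
import java.nio.charset.StandardCharsets;

class BomDecodeDemo {
    // The bytes from the question: EF BB BF (UTF-8 BOM) followed by 'O' and 'K'.
    static final byte[] RESPONSE_BYTES = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x4F, 0x4B
    };

    // Decodes the raw bytes the same way a UTF-8 aware client would.
    static String decode() {
        return new String(RESPONSE_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String response = decode();
        System.out.println(response.length());              // 3, not 2
        System.out.println(response.charAt(0) == '\uFEFF'); // true
        System.out.println("OK".equals(response));          // false
    }
}
```

This matches the symptom in the question: the string prints as "OK" because U+FEFF has no visible glyph, yet it is one character longer than the literal.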

Update

If it is not possible to alter the PHP script, you can use a workaround:

  // check whether the first character of the response is a BOM
  if (!response.isEmpty() && (response.charAt(0) == 0xFEFF)) {
    // remove the first character
    response = response.substring(1);
  }
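The same workaround can be wrapped in a small reusable helper. This is only a sketch: the class name BomStripDemo and the method stripBom are hypothetical, and the null check is an added assumption that the original snippet does not make. It also covers the numeric case from the question, on the assumption that the parse failure there was caused by the same leading U+FEFF.

```java
class BomStripDemo {
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the text without a single leading BOM.
    // Any other input (including null or empty) is returned unchanged.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println("OK".equals(stripBom("\uFEFFOK")));     // true
        System.out.println(Integer.parseInt(stripBom("\uFEFF1"))); // 1
        System.out.println(stripBom("OK"));                        // OK (unchanged)
    }
}
```

Calling stripBom once, right after reading the response, keeps the rest of the code free of BOM checks. Fixing the PHP file remains the cleaner solution where that is possible.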
