String to Byte Conversion and Back Again Not Returning Same Result (ASCII)

Question

I'm having trouble converting a string back to its original value after it has been converted to bytes.

The initial string:

"0000000000Y        Yã"

Where the 'ã' is just a character value.

The conversion code:

byte[] b = s.getBytes(StandardCharsets.US_ASCII);

However, when converting it back:

String str = new String(b, StandardCharsets.US_ASCII);

I receive:

"0000000000Y        Y?"

Does anyone know why this happens?

Thanks.

Answer 1

Score: 2

ã is not an ASCII character, so how it is encoded is determined by the documented behavior of String.getBytes(Charset):

https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes-java.nio.charset.Charset-

> This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.

For US_ASCII, that replacement is the single byte for '?' (0x3F).
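The replacement described above can be observed directly. The following is a minimal sketch, not code from the original question: the class name AsciiReplacementDemo and the helper toAscii are made up for illustration, and the 'ã' is written as the escape \u00e3 so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only String.getBytes and the String constructor are real API.
class AsciiReplacementDemo {
    // Encodes with US_ASCII; characters the charset cannot map are replaced.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String original = "0000000000Y        Y\u00e3"; // \u00e3 is 'ã'
        byte[] bytes = toAscii(original);

        // The last byte is the replacement byte, not an encoding of 'ã'.
        byte last = bytes[bytes.length - 1];
        System.out.printf("last byte = 0x%02X%n", last); // last byte = 0x3F

        // Decoding cannot recover 'ã' because it was never stored.
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        System.out.println(decoded.equals(original)); // false
    }
}
```

The information is lost at encoding time, so no choice of charset on the decoding side can bring the original character back.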

Answer 2

Score: 1

ã is not part of the US_ASCII character set.

The getBytes() method is documented with:

> This method always replaces malformed-input and unmappable-character
> sequences with this charset's default replacement byte array.

(see the documentation)

For US_ASCII, the default replacement byte array is a single byte representing the '?' character (ASCII code 0x3F). This is what gets inserted into the byte array in place of your ã character.

When converting back to a String, you get the character corresponding to the replacement byte, which is '?'.

So, if you convert to bytes and want to get back the identical characters, be sure to use a character set that contains every character you intend to use. A safe choice is UTF-8, which can encode any Unicode character.
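As a rough illustration of that advice, the sketch below encodes and decodes the string from the question with UTF-8 on both sides. The class name Utf8RoundTripDemo and the helper roundTrip are hypothetical names chosen for this example; 'ã' is written as \u00e3.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing a lossless round trip with UTF-8.
class Utf8RoundTripDemo {
    // Encodes to UTF-8 bytes and decodes them again with the same charset.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "0000000000Y        Y\u00e3"; // \u00e3 is 'ã'
        String restored = roundTrip(original);
        System.out.println(restored.equals(original)); // true
    }
}
```

The same charset has to be used for both directions. Encoding with UTF-8 and decoding with US_ASCII would again produce a different string.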

If you need to obey some character encoding (e.g. because some external interface needs that), then Java's replacement strategy makes sense, but of course some characters will get lost.
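When an external interface does force US_ASCII, one option is to check up front whether a string would lose characters, instead of discovering the '?' afterwards. This is only a sketch under that assumption: the class AsciiCheckDemo and the method isPureAscii are illustrative names, while CharsetEncoder.canEncode is part of the standard java.nio.charset API.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for detecting strings that US_ASCII cannot represent.
class AsciiCheckDemo {
    // Returns true only if every character of s can be encoded as US_ASCII.
    static boolean isPureAscii(String s) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("0000000000Y        Y"));       // true
        System.out.println(isPureAscii("0000000000Y        Y\u00e3")); // false
    }
}
```

A caller could use such a check to reject the input or to fall back to another representation before any data is silently replaced.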

Answer 3

Score: 0

This is because ã is not an ASCII character. Check an
ASCII table for valid ASCII characters.

Posted by huangapple on 2020-09-17 18:03:59
Source: https://go.coder-hub.com/63935728.html