Why does printing bytes (a numeric data type) using a print stream give me characters?

Question

Why does running the code below print the ASCII characters GHI? I thought byte was a numeric data type. See the code below:

import java.io.*;

public class PrintStreamDemo {

   public static void main(String[] args) {
    
      byte c[] = {70, 71, 72, 73, 74, 75, 76};
      // create printstream object
      PrintStream ps = new PrintStream(System.out);
    
      // write bytes 1-3
      ps.write(c, 1, 3);
    
      // flush the stream
      ps.flush();
  }
}

Answer 1

Score: 1

> I thought byte is a numerical data type?

As far as the Java Language Specification is concerned, byte is indeed a numeric data type, and so is char:

> The types of the Java programming language are divided into two categories: primitive types and reference types. The primitive types (§4.2) are the boolean type and the numeric types. The numeric types are the integral types byte, short, int, long, and char, and the floating-point types float and double.

The idea that "char represents a character" is a useful abstraction; its range just so happens to be the range of a UTF-16 code unit. But to the computer, a char is just 16 ones and zeroes, and a byte is just 8 ones and zeroes. How they are interpreted is up to whatever reads them.

When you use System.out.println() to print a byte, the byte is first converted to an int and the method formats it as a decimal number<sup>1</sup>. That's why System.out.println((byte) 70) prints "70" rather than "F".

On the other hand, the console in which you are running this program interprets the bytes it receives as a string of characters encoded in UTF-8 (or another ASCII-compatible encoding). System.out, to which your ps is connected, writes the bytes to the console unchanged. This is why the byte 71 is displayed as "G", 72 as "H", and 73 as "I".

<hr>

<sup>1</sup> If you look at the source code of what println actually does, you will see that at the lowest level it also calls write(byte[], int, int); it just does not write the byte 70. Rather, it writes the bytes 55 and 48, which represent the characters "7" and "0".
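To make the difference concrete, here is a minimal sketch that captures what actually reaches the underlying stream. The class and method names (PrintVsWriteDemo, bytesFromPrint, bytesFromWrite) are made up for illustration, and the PrintStream(OutputStream, boolean, Charset) constructor assumes Java 10 or later. It uses print rather than println so that no line separator ends up in the captured bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrintVsWriteDemo {

    // Formats the value as text: the byte is widened to int and print(int) is called.
    static byte[] bytesFromPrint(byte value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, StandardCharsets.US_ASCII);
        ps.print(value);
        ps.flush();
        return sink.toByteArray();
    }

    // Passes the value through unchanged: write(int) emits a single raw byte.
    static byte[] bytesFromWrite(byte value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, StandardCharsets.US_ASCII);
        ps.write(value);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // The digits '7' and '0' as ASCII codes
        System.out.println(Arrays.toString(bytesFromPrint((byte) 70))); // [55, 48]
        // The raw byte itself, which a console would display as "F"
        System.out.println(Arrays.toString(bytesFromWrite((byte) 70))); // [70]
    }
}
```

In both cases the stream only ever carries bytes; the difference is whether the number was turned into digit characters before it was written.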

Answer 2

Score: 0

First of all, System.out is already a PrintStream. So wrapping it in another PrintStream is pointless.

Secondly, print and write are not the same thing in Java. Not even on a PrintStream.

Thirdly, even though byte is a numeric type in Java, so is char. So any reasoning along the lines of "it is a number so it must be printed as a number" doesn't hold in Java. Instead, you need to look at the details of the API specification.


What is actually happening here is that PrintStream.write(byte[], int, int) is writing bytes from the array to the underlying OutputStream without doing any formatting or character set encoding, as described in the javadoc for this method.

In many character set encodings, the byte values 71, 72 and 73 encode the characters G, H and I. (The codes 71, 72 and 73 represent these letters in ASCII, and many encodings "borrow" the ASCII encodings for their first 128 codes.)

Now, presumably, the default character set encoding used by your platform is one of the above ... so you see "GHI" as output on the console.

However ... not all character encodings are like that. If your platform's default encoding were a 16-bit encoding such as big-endian UTF-16, for example, the bytes 70 and 71 would together represent a single Unicode code point, U+4647 ... which is a Chinese character.
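As a rough illustration of how the same bytes can mean different things, the sketch below decodes the byte pair 70, 71 under two charsets. The class and method names are hypothetical, and UTF-16BE is picked here only as one example of a 16-bit encoding.

```java
import java.nio.charset.StandardCharsets;

class ByteDecodingDemo {

    // Interprets each byte as one ASCII character.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Interprets each pair of bytes as one big-endian UTF-16 code unit.
    static String decodeUtf16(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] raw = {70, 71}; // 0x46, 0x47

        String ascii = decodeAscii(raw);
        String utf16 = decodeUtf16(raw);

        System.out.println(ascii + " (" + ascii.length() + " chars)");                  // FG (2 chars)
        System.out.printf("U+%04X (%d char)%n", (int) utf16.charAt(0), utf16.length()); // U+4647 (1 char)
    }
}
```

The byte array never changes; only the charset used to read it does.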


If you use ps.write(c[i]) or ps.print((char) c[i]) instead, you get the same behavior. These call PrintStream.write(int) and PrintStream.print(char) respectively, and both end up treating the byte as a representation of a character<sup>1</sup>. Note that the cast to char is required: for a plain ps.print(c[i]), the byte argument is widened to int, so the compiler selects PrintStream.print(int).

If you want to print a byte formatted as a number using PrintStream, pass it to print(int). The cast below is not strictly necessary, but it makes the intent explicit:

  ps.print((int) c[1]);

The PrintStream.print(int) method prints a decimal representation of its argument.
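The three calls can be compared side by side with a small sketch like the following. OverloadDemo and render are illustrative names, the explicit casts are there to pin down which overload is chosen, and the Charset-based PrintStream constructor and ByteArrayOutputStream.toString(Charset) assume Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class OverloadDemo {

    // Writes the same byte three ways into an in-memory stream and returns the text.
    static String render(byte value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, StandardCharsets.US_ASCII);

        ps.write(value);        // write(int): the raw byte goes out as-is
        ps.print(' ');
        ps.print((char) value); // print(char): the value is treated as a character
        ps.print(' ');
        ps.print((int) value);  // print(int): the value is formatted as decimal digits

        ps.flush();
        return sink.toString(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] c = {70, 71, 72, 73, 74, 75, 76};
        System.out.println(render(c[1])); // G G 71
    }
}
```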


<sup>1 - In different ways. PrintStream.write(int) simply writes the byte without any regard to the encoding. PrintStream.print(char) encodes the character using the stream's character encoding. Depending on the byte you are printing and the character encoding, these two calls may give different results.</sup>
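A value outside the ASCII range shows where the two calls diverge. The sketch below is only an illustration under assumed conditions: the stream is explicitly set to UTF-8, the value 0xE9 (the character "é" in Latin-1 and Unicode) is an arbitrary choice, and the class and method names are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class WriteVsPrintCharDemo {

    // write(int) emits the low 8 bits of the value as one byte, ignoring the charset.
    static byte[] viaWrite(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, StandardCharsets.UTF_8);
        ps.write(value);
        ps.flush();
        return sink.toByteArray();
    }

    // print(char) encodes the character with the stream's charset (UTF-8 here).
    static byte[] viaPrintChar(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, StandardCharsets.UTF_8);
        ps.print((char) value);
        ps.flush();
        return sink.toByteArray();
    }

    // Renders bytes as unsigned two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(viaWrite(0xE9)));     // E9
        System.out.println(hex(viaPrintChar(0xE9))); // C3 A9
    }
}
```

For values below 128 the two methods produce identical output in any ASCII-compatible encoding, which is why the difference does not show up with G, H and I.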

Answer 3

Score: 0

byte and char are numeric data types, but that doesn't mean they are treated like int or float... A byte simply stores one 8-bit value.

To read data from a byte array, use ByteArrayInputStream.

import java.io.*;

public class PrintStreamDemo {

   public static void main(String[] args) {

      byte c[] = {70, 71, 72, 73, 74, 75, 76};

      // Create the new byte array input stream over the array c
      ByteArrayInputStream byt = new ByteArrayInputStream(c);

      // Print data: read() returns each byte as an int, or -1 at the end
      int k = 0;
      while ((k = byt.read()) != -1) {
         // Conversion of a byte into a character
         char ch = (char) k;
         System.out.println("ASCII value of Character is:" + k + "; Special character is: " + ch);
      }

      // ByteArrayInputStream is an input stream and has no flush() method,
      // so there is nothing to flush here.
  }
}

huangapple
  • Posted on October 24, 2020 at 12:28:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64509986.html