BufferedInputStream reads the entire file once even though I set the buffer size to 1?
Question
I was running the code below to understand how BufferedInputStream works in Java. I set the buffer size to 1 and expected the buffer to read the file 465 times, because the file contains 465 characters. However, it reads the file only once. I found that to change the number of reads I have to change the size of the byte array "does" to 1; in that case it reads the file 465 times. I do not understand why the buffer reads the file only once even though I set the buffer size to 1. Why does the array "does" dictate how many times the buffer reads the file?
    File f = new File("runs");
    if (!f.exists()) {
        f.createNewFile();
    }
    FileInputStream input = new FileInputStream(f);
    BufferedInputStream b = new BufferedInputStream(input, 1);
    byte[] does = new byte[1000];
    int i = b.read(does);
    int x = 0;
    String tmp;
    while (i != -1) {
        tmp = new String(does, StandardCharsets.UTF_8);
        if (!tmp.equalsIgnoreCase("\n")) {
            System.out.print(tmp);
        } else {
            System.out.println(tmp);
        }
        x++;
        i = b.read(does);
    }
    System.out.println(x);
Answer 1
Score: 1
> Differences between the read() method of BufferedInputStream and of InputStream

Let's begin with InputStream.read(), which reads a single byte of data from the input stream and returns it as an int value. The call blocks until input data is available, the end of the stream is detected, or an exception is thrown. BufferedInputStream adds buffering to the input stream passed to it.

The difference is that BufferedInputStream reads data from the underlying input stream in chunks and stores it in an internal buffer, so when you call read() it returns the next byte from that buffer instead. What changes is the call overhead: BufferedInputStream reduces it by grouping many requests for data into fewer calls to the underlying input stream.
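
To make the call-overhead point concrete, here is a minimal sketch that counts how many read requests actually reach the underlying stream. CountingInputStream, drain and the two underlyingCalls helpers are hypothetical names invented for this illustration, and a ByteArrayInputStream of 465 bytes stands in for the file from the question.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {

    /** Hypothetical helper: counts every read request that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Consumes the stream one byte at a time until end of stream. */
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) {
            // discard the byte; only the number of underlying calls matters here
        }
    }

    /** Single-byte reads go straight to the counted stream. */
    static int underlyingCallsUnbuffered(byte[] data) throws IOException {
        CountingInputStream counted = new CountingInputStream(new ByteArrayInputStream(data));
        drain(counted);
        return counted.calls;
    }

    /** Single-byte reads are served from the BufferedInputStream's internal buffer. */
    static int underlyingCallsBuffered(byte[] data, int bufferSize) throws IOException {
        CountingInputStream counted = new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(counted, bufferSize));
        return counted.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[465]; // stand-in for the 465-character file
        System.out.println("unbuffered: " + underlyingCallsUnbuffered(data));
        System.out.println("buffered:   " + underlyingCallsBuffered(data, 128));
    }
}
```

The unbuffered path should make one underlying call per byte plus a final call that reports end of stream, while the buffered path with a 128-byte buffer should need only a handful of calls for the same data.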
> Why does BufferedInputStream read the entire file even when a buffer size is specified?

It does not, actually. BufferedInputStream does not necessarily read the entire file into its buffer, even when a buffer size is specified. It reads data from the file into the buffer in chunks whose size is at most the size of the buffer, so the number of times the file is read depends on the size of the file and the size of the buffer. In the snippet you shared, you set the buffer size to 1, so the internal buffer can hold only one byte at a time. A larger buffer would behave differently, for example:

    BufferedInputStream b = new BufferedInputStream(input, 1024);
Answer 2
Score: 0
Note that BufferedInputStream does not always read into its own internal buffer.

If you view the source code of the constructor, you will see that new BufferedInputStream(input, 1) creates an internal buffer buf = new byte[1]:

    BufferedInputStream b = new BufferedInputStream(input, 1);
    // Internally causes b.buf = new byte[1];

Nothing has been read from the underlying input file at the point where you run your next lines:

    byte[] does = new byte[1000];
    int i = b.read(does);

If you view the source of read(byte[] ba), you will find that it calls BufferedInputStream.read(byte[] ba, int off, int len). Normally that read uses the contents of the internal buf, if any are available, and copies them into ba. But on your first read buf is empty, so the method compares buf.length with len and reads into whichever buffer is larger. That means either buf is filled and then copied into ba or, as in your case (1000 >= 1), the data goes straight into does and the internal buf is not filled.

Thus the first access to the underlying file can return up to 1000 bytes (if that many are available from input), even though you specified an internal buffer size of just 1 byte.
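
The bypass described above can be observed with a small sketch that records the length of every request reaching the underlying stream. LoggingInputStream and requestedLengths are hypothetical names made up for this illustration, and an in-memory ByteArrayInputStream replaces the file so the example is self-contained.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class BypassDemo {

    /** Hypothetical helper: records the len argument of every bulk read it receives. */
    static class LoggingInputStream extends FilterInputStream {
        final List<Integer> requestedLengths = new ArrayList<>();

        LoggingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requestedLengths.add(len);
            return super.read(b, off, len);
        }
    }

    /**
     * Reads a fileSize-byte source to the end through a BufferedInputStream with
     * the given internal buffer size, using a caller array of arraySize bytes,
     * and returns the lengths that were requested from the underlying stream.
     */
    static List<Integer> requestedLengths(int fileSize, int bufferSize, int arraySize)
            throws IOException {
        LoggingInputStream logged =
                new LoggingInputStream(new ByteArrayInputStream(new byte[fileSize]));
        try (BufferedInputStream b = new BufferedInputStream(logged, bufferSize)) {
            byte[] does = new byte[arraySize];
            while (b.read(does) != -1) {
                // keep reading until end of stream
            }
        }
        return logged.requestedLengths;
    }

    public static void main(String[] args) throws IOException {
        // Same shape as the question: internal buffer of 1, caller array of 1000.
        System.out.println(requestedLengths(465, 1, 1000));
        // Shrinking the caller array to 1 forces one underlying request per byte.
        System.out.println(requestedLengths(465, 1, 1).size());
    }
}
```

In the first case the underlying stream should see requests for 1000 bytes rather than 1, which is why the loop in the question runs only once. In the second case each request asks for a single byte, matching the 465 iterations the asker observed.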
By the way, you should not convert the bytes to a UTF-8 String as above, because a read can end part-way through a multi-byte UTF-8 character and the decoded text will be corrupted. Use a Reader instead.
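
As a rough sketch of the Reader approach, the following decodes UTF-8 through an InputStreamReader so that multi-byte characters are never split, even with a deliberately tiny char array. ReaderDemo and readAll are hypothetical names for this example, and the sample text is an assumption chosen only because it contains multi-byte characters.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {

    /** Decodes the whole stream as UTF-8 text, reading a few chars at a time. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] chunk = new char[4]; // deliberately small to show chunking is safe
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // append only the n chars that were actually read
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo \u20ac" contains a 2-byte and a 3-byte UTF-8 character.
        byte[] utf8 = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8)));
    }
}
```

Note that the loop appends only the n characters returned by each read, whereas the code in the question converts the whole array on every iteration regardless of how many bytes were read.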