BufferedReader stuck in readLine()
Question
I'm trying to read the HTTP request sent by Google Chrome to get its data. For that I use readLine() from BufferedReader, but for some reason I think it gets stuck after the last line: because the stream stays open, it keeps waiting for more input. Here is the code that I use in the while loop:
String line;
ArrayList<String> request = new ArrayList<String>();
while ((line = inFromClient.readLine()) != null) {
    request.add(line);
}
If I forcibly break out of the loop it works. Basically, I'm trying to read all the lines efficiently, but without the inconsistencies of ready().
Answer 1
Score: 4
HTTP looks like a crazy simple protocol, but it is not; you should use an HTTP client library such as the built-in java.net.http client.
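As a rough illustration of that suggestion, here is a minimal sketch of fetching a page with java.net.http (available since Java 11). The class name FetchExample, the helper fetch and the https://example.com/ URL are placeholders made up for this example; the library takes care of framing, chunked decoding and keep-alive.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class FetchExample {
    /** Sends a GET request and returns the response with its body decoded as a String. */
    static HttpResponse<String> fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // send() blocks until the complete response has been received.
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        // https://example.com/ is only a placeholder; pass a real URL as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "https://example.com/");
        HttpResponse<String> response = fetch(uri);
        System.out.println("Status: " + response.statusCode());
        response.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
        System.out.println(response.body().length() + " characters in body");
    }
}
```

Note that this covers the client side only; it does not help when your own code is the one accepting a connection from a browser.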
The problem is that the concept of 'give me my data, then close the connection' is HTTP/1.0, and that's a few decades out of date. HTTP/2 and HTTP/3 are binary protocols, and HTTP/1.1 tends to leave the connection open. That is why readLine() never returns null in your loop: the stream has not ended, so the call blocks waiting for more data. In general, 'read lines', and even 'use a Reader' (as in, read characters instead of bytes), is the wrong way to go about it, because HTTP is not really a textual protocol. I know. It looks like one. It's not.
Here is a highly oversimplified overview of how, for example, a browser reads an HTTP/1.1 response:
- Process raw bytes, because HTTP body content is (or can be) raw binary; wrapping the whole stream in, say, an InputStreamReader or BufferedReader is therefore a non-starter.
- Keep reading until you hit a 0x0A byte (the ASCII newline), or until X bytes have been read and your buffer is full, where X is not extraordinarily large. You wouldn't want a badly behaving server, or a mix-up where you connect to a different (non-HTTP) service, to cause a memory issue! Parse this first line as the HTTP/1.1 status line.
- Repeat this loop to pick up all the headers. Use the same 'my buffer has limits' trick to avoid memory issues.
- Then check the response code to figure out whether a body will be forthcoming. It's HTTP/1.1, so you can't just go: "Well, if the connection is closed, I guess no body is forthcoming". Whether one is coming depends primarily on the response code.
- Assuming a body exists, it starts right after the empty line (the double newline) that separates the headers from the body.
- If the content is transferred with chunked encoding (common), start copying data into a buffer, but keep track of whether you have read the entire chunk. Reading chunked encoding is its own game, really.
- Alternatively, if chunked encoding isn't used, HTTP/1.1 expects a Content-Length header to be present. Use this header to know precisely how many bytes to read. (Strictly speaking, the specification lets a response that has neither be delimited by the server closing the connection, but the client then cannot tell a complete body from a truncated one.)
- Neither 'a newline' nor 'close connection' is a reliable marker of 'end of data' in HTTP/1.1, so don't rely on either.
- Then either pass the content + headers + return code verbatim to the requesting code, or dress it up a bit. For example, if the Content-Type header is present and has the value text/html; charset=UTF-8, you can consider turning the body data into a string via UTF-8 (new String(byteArray, StandardCharsets.UTF_8)).
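The same steps apply when reading a request, which is the asker's situation. Below is a minimal sketch, not a complete or hardened parser: it reads bounded lines from a raw InputStream, collects headers up to the empty line, and then reads the body either chunk by chunk or by Content-Length. The class name HttpMessageReader, its method names and the 8192-byte line limit are assumptions made for this example, and it treats a message with neither framing header as having no body, which is the rule for requests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HttpMessageReader {
    // Assumed cap on one start line or header line; pick whatever suits your server.
    static final int MAX_LINE = 8192;

    /** Reads bytes up to LF, drops a trailing CR, and rejects lines longer than MAX_LINE. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                String s = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
                return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
            }
            if (buf.size() >= MAX_LINE) throw new IOException("line too long");
            buf.write(b);
        }
        throw new EOFException("stream ended before end of line");
    }

    /** Reads header lines up to the empty line that ends the head; names are lower-cased. */
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon <= 0) throw new IOException("malformed header: " + line);
            // Simplification: a repeated header overwrites the earlier value.
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                    line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Reads exactly n bytes, looping because read() may return fewer than requested. */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] data = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(data, off, n - off);
            if (r == -1) throw new EOFException("stream ended inside body");
            off += r;
        }
        return data;
    }

    /** Decodes a chunked body: hex size line, chunk data, CRLF, repeated until a 0-size chunk. */
    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) break;
            body.write(readExactly(in, size));
            readLine(in); // consume the CRLF that follows the chunk data
        }
        while (!readLine(in).isEmpty()) { /* skip optional trailer fields */ }
        return body.toByteArray();
    }

    /** Picks the framing for a request body; a real server should also cap the body size. */
    static byte[] readBody(InputStream in, Map<String, String> headers) throws IOException {
        String te = headers.getOrDefault("transfer-encoding", "").toLowerCase(Locale.ROOT);
        if (te.contains("chunked")) return readChunked(in);
        String length = headers.get("content-length");
        return length == null ? new byte[0] : readExactly(in, Integer.parseInt(length));
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = ("POST /form HTTP/1.1\r\nHost: localhost\r\n"
                + "Content-Length: 5\r\n\r\nhello").getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(raw);
        String requestLine = readLine(in);
        Map<String, String> headers = readHeaders(in);
        byte[] body = readBody(in, headers);
        System.out.println(requestLine + " " + headers + " " + new String(body, StandardCharsets.UTF_8));
    }
}
```

With a real socket you would pass socket.getInputStream() (ideally wrapped in a BufferedInputStream) instead of the in-memory stream. Because the header loop stops at the empty line and the body is read by length, nothing waits for the peer to close the connection, which is what made the original readLine() loop hang.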
Note that I've passed right over some bizarre server behaviour that exists because, in ye olden days, some dumb browser did weird things that have since become the status quo (range requests, for example, are quite bizarre). And of course there are HTTP/2 and HTTP/3, which are completely different protocols.
Also, of course, plain HTTP servers are rare these days; HTTPS is where it's at, and that's quite different too.