Sending large data over TCP/IP socket
Question
I have a small project running a server in C# and a client in Java. The server sends images to the client.
Some images are quite big (up to 10 MiB sometimes), so I split the image bytes and send them in chunks of 32768 bytes each.
My C# Server code is as follows:
    using (var stream = new MemoryStream(ImageData))
    {
        for (int j = 1; j <= dataSplitParameters.NumberOfChunks; j++)
        {
            byte[] chunk;
            if (j == dataSplitParameters.NumberOfChunks)
                chunk = new byte[dataSplitParameters.FinalChunkSize];
            else
                chunk = new byte[dataSplitParameters.ChunkSize];
            int result = stream.Read(chunk, 0, chunk.Length);
            string line = DateTime.Now + ", Status OK, " + ImageName + ", ImageChunk, " + j + ", " + dataSplitParameters.NumberOfChunks + ", " + chunk.Length;
            //write read params
            streamWriter.WriteLine(line);
            streamWriter.Flush();
            //write the data
            binaryWriter.Write(chunk);
            binaryWriter.Flush();
            Console.WriteLine(line);
            string deliveryReport = streamReader.ReadLine();
            Console.WriteLine(deliveryReport);
        }
    }
And my Java Client code is as follows:
    long dataRead = 0;
    for (int j = 1; j <= numberOfChunks; j++) {
        String line = bufferedReader.readLine();
        tokens = line.split(", ");
        System.out.println(line);
        int toRead = Integer.parseInt(tokens[tokens.length - 1]);
        byte[] chunk = new byte[toRead];
        int read = inputStream.read(chunk, 0, toRead);
        //do something with the data
        dataRead += read;
        String progressReport = pageLabel + ", progress: " + dataRead + "/" + dataLength + " bytes.";
        bufferedOutputStream.write((progressReport + "\n").getBytes());
        bufferedOutputStream.flush();
        System.out.println(progressReport);
    }
The problem is when I run the code, either the client crashes with an error saying it is reading bogus data, or both the client and the server hang. This is the error:
Document Page 1, progress: 49153/226604 bytes.
�9��%>�YI!��F�����h�
Exception in thread "main" java.lang.NumberFormatException: For input string: .....
What am I doing wrong?
Answer 1
Score: 1
The basic problem.
Once you wrap an InputStream in a BufferedReader, you must stop accessing the InputStream directly. A BufferedReader is buffered: it reads as much data as it wants to, and it is NOT limited to reading exactly up to the next newline and stopping there.
The BufferedReader on the Java side has read a lot more than one line, so it has already consumed a whole bunch of image data, and there is no way to get it back. By creating that BufferedReader you have made the job impossible, so don't.
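To make the read-ahead visible, here is a minimal sketch. It assumes an in-memory ByteArrayInputStream standing in for the socket stream, and the class and method names (ReadAheadDemo, bytesLeftAfterReadLine) are made up for illustration. It reads a single header line through a BufferedReader and then checks how many bytes are still left in the raw stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {
    /** Reads one line through a BufferedReader, then reports what is left in the raw stream. */
    static int bytesLeftAfterReadLine(byte[] wire) throws IOException {
        InputStream raw = new ByteArrayInputStream(wire); // stand-in for socket.getInputStream()
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        reader.readLine();      // we only asked for the header line...
        return raw.available(); // ...but the reader also pulled the "image" bytes into its buffer
    }

    public static void main(String[] args) throws IOException {
        byte[] header = "header, 100\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] wire = new byte[header.length + 100]; // header followed by 100 fake image bytes
        System.arraycopy(header, 0, wire, 0, header.length);
        System.out.println("bytes left for inputStream.read: " + bytesLeftAfterReadLine(wire));
    }
}
```

With a small payload like this the whole thing ends up inside the reader's buffer, so a later inputStream.read on the raw stream finds nothing where the image bytes were expected. On a real socket the amount swallowed depends on timing, which would explain why the failure looks random.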
The underlying problem.
You have a single TCP/IP connection. Over it, you send some irrelevant text (the page, the progress, etc.), then an unknown amount of image data, and then another irrelevant progress update.
That's fundamentally broken. How can an image parser possibly know that, halfway through an image, a status update line arrives? Text is just binary data too; there is no magic identifier that tells a client: this byte is part of the image data, but that byte is text with progress info sent in between.
The simple fix.
You'd think the simple fix is... well, stop doing that then! Why are you sending this progress? The client is perfectly capable of knowing how many bytes it has read, so there is no point sending that. Just take your binary data, open the OutputStream, and send all of it. On the client side, open the InputStream and read all of it. Don't involve strings. Don't use anything that smacks of 'works with characters' (so, BufferedReader? No. BufferedInputStream is fine).
... but now the client doesn't know the title, nor the total size!
So make a wire protocol. It can be near trivial.
This is your wire protocol:
- 4 bytes, big endian: SizeOfName
- SizeOfName number of bytes. UTF-8 encoded document title.
- 4 bytes, big endian: SizeOfData
- SizeOfData number of bytes. The image data.
And that's only if you actually want the client to be able to render a progress bar and to know the title. If that's not needed, skip all of that: just send the bytes, and signal that the file has been completely sent by closing the connection.
Here's some sample Java code:
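    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    try (InputStream in = ....) {
        int nameSize = readInt(in);
        byte[] nameBytes = in.readNBytes(nameSize);
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        int dataSize = readInt(in);
        try (OutputStream out =
                Files.newOutputStream(Paths.get("/Users/TriSky/image.png"))) {
            byte[] buffer = new byte[65536];
            while (dataSize > 0) {
                // Never ask for more than the image still has left, so bytes
                // belonging to whatever follows stay in the stream.
                int r = in.read(buffer, 0, Math.min(buffer.length, dataSize));
                if (r == -1) throw new IOException("Early end-of-stream");
                out.write(buffer, 0, r);
                dataSize -= r;
            }
        }
    }

    public int readInt(InputStream in) throws IOException {
        byte[] b = in.readNBytes(4);
        // readNBytes returns fewer than 4 bytes if the stream ends early
        if (b.length < 4) throw new EOFException("Early end-of-stream");
        return ByteBuffer.wrap(b).getInt(); // ByteBuffer is big endian by default
    }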
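The sample above only covers the reading side. For completeness, here is a rough sketch of the matching writing side, written in Java even though the server in the question is C#. FrameWriter and writeImage are hypothetical names, and the sketch relies on DataOutputStream.writeInt, which writes 4 bytes in big-endian order.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class FrameWriter {
    /** Writes one frame: SizeOfName, name bytes, SizeOfData, data bytes. */
    static void writeImage(OutputStream rawOut, String name, byte[] data) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.writeInt(nameBytes.length); // 4 bytes, big endian: SizeOfName
        out.write(nameBytes);           // UTF-8 encoded document title
        out.writeInt(data.length);      // 4 bytes, big endian: SizeOfData
        out.write(data);                // the image data, in one go
        out.flush();
    }
}
```

If the sender stays in C#, keep in mind that BinaryWriter writes integers in little-endian order, so the two length fields would need their bytes reversed (or the protocol declared little endian on both sides) for this to line up with the Java reader.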
Closing notes
Another bug in your app is that you're using the wrong method. Java's read(byte[]) method will NOT (necessarily) fully fill that byte array. All read(byte[]) does is read at least 1 byte (unless the stream has ended, in which case it reads nothing and returns -1). The idea is that read reads the optimal number of bytes: exactly as many as are ready to hand to you right now. How many is that? Who knows. If you ignore the value returned by in.read(bytes), your code is necessarily broken, and you're doing just that. What you really want is, for example, readNBytes, which guarantees that it fully fills the byte array (or reads until the stream ends, whichever happens first).
Note that in the transfer code above, I also use the basic read, but there I don't ignore the return value.
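If readNBytes is not available (it needs a newer Java version), the same guarantee can be had with a plain loop around read. This is only a sketch; ReadLoop and readExactly are names invented here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    /** Keeps calling read until exactly `length` bytes have arrived, or fails on early end-of-stream. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] result = new byte[length];
        int filled = 0;
        while (filled < length) {
            // read may deliver anything from 1 byte up to the requested amount
            int r = in.read(result, filled, length - filled);
            if (r == -1) {
                throw new EOFException("stream ended after " + filled + " of " + length + " bytes");
            }
            filled += r;
        }
        return result;
    }
}
```

In the client from the question, something like `byte[] chunk = ReadLoop.readExactly(inputStream, toRead);` would replace the single inputStream.read call.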
Answer 2
Score: 0
Your Java code seems to be using a BufferedReader. It reads data into a buffer of its own, meaning that data is no longer available in the underlying socket input stream; that's your first problem. You have a second problem with how inputStream.read is used: it's not guaranteed to read all the bytes you ask for, so you would have to put a loop around it.
This is not a particularly easy problem to solve. When you mix binary and text data in the same stream, it is difficult to read it back. In Java, there is a class called DataInputStream that can help a little: it has a readLine method to read a line of text, and also methods to read binary data:
    DataInputStream dataInput = new DataInputStream(inputStream);
    for (int j = 1; j <= numberOfChunks; j++) {
        String line = dataInput.readLine();
        ...
        byte[] chunk = new byte[toRead];
        // readFully returns void: it fills the whole array or throws EOFException
        dataInput.readFully(chunk);
        ...
    }
DataInputStream has limitations: the readLine method is deprecated because it assumes the text is encoded in Latin-1, and does not let you use a different text encoding. If you want to go further down this road you'll want to create a class of your own to read your stream format.
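As a rough idea of what such a class could look like, here is a sketch that reads text lines and binary chunks from the same InputStream without reading ahead. MixedStreamReader and its method names are made up for this example, and it assumes the header lines are UTF-8 and end in "\n" or "\r\n".

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MixedStreamReader {
    private final InputStream in;

    MixedStreamReader(InputStream in) {
        this.in = in;
    }

    /** Reads bytes up to the next '\n' and decodes them as UTF-8; consumes nothing past the newline. */
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new EOFException("stream ended inside a line");
            line.write(b);
        }
        String text = new String(line.toByteArray(), StandardCharsets.UTF_8);
        // Drop the '\r' of a "\r\n" line ending, if there is one.
        return text.endsWith("\r") ? text.substring(0, text.length() - 1) : text;
    }

    /** Reads exactly `length` bytes of binary data. */
    byte[] readChunk(int length) throws IOException {
        byte[] chunk = in.readNBytes(length); // Java 11+
        if (chunk.length < length) throw new EOFException("stream ended inside a chunk");
        return chunk;
    }
}
```

Scanning for the byte 0x0A is safe with UTF-8 because multi-byte sequences never contain it. Reading one byte at a time from a bare socket stream is slow, so it probably makes sense to wrap the socket stream in a single BufferedInputStream and hand that to this class, as long as nothing else reads from the socket.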
> Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.
You know this is totally unnecessary, right? There is absolutely no problem writing multiple megabytes of data to a TCP socket and streaming all of it in on the receiving side.
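To illustrate, here is a sketch of an unchunked transfer. WholeImageTransfer, send and receive are hypothetical names, and the receive side assumes the sender closes the connection once the image has been written, because transferTo (Java 9+) reads until end of stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class WholeImageTransfer {
    /** Writes the whole image in one call; TCP takes care of splitting it into packets. */
    static void send(byte[] imageData, OutputStream socketOut) throws IOException {
        socketOut.write(imageData);
        socketOut.flush();
    }

    /** Copies everything up to end-of-stream into `destination` and returns the byte count. */
    static long receive(InputStream socketIn, OutputStream destination) throws IOException {
        return socketIn.transferTo(destination);
    }
}
```

If the connection has to stay open for more images, the receiver needs to learn the size some other way, for example from a length prefix sent before the data.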
Answer 3
Score: 0
When you send an image, you have to open it as a normal file, split it into chunks, and Base64-encode every chunk before sending; the client then decodes each chunk. Image data is not normal text data, so Base64 encoding turns those symbols into normal characters like AfHM65Hkgf7MM.