Efficiently sharing data between processes in different languages
Question
# Context
I am writing a Java program that communicates with a C# program through standard input and standard output. The C# program is started as a child process. It receives "requests" through stdin and sends "responses" through stdout. The requests are very lightweight (a few bytes in size), but the responses are large: in a normal run of the program, they amount to about 2 GB of data.
I am looking for ways to improve performance, and my measurements indicate that writing to stdout is a bottleneck. Here are the numbers from a normal run:
- Total time: 195 seconds
- Data transferred through stdout: 2026 MB
- Time spent writing to stdout: 85 seconds
- stdout throughput: 23.8 MB/s
By the way, I am writing all the bytes to an in-memory buffer first, and copying them in one go to stdout to make sure I only measure stdout write time.
# Question
What is an efficient and elegant way to share data between the C# child process and the Java parent process? It is clear that stdout is not going to be enough.
I have read here and there about sharing memory through memory mapped files, but the Java and .NET APIs give me the impression that I'm looking in the wrong place.
Answer 1
Score: 1
Before you invest more in memory-mapped files or named pipes, I would first check whether you are actually reading and writing efficiently. <code>java.lang.Process.getInputStream()</code> uses a <code>BufferedInputStream</code>, so the reader side should be OK. But in your C# program you will most likely use <code>Console.Write</code>. The problem here is that AutoFlush is enabled by default, so every single write explicitly flushes the stream. I wrote my last C# code years ago, so I'm not up to date, but maybe it is possible to set the AutoFlush property of <code>Console.Out</code> to false and flush the stream manually after multiple writes.
If disabling AutoFlush is not possible, the only way to improve performance with <code>Console.Out</code> would be to write more text with each write.
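For the reader side, a minimal Java sketch of draining the child's stdout in large chunks might look like the following. The class name `StdoutDrain`, the method `drain`, and the 64 KiB buffer size are assumptions made for illustration; the stream returned by `Process.getInputStream()` would be passed in where the stand-in stream is used here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StdoutDrain {
    // 64 KiB is an arbitrary illustrative buffer size, not a tuned value.
    static final int BUFFER_BYTES = 64 * 1024;

    /** Reads the stream to its end in large chunks and returns the number of bytes seen. */
    public static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[BUFFER_BYTES];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n; // a real program would consume buffer[0..n) here
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for process.getInputStream(); no child process is started in this sketch.
        InputStream fake = new ByteArrayInputStream(new byte[200_000]);
        System.out.println(drain(fake)); // prints 200000
    }
}
```

Reading into a reusable byte array avoids per-byte calls, so any remaining slowness is more likely to come from the writer or the pipe than from the Java loop.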
Another potential bottleneck may be a shell sitting between the two processes that has to interpret the written data. Make sure you execute the C# program directly, not through a script or a command interpreter.
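As a rough illustration of launching the child directly from Java, the sketch below builds a `ProcessBuilder` whose first command element is the executable itself rather than `cmd` or `sh`. The names `DirectLaunch` and `direct`, the executable name `Responder.exe`, and the `--batch` argument are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DirectLaunch {
    /** Builds a ProcessBuilder that runs the executable itself, with no shell in between. */
    public static ProcessBuilder direct(String executable, String... args) {
        List<String> command = new ArrayList<>();
        command.add(executable);        // first element is the program, not "cmd" or "sh"
        command.addAll(List.of(args));
        ProcessBuilder builder = new ProcessBuilder(command);
        // stderr goes to the parent's console; stdin and stdout stay as pipes (the default).
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        return builder;
    }

    public static void main(String[] args) {
        // "Responder.exe" is a hypothetical name for the C# child; nothing is started here.
        ProcessBuilder builder = direct("Responder.exe", "--batch");
        System.out.println(builder.command()); // prints [Responder.exe, --batch]
    }
}
```

Calling `start()` on the returned builder would spawn the child with its stdout connected straight to the parent's pipe.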
Before you start using memory-mapped files, I would first try simply writing to a file. As long as you have enough free memory that is not used by your programs or others, and as long as no other programs access the disk frequently, the operating system will be able to hold quite a large amount of written data in the file system cache. If your Java program reads from the file fast enough while your C# program is writing to it, chances are high that little or no data has to be loaded from disk.
Answer 2
Score: 0
As Matthew Watson mentioned in the comments, it is indeed possible and incredibly fast to use a memory mapped file. In fact, the throughput for my program went from 24 MB/s to 180 MB/s. Below is the gist of it.
The following Java code creates the memory mapped file used for communication and opens a buffer we can read from:
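<!-- language: java -->

    import java.nio.channels.FileChannel;
    import java.nio.file.*;
    // Mapping with READ_WRITE grows the file to the requested size (200_000 * 8 bytes) if it is smaller
    var path = Paths.get("test.mmap");
    var channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
    var mappedByteBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 200_000 * 8);
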
The following C# code opens the memory-mapped file and creates a stream that you can use to write bytes to it (note that <code>buffer</code> is the name of the array of bytes to be written):
<!-- language: c# -->

    using System.IO;
    using System.IO.MemoryMappedFiles;

    // This code assumes the file has already been created on the Java side
    var file = File.Open("test.mmap", FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite);
    // mapName is null: the map is unnamed and shared through the file itself; capacity 0 uses the file's current size
    var memoryMappedFile = MemoryMappedFile.CreateFromFile(file, null, 0, MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, false);
    var stream = memoryMappedFile.CreateViewStream();
    stream.Write(buffer, 0, buffer.Length);
    stream.Flush();

Of course, you need to somehow synchronize the Java and the C# side. For the sake of simplicity, I didn't include that in the code above. In my code, I am using standard in and standard out to signal when it is safe to read / write.
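One possible shape for the hand-off, sketched in Java only: the writer stores a 4-byte length followed by the payload, signals over stdout, and the reader then copies exactly that many bytes out of the mapping. The length-prefix layout, the little-endian byte order, and the names `MappedExchange`, `map`, `writeResponse`, and `readResponse` are assumptions for illustration, not the layout used in my program. Here `writeResponse` merely stands in for the C# side so the sketch runs on its own.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedExchange {
    static final int HEADER_BYTES = 4; // assumed layout: [int length][payload bytes]

    /** Maps the file read-write; the mapping stays valid after the channel is closed. */
    public static MappedByteBuffer map(Path path, int capacity) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
            // Assumption: the writer emits little-endian integers, so match that here.
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            return buffer;
        }
    }

    /** Stand-in for the writer side; throws BufferOverflowException if the payload does not fit. */
    public static void writeResponse(MappedByteBuffer buffer, byte[] payload) {
        buffer.putInt(0, payload.length);
        ByteBuffer body = buffer.duplicate(); // independent position, same memory
        body.position(HEADER_BYTES);
        body.put(payload);
        // At this point the writer would signal "ready" over stdout.
    }

    /** Call only after the writer has signalled that the response is complete. */
    public static byte[] readResponse(MappedByteBuffer buffer) {
        int length = buffer.getInt(0);
        byte[] payload = new byte[length];
        ByteBuffer body = buffer.duplicate();
        body.position(HEADER_BYTES);
        body.get(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("exchange", ".mmap");
        path.toFile().deleteOnExit();
        MappedByteBuffer buffer = map(path, 1024);
        writeResponse(buffer, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readResponse(buffer), StandardCharsets.UTF_8)); // prints hello
    }
}
```

Because the reader touches the mapping only after the ready signal arrives, the pipes carry just a few bytes per response while the bulk data moves through shared memory.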