Fastest way to compile small Java files
Question
I'm looking to programmatically compile (not through an IDE) a few small classes such as:
public class Sum {
    public int sum(int[] nums) {
        int total = 0;
        for (int i = 0; i < nums.length; i++) {
            total += nums[i];
        }
        return total;
    }
}
Running javac Sum.java takes 429ms on macOS (2.3 GHz Intel Core i5), but in a Kubernetes container (t3.xlarge) with 3 vCPUs and 6 GB of RAM it takes 640ms. What causes this difference?
I tried different Java versions and tried using javax.tools.JavaCompiler, but even then it takes up to 2 seconds to compile 3 small files like these.
What hardware/software configuration would make compilation of these small files fastest?
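For reference, the in-process attempt presumably looked something like the minimal sketch below; the file names other than Sum.java are hypothetical stand-ins for the other two small classes.

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileOnce {
    public static void main(String[] args) {
        // Requires a JDK; on a plain JRE getSystemJavaCompiler() returns null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        // Hypothetical file names standing in for the three small classes.
        String[] files = { "Sum.java", "Average.java", "Max.java" };

        long start = System.nanoTime();
        int result = compiler.run(null, null, null, files); // one batched call
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("result=" + result + ", elapsed=" + elapsedMs + " ms");
    }
}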
Answer 1
Score: 3
The number I just got empirically is 5ms/file...
If you have a lot of Java files to compile, then maybe you care about this. But if that's the case, the timing numbers you're talking about are not realistic. I wrote this program to compile your test program 1000 times (I created 1000 Java files defining classes Sum0 through Sum999).
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class T {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int N = 1000;
        String[] files = new String[N];
        for (int i = 0; i < N; i++) {
            files[i] = String.format("stage/Sum%d.java", i);
        }
        int result = compiler.run(null, null, null, files);
        System.out.println("Compile result code = " + result);
    }
}
I ran it, and here are the timing figures for that run:
> ls stage/*.class | wc -l
ls: stage/*.class: No such file or directory
0
> time java T
Compile result code = 0
real 0m2.511s
user 0m4.737s
sys 0m0.549s
> ls stage/*.class | wc -l
1000
So the total CPU time was 4.737 seconds / 1000 = 0.005 seconds = 5 ms per file. All I can think of is that you're compiling files one at a time, in which case nearly all of the time is some kind of startup/teardown cost, and who cares about that.
The bottom line is that you can probably compile however many Java files you want in just a few seconds, so stop worrying about this.
This was run on a MacBook Pro that is a couple of generations old.
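The answer doesn't show how the 1000 staged files were generated; a minimal sketch of one way to do it (assuming Java 11+ for Path.of and Files.writeString) could look like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GenerateStage {
    public static void main(String[] args) throws IOException {
        Path stage = Path.of("stage");
        Files.createDirectories(stage);
        for (int i = 0; i < 1000; i++) {
            // Each file defines a class SumN with the same sum() method as the question.
            String source =
                "public class Sum" + i + " {\n" +
                "    public int sum(int[] nums) {\n" +
                "        int total = 0;\n" +
                "        for (int j = 0; j < nums.length; j++) {\n" +
                "            total += nums[j];\n" +
                "        }\n" +
                "        return total;\n" +
                "    }\n" +
                "}\n";
            Files.writeString(stage.resolve("Sum" + i + ".java"), source);
        }
    }
}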
Answer 2
Score: 2
Wahaaaaaay back in the distant, distant past, we had 'incremental compilers'. These would run continuously. You start them up once at the start of the day or whatnot and then just leave em running forever. They would then either wait for any keyboard input (an enter press), or set up some file watches (hooks that trigger when any file is modified in a given directory) and recompile every time that was needed. This saves VM warmup and VM boot loading.
I think they mostly died out as build tools (particularly Maven and Gradle) became popular. The model doesn't mesh well with build systems; the build system itself should be the incremental unit (it should run once and stay loaded), and must therefore also internalize the compiler to avoid having to init and warm up a VM every time it needs to compile.
Given that you want to compile a single small, simple file, VM init and compiler warmup most likely account for 99% of that 640ms. An incremental compiler would be precisely what you need.
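You can see the warmup effect without any special tooling: compile the same file a few times inside one JVM and the first iteration is by far the slowest. A minimal sketch, assuming Sum.java sits in the working directory:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class WarmupDemo {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Compile the same file repeatedly inside one JVM; the first run pays
        // most of the warmup cost, later runs are much cheaper.
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            compiler.run(null, null, null, "Sum.java");
            System.out.printf("run %d: %d ms%n", i, (System.nanoTime() - start) / 1_000_000);
        }
    }
}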
Incremental compilers still exist, but the only modern one I know of is Eclipse: if you save a file in Eclipse, it will almost instantaneously compile it using its own built-in compiler. That's... clearly not what you're looking for.
You can search the web to see whether something incremental still exists and is decently maintained. You should also be able to hand-roll this: have a Java app that listens for full lines on standard input (e.g. with Scanner and .useDelimiter("\r?\n")), treats those lines as files or directories, scans them for any new or changed files, invokes javax.tools.JavaCompiler on everything that changed, updates a hashmap mapping files to timestamps, and then goes back to sleep waiting for more System.in traffic, using that map to decide what is already up to date (a minimal sketch follows below).
Or forget the map and go old-style: find the class file, compare its timestamp to the source file's timestamp, and if the class file's stamp is newer, there is no need to recompile. This doesn't map perfectly onto how Java works (one source file can produce multiple class files, and you can't tell from a class file name which source file produced it; you probably could if you add enough debug info and open the file up, but class files are a fairly complex format, so doing that is non-trivial) - so I'd use the hashmap, even though the compilers of yore went with the 'compare class file last-modified with source file' option.
Alternatively it could open up a TCP/IP port.
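Here is a minimal sketch of the System.in variant described above. For brevity it only handles individual .java files (no directory scanning), and the class name CompileDaemon is just a placeholder; it keeps the compiler loaded and only recompiles a file when its last-modified timestamp changes.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileDaemon {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Maps a source path to its lastModified value at the time it was last compiled.
        Map<String, Long> lastCompiled = new HashMap<>();

        // One path per line, e.g. piped in by a file watcher or typed by hand.
        Scanner in = new Scanner(System.in).useDelimiter("\r?\n");
        while (in.hasNext()) {
            String path = in.next().trim();
            if (path.isEmpty()) {
                continue;
            }
            File file = new File(path);
            if (!file.isFile()) {
                System.out.println("no such file: " + path);
                continue;
            }
            long modified = file.lastModified();
            Long seen = lastCompiled.get(path);
            if (seen != null && seen == modified) {
                System.out.println("unchanged, skipping: " + path);
                continue;
            }
            long start = System.nanoTime();
            int result = compiler.run(null, null, null, path);
            System.out.printf("%s -> exit %d in %d ms%n",
                    path, result, (System.nanoTime() - start) / 1_000_000);
            if (result == 0) {
                lastCompiled.put(path, modified);
            }
        }
    }
}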
Note that Eclipse also comes in 'language server' form, which is exactly how VSCode works: when you edit Java code in VSCode, it starts a headless Eclipse and uses it to do, well, everything 'intelligent' about Java: refactoring scripts, live compilation, errors, warnings and general linting services, navigation services (such as 'open type' and 'find callers'), the debugger, and more. Compiling is part of the bevy of services that Eclipse-as-a-language-server offers, and ecj is 4 to 10 times faster than javac to boot (the speed of the actual compilation after VM init and warmup is probably a fraction of a percent of that 640ms, and therefore not the relevant part, but I don't know exactly how fast you need this to be).
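If you want to try ecj in-process, it also ships a batch API you can call directly instead of forking a javac process. A rough sketch, assuming the ecj jar (the org.eclipse.jdt:ecj artifact) is on the classpath; check the BatchCompiler class against the ecj version you actually use:

import java.io.PrintWriter;
import org.eclipse.jdt.core.compiler.batch.BatchCompiler;

public class EcjInProcess {
    public static void main(String[] args) {
        // Keep this JVM (and the compiler classes) loaded and call compile()
        // per request instead of starting a new javac process each time.
        boolean ok = BatchCompiler.compile(
                "Sum.java",                    // options largely mirror javac's command line
                new PrintWriter(System.out),   // compiler output
                new PrintWriter(System.err),   // compiler errors and warnings
                null);                         // no progress callback
        System.out.println("compiled ok = " + ok);
    }
}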