Compilation to Bytecode, Java vs Python. What is the reason for the difference in time taken?


Question


Both Java and Python (CPython only) are compiled to bytecode: Java bytecode and CPython bytecode respectively. Each bytecode is then interpreted by its virtual machine (the JVM and the CPython VM). (I am ignoring JIT compilation here, which kicks in after roughly 10K invocations.)

I have 2 questions regarding this:

  1. Why does compiling Java to Java bytecode take so much longer than compiling Python? In Java, compilation is an explicit step, while in Python it happens at run time.
  2. Why is there no noticeable difference between the first run and the nth run of a Python program, given that the first run compiles to CPython bytecode and caches it in .pyc files that all later runs reuse? Is bytecode compilation really an almost zero-cost task in Python?

Although it plays a big role in the runtime, I suppose static vs dynamic typing shouldn't play too big a role during the compilation and should not be the only reason for this difference in timings. Also, I think in both the implementations, some optimisation is done during the bytecode generation.

Is there something that I am missing here? (I do not have much experience working in Java.)

Update:

I timed the first run and later runs of a Python program and found that the premise of question 2 is wrong. There is a very noticeable difference when running a large Python file.

The approach was simple. I created a large file consisting of these lines repeated many times:

a = 5
b = 6
c = a*b
print(str(c))

Then I imported that file from large.py and ran time python large.py
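The setup can be reproduced with a small generator script. This is only a sketch: the module name big_module.py and the repeat count are assumptions, since the question does not give them. The repeated lines go into a separate module that large.py imports because CPython writes .pyc cache files only for imported modules, not for the script passed on the command line.

```python
import tempfile
from pathlib import Path

# The four statements repeated throughout the generated module.
SNIPPET = "a = 5\nb = 6\nc = a*b\nprint(str(c))\n"


def write_benchmark_files(directory, repeats=50_000):
    """Write big_module.py (the repeated lines) and large.py (which imports it)."""
    directory = Path(directory)
    module_path = directory / "big_module.py"  # hypothetical name, not from the question
    script_path = directory / "large.py"
    module_path.write_text(SNIPPET * repeats)
    script_path.write_text("import big_module\n")
    return module_path, script_path


if __name__ == "__main__":
    module_path, script_path = write_benchmark_files(tempfile.mkdtemp())
    print("now run: time python", script_path)
```

Running time python large.py twice in the generated directory should then show the cold and warm cases; redirecting output to /dev/null keeps the printing itself from dominating the measurement.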

First run result:

python large.py  1.49s user 0.33s system 97% cpu 1.868 total

Second run result:

python large.py  0.20s user 0.08s system 90% cpu 0.312 total

After deleting the __pycache__ folder:

python large.py  1.57s user 0.34s system 97% cpu 1.959 total

So compiling to bytecode is a costly process in Python too, just not as costly as in Java.
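The same effect can be observed inside a single process, without relying on the shell's time command. The sketch below compares compiling source text with loading an already compiled code object. It is an approximation under one stated assumption: a .pyc file is a small header followed by the marshalled code object, so marshal.loads stands in for a cache hit here, and file I/O is ignored.

```python
import marshal
import time

SNIPPET = "a = 5\nb = 6\nc = a*b\nprint(str(c))\n"


def time_compile_vs_load(source, filename="<generated>"):
    """Return (seconds to compile source, seconds to unmarshal the compiled code)."""
    start = time.perf_counter()
    code = compile(source, filename, "exec")
    compile_seconds = time.perf_counter() - start

    # Serialise the code object the way a .pyc body stores it, then time
    # only the deserialisation step that a later run would perform.
    payload = marshal.dumps(code)
    start = time.perf_counter()
    marshal.loads(payload)
    load_seconds = time.perf_counter() - start
    return compile_seconds, load_seconds


if __name__ == "__main__":
    compile_s, load_s = time_compile_vs_load(SNIPPET * 20_000)
    print(f"compile: {compile_s:.3f}s, load cached bytecode: {load_s:.3f}s")
```

The absolute numbers depend on the machine and the Python version, so they are not directly comparable with the timings reported above.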

Answer 1

Score: 8


The Java byte code compiler has to do a lot more checks than the Python byte code compiler. To illustrate, take this line from the "hello world" program:

System.out.println("Hello World!");

To compile this single line of code, the compiler has to find what all of its parts mean. This is more complicated than it sounds: System could be a package. Or it could be a class, either in the same package where the code is, or in one of the imported packages, or in java.lang. So the compiler has to check all of those options, in that order. Once it finds the System class, it has to check if its access modifiers permit this use.

After that, the compiler has to figure out what out is: is it a nested class, or a class member, and what are its access modifiers? The compiler finds that it's a static member variable, of the PrintStream type. Then it has to do the same checks for println. The compiler cannot emit any code for this line of code until it knows all of this because the generated byte code is different based on the types of the objects involved.

All these checks take time, most importantly because the compiler has to load a ton of class definitions from the standard library even for the most trivial program.

In comparison, the Python byte code compiler only needs to parse the line, and it can immediately generate code without looking at extra modules. In Python the code would be compiled to:

  • looking up a "System" object in the current scope (LOAD_NAME)
  • looking up an "out" attribute on System (LOAD_ATTR)
  • looking up "println" on "out" (LOAD_METHOD)
  • calling it (CALL_METHOD)

The Python compiler doesn't care whether any of these lookups fail at run time.
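This can be checked with the standard dis module. The sketch below compiles the Java-looking line as Python source, which works because it is also syntactically valid Python, and shows that the failure only appears when the code is executed. The exact opcode names depend on the CPython version: LOAD_METHOD and CALL_METHOD are what older versions such as 3.8 print, while newer releases use different names for the method lookup and the call.

```python
import dis

LINE = 'System.out.println("Hello World!")'

# Compiling succeeds even though no name "System" is defined anywhere:
# the compiler only parses the line and emits lookups by name.
code = compile(LINE, "<example>", "exec")


def opnames(code_obj):
    """Return the opcode names of a code object, in order."""
    return [instr.opname for instr in dis.get_instructions(code_obj)]


if __name__ == "__main__":
    dis.dis(code)
    try:
        exec(code, {})
    except NameError as exc:
        # The missing name is only detected when the bytecode runs.
        print("failed only at run time:", exc)
```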

Another important difference is that the Java compiler is written entirely in Java, so the compiler itself has to be loaded and compiled to machine code at run time, while much of the CPython implementation is C code compiled ahead of time. This means Java has a bit of a "cold start" problem compared to Python.

Update: Since Java 11, you can run a single-file Java program directly from source, without a separate compilation step (the launcher compiles the source in memory on every run). Running a trivial "hello world" program gives you an idea of how much you save by compiling Java to byte code ahead of time, even for a trivial program:

  • The Python program runs in 45-50 milliseconds, as measured with time python hello.py
  • The Java program without compiling to byte code ahead of time runs in 350-400 milliseconds, as measured with time java Hello.java
  • The Java program after compiling to byte code runs in 70-80 milliseconds, as measured with time java Hello

Disclaimer: No scientific method was followed and no statistical analysis was performed, so take this with a grain of salt. Test environment: Python version 3.8.5, Java version 11.0.8, on Fedora 32, with an Intel i7 8750H CPU.

hello.py:

print("hello world")

Hello.java:

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
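For somewhat more repeatable numbers than a single time invocation, the measurement can be wrapped in a small harness. This is a sketch rather than the method used for the figures above: the helper name best_wall_time and the choice of reporting the fastest of several runs are assumptions made for illustration.

```python
import subprocess
import sys
import time


def best_wall_time(command, runs=5):
    """Run command several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Uses the current interpreter so the sketch runs without extra files.
    elapsed = best_wall_time([sys.executable, "-c", 'print("hello world")'])
    print(f"python hello world: {elapsed * 1000:.0f} ms")
```

To compare the three cases from the list, pass ["python", "hello.py"], ["java", "Hello.java"] or ["java", "Hello"] as the command, assuming those files exist in the working directory and java is on the PATH.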

Posted by huangapple on August 14, 2020 at 03:25:20.
When reposting, please keep the link to this article: https://go.coder-hub.com/63401983.html