Does Python detect code snippets that are useless? (dead code elimination)
Question
If I code something like this:
for _ in range(100_000_000):
    a = 10
    b = 20
    c = a + b
    d = c * 2
    e = d / 2
    del a, b, c, d, e
print("Hello World")
Will the Python compiler realize that this is useless and that there is no need to do anything? I've heard gcc can figure that out, but Python can't.
My tests confirm that, but I'm looking for confirmation, since I've stumbled across posts like this one (https://bugs.python.org/issue1346214) that make me wonder whether it's actually implemented. That post is about a very old version of Python and I'm using 3.11.1, but if they were already discussing it back then, surely it would have been implemented by now if they were ever going to implement it?
On my laptop, the Python code takes 0.07 seconds to run versus 0 seconds for just the Hello World, which is why I think there is no dead code elimination.
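The timing argument above can be made a little more robust. This is a minimal sketch, assuming CPython: if the loop body were being eliminated, the run time would stay flat as the iteration count grows, but instead it grows roughly in proportion to it. `busy_loop` and `best_time` are hypothetical helper names, and the iteration counts are arbitrary, chosen only to keep the run short.

```python
import timeit


def busy_loop(n):
    """Hypothetical helper: the question's loop body with the iteration count as a parameter."""
    for _ in range(n):
        a = 10
        b = 20
        c = a + b
        d = c * 2
        e = d / 2
        del a, b, c, d, e


def best_time(n, repeats=3):
    # Run the loop once per measurement and keep the fastest run to reduce noise.
    return min(timeit.repeat(lambda: busy_loop(n), number=1, repeat=repeats))


small = best_time(200_000)
large = best_time(2_000_000)

# If the body were optimized away, both timings would be near zero and the ratio
# would be meaningless; a ratio near 10 means every iteration is really executed.
print(f"{small:.4f}s vs {large:.4f}s, ratio {large / small:.1f}")
```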
Answer 1
Score: 2
It does not eliminate the useless code, and you can prove it to yourself: compare the disassembled code in slow with that in fast and you'll see it does indeed perform all those useless computations. You can also verify that they're being run using timeit. This only applies to CPython, though; other implementations of Python may optimize this (as you can see below with numba's Python subset!).
import dis
import timeit

def slow():
    for _ in range(100_000_000):
        a = 10
        b = 20
        c = a + b
        d = c * 2
        e = d / 2
        del a, b, c, d, e
    print("Hello World")

def fast():
    print("Hello World")

# Time each function, then print its bytecode.
print(timeit.timeit('slow()', number=1000, globals=globals()))
dis.dis(slow)
print("*****")
print(timeit.timeit('fast()', number=5, globals=globals()))
dis.dis(fast)
Yields:
39.3209762
  3           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (100000000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                46 (to 56)
             10 STORE_FAST               0 (_)

  4          12 LOAD_CONST               2 (10)
             14 STORE_FAST               1 (a)

  5          16 LOAD_CONST               3 (20)
             18 STORE_FAST               2 (b)

  6          20 LOAD_FAST                1 (a)
             22 LOAD_FAST                2 (b)
             24 BINARY_ADD
             26 STORE_FAST               3 (c)

  7          28 LOAD_FAST                3 (c)
             30 LOAD_CONST               4 (2)
             32 BINARY_MULTIPLY
             34 STORE_FAST               4 (d)

  8          36 LOAD_FAST                4 (d)
             38 LOAD_CONST               4 (2)
             40 BINARY_TRUE_DIVIDE
             42 STORE_FAST               5 (e)

  9          44 DELETE_FAST              1 (a)
             46 DELETE_FAST              2 (b)
             48 DELETE_FAST              3 (c)
             50 DELETE_FAST              4 (d)
             52 DELETE_FAST              5 (e)
             54 JUMP_ABSOLUTE            8

 10     >>   56 LOAD_GLOBAL              1 (print)
             58 LOAD_CONST               5 ('Hello World')
             60 CALL_FUNCTION            1
             62 POP_TOP
             64 LOAD_CONST               0 (None)
             66 RETURN_VALUE
*****
8.899999997424857e-06
 13           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
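The listing above was produced by a CPython older than 3.11. From 3.11 on, opcodes such as BINARY_ADD, BINARY_TRUE_DIVIDE and CALL_FUNCTION are replaced by BINARY_OP and CALL, so the disassembly on 3.11.1 looks different even though the loop body is still all there. This minimal sketch checks for the loop, the division and the del programmatically with dis.get_instructions, so it does not depend on one version's listing. `loop_body_survives` and `is_division` are hypothetical helper names.

```python
import dis


def slow():
    for _ in range(100_000_000):
        a = 10
        b = 20
        c = a + b
        d = c * 2
        e = d / 2
        del a, b, c, d, e
    print("Hello World")


def fast():
    print("Hello World")


def is_division(instr):
    # Python <= 3.10 has a dedicated opcode; 3.11+ uses BINARY_OP with "/" as argrepr.
    return instr.opname == "BINARY_TRUE_DIVIDE" or (
        instr.opname == "BINARY_OP" and instr.argrepr == "/"
    )


def loop_body_survives(func):
    """Return True if func's bytecode still contains the loop, the division and the del."""
    instrs = list(dis.get_instructions(func))
    has_loop = any(i.opname == "FOR_ITER" for i in instrs)
    has_division = any(is_division(i) for i in instrs)
    has_delete = any(i.opname == "DELETE_FAST" for i in instrs)
    return has_loop and has_division and has_delete


# Only inspects the bytecode; neither function is actually called here.
print("slow:", loop_body_survives(slow))
print("fast:", loop_body_survives(fast))
```

If CPython performed dead code elimination on this loop, the check for slow would fail.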
If you really want ripping-fast Python with this kind of optimization, look at numba:
import numba as nb
import timeit

@nb.jit
def slow():
    for _ in range(100_000_000):
        a = 10
        b = 20
        c = a + b
        d = c * 2
        e = d / 2
    print("Hello World")

@nb.jit
def fast():
    print("Hello World")

# The first call of each function also includes numba's JIT compilation time.
print(timeit.timeit('slow()', number=5, globals=globals()))
print(timeit.timeit('fast()', number=5, globals=globals()))
# inspect_llvm() returns a dict keyed by signature; () is the no-argument signature.
print(slow.inspect_llvm()[tuple()])
print("****")
print(fast.inspect_llvm()[tuple()])
These two functions yield very similar LLVM code. There are a bunch of limitations that come with numba-compiled code, though, which are out of scope for this question.