2023年7月10日 20:56:52go评论155阅读模式

英文:

Repeat elements in nested lists each a different number of times, why smarter methods are slower?

问题

I had seen this question today, and clearly the asker didn't show any research effort at all. But someone posted an answer, the code in the answer was very straightforward and verbose, so I wanted to post a more concise and elegant solution, and I wanted the smarter method to be faster.

To save you a click, the problem is, given a list of lists, and another list of the same number of lists as the first one, each sublist in the second nested list contains only integers, and all sublists of the second list contain the same number of elements as the first list, assume they are different, repeat each last-level element in the first nested list the corresponding element in the second list times.

Example:

data = ([2, 0, 2, 2],
 [3, 3, 1, 2],
 [1, 0, 3, 3],
 [1, 1, 1, 2],
 [0, 0, 2, 1],
 [0, 1, 3, 3],
 [3, 1, 3, 2],
 [1, 0, 1, 2])

mult = ([3, 0, 0, 3],
 [2, 2, 1, 1],
 [0, 2, 2, 1],
 [3, 3, 3, 2],
 [0, 2, 3, 2],
 [1, 1, 3, 2],
 [3, 1, 2, 3],
 [3, 2, 0, 0])

output = deque([[2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 1, 2],
       [0, 0, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
       [0, 0, 2, 2, 2, 1, 1],
       [0, 1, 3, 3, 3, 3, 3],
       [3, 3, 3, 1, 3, 3, 2, 2, 2],
       [1, 1, 1, 0, 0]])

I quickly came up with a list comprehension solution:

def repeat_element_listcomp(data, mult):
    return [[i for i, j in zip(a, b) for _ in range(j)] for a, b in zip(data, mult)]

But I was surprised to find it slower than the simple solution of the first answer:

def repeat_element_zero(data, mult):
    combined = []
    for sublist1, sublist2 in zip(data, mult):
        sublist = []
        for elem1, elem2 in zip(sublist1, sublist2):
            sublist.extend([elem1]* elem2)

        combined.append(sublist)

    return combined

In [229]: %timeit repeat_element_zero(data, mult)
9.86 &#181;s &#177; 129 ns per loop (mean &#177; std. dev. of 7 runs, 100,000 loops each)

In [230]: %timeit repeat_element(data, mult)
14.1 &#181;s &#177; 156 ns per loop (mean &#177; std. dev. of 7 runs, 100,000 loops each)

I wasted many minutes trying to come up with a more efficient solution, I tried many more smart methods, and all of them are slower somehow. I then posted an answer there.

Setup:

import random
from collections import deque
from functools import reduce
from itertools import chain
from operator import iconcat

def random_numbers(n):
    return random.choices(range(n), k=n)

def make_data(n):
    return random_numbers(n), random_numbers(n)

def make_sample(n, limit=300):
    return list(zip(*[make_data(limit) for _ in range(n)]))

def repeat_element_zero(data, mult):
    combined = []
    for sublist1, sublist2 in zip(data, mult):
        sublist = []
        for elem1, elem2 in zip(sublist1, sublist2):
            sublist.extend([elem1]* elem2)

        combined.append(sublist)

    return combined

def repeat_element_listcomp(data, mult):
    return [[i for i, j in zip(a, b) for _ in range(j)] for a, b in zip(data, mult)]

def repeat_element_chain(data, mult):
    return [list(chain(*([i]*j for i, j in zip(a, b)))) for a, b in zip(data, mult)]

def repeat_element_helper(data, mult):
    return reduce(iconcat, ([i]*j for i, j in zip(data, mult)), [])

def repeat_element(data, mult):
    return deque(map(repeat_element_helper, data, mult))

approaches=[
    repeat_element_listcomp,
    repeat_element_chain,
    repeat_element

]
run_performance_comparison(approaches,[1000,2000,3000],setup=make_sample)

Performance:

[![enter image description here][1]][1]

In [188]: data, mult = make_sample(32, 10)

In [189]: %timeit repeat_element_zero(data, mult)
102 &#181;s &#177; 3.36 &#181;s per loop (mean &#177; std. dev. of 7 runs, 10,000 loops each)

In [190]: %timeit repeat_element_listcomp(data, mult)
145 &#181;s &#177; 3.55 &#181;s per loop (mean &#177; std. dev. of 7 runs, 10,000 loops each)

In [191]: %timeit repeat_element_chain(data, mult)
141 &#181;s &#177; 4.74 &#181;s per loop (mean &#177; std. dev. of 7 runs, 10,000 loops each)

In [192]: %timeit repeat_element(data, mult)
127 &#181;s &#177; 1.4 &#181;s per loop (mean &#177; std. dev. of 7 runs, 10,000 loops each)

In [193]: data, mult = make_sample(32, 32)

In [194]: %timeit repeat_element(data, mult)
576 &#181;s &#177; 10.8 &#181;s per loop (mean &#177; std. dev. of 7 runs, 1,000 loops each)

In [195]: %timeit repeat_element_chain(data, mult)
647 &#181;s &#177; 16.1 &#181;s per loop (mean &#177; std. dev. of 7 runs, 1,000 loops each)

In [196]:

<details>
<summary>英文:</summary>

I had seen this [question](https://stackoverflow.com/questions/76651767/multiply-list-in-list-string-items-with-list-in-list-integers) today, and clearly the asker didn&#39;t show any research effort at all. But someone posted an answer, the code in the answer was very straightforward and verbose, so I wanted to post a more concise and elegant solution, and I wanted the smarter method to be faster.

To save you a click, the problem is, given a list of lists, and another list of the same number of lists as the first one, each sublist in the second nested list contains only integers, and all sublist of the second list contain the same number of elements as the first list, assume they are different, repeat the each last level element in the first nested the corresponding element in the second list times.

Example:

data = ([2, 0, 2, 2],
[3, 3, 1, 2],
[1, 0, 3, 3],
[1, 1, 1, 2],
[0, 0, 2, 1],
[0, 1, 3, 3],
[3, 1, 3, 2],
[1, 0, 1, 2])

mult = ([3, 0, 0, 3],
[2, 2, 1, 1],
[0, 2, 2, 1],
[3, 3, 3, 2],
[0, 2, 3, 2],
[1, 1, 3, 2],
[3, 1, 2, 3],
[3, 2, 0, 0])

output = deque([[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 1, 2],
[0, 0, 3, 3, 3],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
[0, 0, 2, 2, 2, 1, 1],
[0, 1, 3, 3, 3, 3, 3],
[3, 3, 3, 1, 3, 3, 2, 2, 2],
[1, 1, 1, 0, 0]])


I quickly came up with a list comprehension solution:

def repeat_element_listcomp(data, mult):
return [[i for i, j in zip(a, b) for _ in range(j)] for a, b in zip(data, mult)]


But I was surprised to find it slower than the simple solution of the [first answer](https://stackoverflow.com/a/76651822/16383578):

def repeat_element_zero(data, mult):
combined = []
for sublist1, sublist2 in zip(data, mult):
sublist = []
for elem1, elem2 in zip(sublist1, sublist2):
sublist.extend([elem1]* elem2)

    combined.append(sublist)

return combined

In [229]: %timeit repeat_element_zero(data, mult)
9.86 µs ± 129 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [230]: %timeit repeat_element(data, mult)
14.1 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


I wasted many minutes trying to come up with a more efficient solution, I tried many more smart methods, and all of them are slower somehow, I then posted an [answer](https://stackoverflow.com/a/76651985/16383578) there.

Setup:

import random
from collections import deque
from functools import reduce
from itertools import chain
from operator import iconcat

def random_numbers(n):
return random.choices(range(n), k=n)

def make_data(n):
return random_numbers(n), random_numbers(n)

def make_sample(n, limit=300):
return list(zip(*[make_data(limit) for _ in range(n)]))

def repeat_element_zero(data, mult):
combined = []
for sublist1, sublist2 in zip(data, mult):
sublist = []
for elem1, elem2 in zip(sublist1, sublist2):
sublist.extend([elem1]* elem2)

    combined.append(sublist)

return combined

def repeat_element_listcomp(data, mult):
return [[i for i, j in zip(a, b) for _ in range(j)] for a, b in zip(data, mult)]

def repeat_element_chain(data, mult):
return [list(chain(*([i]*j for i, j in zip(a, b)))) for a, b in zip(data, mult)]

def repeat_element_helper(data, mult):
return reduce(iconcat, ([i]*j for i, j in zip(data, mult)), [])

def repeat_element(data, mult):
return deque(map(repeat_element_helper, data, mult))

approaches=[
repeat_element_listcomp,
repeat_element_chain,
repeat_element

]
run_performance_comparison(approaches,[1000,2000,3000],setup=make_sample)


Performance:

[![enter image description here][1]][1]

In [188]: data, mult = make_sample(32, 10)

In [189]: %timeit repeat_element_zero(data, mult)
102 µs ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [190]: %timeit repeat_element_listcomp(data, mult)
145 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [191]: %timeit repeat_element_chain(data, mult)
141 µs ± 4.74 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [192]: %timeit repeat_element(data, mult)
127 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [193]: data, mult = make_sample(32, 32)

In [194]: %timeit repeat_element(data, mult)
576 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [195]: %timeit repeat_element_chain(data, mult)
647 µs ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [196]: %timeit repeat_element_listcomp(data, mult)
837 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [197]: %timeit repeat_element_zero(data, mult)
465 µs ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [198]: data, mult = make_sample(256, 32)

In [199]: %timeit repeat_element_zero(data, mult)
3.69 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [200]: %timeit repeat_element(data, mult)
4.47 ms ± 88.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [201]: %timeit repeat_element_listcomp(data, mult)
7.01 ms ± 688 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Profiling Code:

import timeit
import matplotlib.pyplot as plt
from typing import List, Dict, Callable

from contextlib import contextmanager

@contextmanager
def data_provider(data_size, setup=lambda N: N, teardown=lambda: None):
data = setup(data_size)
yield data
teardown()

def run_performance_comparison(approaches: List[Callable],
data_size: List[int],
setup=lambda N: N,
teardown=lambda: None,
number_of_repetitions=5, title='N'):
approach_times: Dict[Callable, List[float]] = {approach: [] for approach in approaches}

for N in data_size:
    with data_provider(N, setup, teardown) as data:
        for approach in approaches:
            approach_time = timeit.timeit(lambda: approach(*data), number=number_of_repetitions)
            approach_times[approach].append(approach_time)

for approach in approaches:
    plt.plot(data_size, approach_times[approach], label=approach.__name__)

plt.xlabel(title)
plt.ylabel(&#39;Execution Time (seconds)&#39;)
plt.title(&#39;Performance Comparison&#39;)
plt.legend()
plt.show()


I want to know, why all of my smart methods slower? What is going on here? Why these methods which are normally what increase performance, are making the code slower in this case? I guess this must be implementation detail of CPython, and if version is important I am using `Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]`, I don&#39;t know C yet so I don&#39;t know the low-level stuff. But I am really curious and want an explanation.


----------

Disassembly of the functions:

In [232]: import dis

In [233]: dis.dis(repeat_element_zero)
11 0 BUILD_LIST 0
2 STORE_FAST 2 (combined)

12 4 LOAD_GLOBAL 0 (zip)
6 LOAD_FAST 0 (data)
8 LOAD_FAST 1 (mult)
10 CALL_FUNCTION 2
12 GET_ITER
>> 14 FOR_ITER 29 (to 74)
16 UNPACK_SEQUENCE 2
18 STORE_FAST 3 (sublist1)
20 STORE_FAST 4 (sublist2)

13 22 BUILD_LIST 0
24 STORE_FAST 5 (sublist)

14 26 LOAD_GLOBAL 0 (zip)
28 LOAD_FAST 3 (sublist1)
30 LOAD_FAST 4 (sublist2)
32 CALL_FUNCTION 2
34 GET_ITER
>> 36 FOR_ITER 12 (to 62)
38 UNPACK_SEQUENCE 2
40 STORE_FAST 6 (elem1)
42 STORE_FAST 7 (elem2)

15 44 LOAD_FAST 5 (sublist)
46 LOAD_METHOD 1 (extend)
48 LOAD_FAST 6 (elem1)
50 BUILD_LIST 1
52 LOAD_FAST 7 (elem2)
54 BINARY_MULTIPLY
56 CALL_METHOD 1
58 POP_TOP
60 JUMP_ABSOLUTE 18 (to 36)

17 >> 62 LOAD_FAST 2 (combined)
64 LOAD_METHOD 2 (append)
66 LOAD_FAST 5 (sublist)
68 CALL_METHOD 1
70 POP_TOP
72 JUMP_ABSOLUTE 7 (to 14)

19 >> 74 LOAD_FAST 2 (combined)
76 RETURN_VALUE

In [234]: dis.dis(repeat_element)
31 0 LOAD_GLOBAL 0 (deque)
2 LOAD_GLOBAL 1 (map)
4 LOAD_GLOBAL 2 (repeat_element_helper)
6 LOAD_FAST 0 (data)
8 LOAD_FAST 1 (mult)
10 CALL_FUNCTION 3
12 CALL_FUNCTION 1
14 RETURN_VALUE

In [235]: dis.dis(repeat_element_helper)
28 0 LOAD_GLOBAL 0 (reduce)
2 LOAD_GLOBAL 1 (iconcat)
4 LOAD_CONST 1 (<code object <genexpr> at 0x000001C86CC20240, file "<ipython-input-225-c385a750c738>", line 28>)
6 LOAD_CONST 2 ('repeat_element_helper.<locals>.<genexpr>')
8 MAKE_FUNCTION 0
10 LOAD_GLOBAL 2 (zip)
12 LOAD_FAST 0 (data)
14 LOAD_FAST 1 (mult)
16 CALL_FUNCTION 2
18 GET_ITER
20 CALL_FUNCTION 1
22 BUILD_LIST 0
24 CALL_FUNCTION 3
26 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x000001C86CC20240, file "<ipython-input-225-c385a750c738>", line 28>:
0 GEN_START 0

28 2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 10 (to 26)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (i)
10 STORE_FAST 2 (j)
12 LOAD_FAST 1 (i)
14 BUILD_LIST 1
16 LOAD_FAST 2 (j)
18 BINARY_MULTIPLY
20 YIELD_VALUE
22 POP_TOP
24 JUMP_ABSOLUTE 2 (to 4)
>> 26 LOAD_CONST 0 (None)
28 RETURN_VALUE

In [236]: dis.dis(repeat_element_listcomp)
22 0 LOAD_CONST 1 (<code object <listcomp> at 0x000001C86CE6B7E0, file "<ipython-input-225-c385a750c738>", line 22>)
2 LOAD_CONST 2 ('repeat_element_listcomp.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (zip)
8 LOAD_FAST 0 (data)
10 LOAD_FAST 1 (mult)
12 CALL_FUNCTION 2
14 GET_ITER
16 CALL_FUNCTION 1
18 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000001C86CE6B7E0, file "<ipython-input-225-c385a750c738>", line 22>:
22 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 14 (to 34)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (a)
10 STORE_FAST 2 (b)
12 LOAD_CONST 0 (<code object <listcomp> at 0x000001C86489E550, file "<ipython-input-225-c385a750c738>", line 22>)
14 LOAD_CONST 1 ('repeat_element_listcomp.<locals>.<listcomp>.<listcomp>')
16 MAKE_FUNCTION 0
18 LOAD_GLOBAL 0 (zip)
20 LOAD_FAST 1 (a)
22 LOAD_FAST 2 (b)
24 CALL_FUNCTION 2
26 GET_ITER
28 CALL_FUNCTION 1
30 LIST_APPEND 2
32 JUMP_ABSOLUTE 2 (to 4)
>> 34 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000001C86489E550, file "<ipython-input-225-c385a750c738>", line 22>:
22 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 13 (to 32)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (i)
10 STORE_FAST 2 (j)
12 LOAD_GLOBAL 0 (range)
14 LOAD_FAST 2 (j)
16 CALL_FUNCTION 1
18 GET_ITER
>> 20 FOR_ITER 4 (to 30)
22 STORE_FAST 3 (_)
24 LOAD_FAST 1 (i)
26 LIST_APPEND 3
28 JUMP_ABSOLUTE 10 (to 20)
>> 30 JUMP_ABSOLUTE 2 (to 4)
>> 32 RETURN_VALUE

In [237]: dis.dis(repeat_element_chain)
25 0 LOAD_CONST 1 (<code object <listcomp> at 0x000001C86CC22600, file "<ipython-input-225-c385a750c738>", line 25>)
2 LOAD_CONST 2 ('repeat_element_chain.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (zip)
8 LOAD_FAST 0 (data)
10 LOAD_FAST 1 (mult)
12 CALL_FUNCTION 2
14 GET_ITER
16 CALL_FUNCTION 1
18 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000001C86CC22600, file "<ipython-input-225-c385a750c738>", line 25>:
25 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 18 (to 42)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (a)
10 STORE_FAST 2 (b)
12 LOAD_GLOBAL 0 (list)
14 LOAD_GLOBAL 1 (chain)
16 LOAD_CONST 0 (<code object <genexpr> at 0x000001C86BF07260, file "<ipython-input-225-c385a750c738>", line 25>)
18 LOAD_CONST 1 ('repeat_element_chain.<locals>.<listcomp>.<genexpr>')
20 MAKE_FUNCTION 0
22 LOAD_GLOBAL 2 (zip)
24 LOAD_FAST 1 (a)
26 LOAD_FAST 2 (b)
28 CALL_FUNCTION 2
30 GET_ITER
32 CALL_FUNCTION 1
34 CALL_FUNCTION_EX 0
36 CALL_FUNCTION 1
38 LIST_APPEND 2
40 JUMP_ABSOLUTE 2 (to 4)
>> 42 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x000001C86BF07260, file "<ipython-input-225-c385a750c738>", line 25>:
0 GEN_START 0

25 2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 10 (to 26)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (i)
10 STORE_FAST 2 (j)
12 LOAD_FAST 1 (i)
14 BUILD_LIST 1
16 LOAD_FAST 2 (j)
18 BINARY_MULTIPLY
20 YIELD_VALUE
22 POP_TOP
24 JUMP_ABSOLUTE 2 (to 4)
>> 26 LOAD_CONST 0 (None)
28 RETURN_VALUE


I barely understand any of the commands.


  [1]: https://i.stack.imgur.com/p5gyp.png

</details>


# 答案1
**得分**: 1

如果您对单独的组件（理解 vs 乘法）和（理解 vs 多重 .extend）进行基准测试，您会发现，理解并不总是比以多个项目一次处理的低级代码更快。

```python
from timeit import timeit

t0 = timeit(lambda: [3 for _ in range(1000)], number=1000)
t1 = timeit(lambda: [3] * 1000, number=1000)

print(t0) # 0.0317
print(t1) # 0.0018

source = [1, 2, 3] * 100
t0 = timeit(lambda: [n for _ in range(10) for n in source], number=1000)
t1 = timeit(lambda: [any(L.extend(source) for L in [[]] for _ in range(10))], number=1000)

print(t0) # 0.0514
print(t1) # 0.0072

结果可能会因大小和乘数而异，但大多数情况下这应该成立。

英文:

If you benchmark the individual components (comprehension vs multiplication) and (comprehension vs multiple .extend), you will see that comprehension are not always faster than low level code that processes multiple items at a time.

from timeit import timeit

t0  = timeit(lambda:[3 for _ in range(1000)], number = 1000)
t1  = timeit(lambda:[3]*1000, number = 1000)

print(t0) # 0.0317
print(t1) # 0.0018

source = [1,2,3]*100
t0  = timeit(lambda:[n for _ in range(10) for n in source], number = 1000)
t1  = timeit(lambda:[ any(L.extend(source) for L in [[]] for _ in range(10))], number = 1000)

print(t0) # 0.0514
print(t1) # 0.0072

Results may vary depending on the sizes and multipliers but most of the time this should hold true.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

嵌套列表中的元素重复不同次数，为什么更智能的方法更慢？

问题

如何在项目的多个模块中构建 GUI 屏幕？

在Python中，如何将多维列表中的每个元素乘以一个数字？

使用 Pandas 中基于日期和分类列的滚动函数

如何处理链式错误？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论