英文:
Efficient caching strategy for a symmetric function in Python
问题
Here's the translation of the code portion:
我有以下形式的代码(这个示例要简单得多,但可以传达要点):
```python
i = range(10) # 实际上要大得多
def func(a, b): # 我的实际函数比这复杂得多,但重要的是 func(a, b) = func(b, a)
return a + b
total = 0
for i_prime in i:
for i_prime_prime in i:
if i_prime != i_prime_prime:
total += func(i_prime, i_prime_prime)
如何减少调用 func
的次数,利用了 func(a, b) = func(b, a)
这一事实。我想到了以下方法:
cache = {}
total = 0
for i_prime in i:
for i_prime_prime in i:
if i_prime != i_prime_prime:
key = str(list({i_prime, i_prime_prime}))
if key in cache.keys():
total += cache[key]
else:
temp = func(i_prime, i_prime_prime)
cache[key] = temp
total += temp
但我觉得执行 key = str(list({i_prime, i_prime_prime}))
不是最高效的方法。
<details>
<summary>英文:</summary>
I have code of the following form (this example is much simpler but it gets the point across):
```python
i = range(10) # in reality much larger
def func(a, b): # my actual function is much more complicated than this, but the
# important thing is that func(a, b) = func(b, a)
return a + b
total = 0
for i_prime in i:
for i_prime_prime in i:
if i_prime != i_prime_prime:
total += func(i_prime, i_prime_prime)
How can I reduce the number of times I call func
, taking advantage of the fact that func(a, b) = func(b, a)
. I came up with the following:
cache = {}
total = 0
for i_prime in i:
for i_prime_prime in i:
if i_prime != i_prime_prime:
key = str(list({i_prime, i_prime_prime}))
if key in cache.keys():
total += cache[key]
else:
temp = func(i_prime, i_prime_prime)
cache[key] = temp
total += temp
But I feel like doing key = str(list({i_prime, i_prime_prime}))
is not the most efficient way to do it.
答案1
得分: 1
我很懒。我只是使用Python已经存在的@cache,并接受轻微的效率损失:
@cache
def f(a, b):
if a > b:
return f(b, a)
.... 计算f(a, b) ...
如果这变得非常时间关键,我会更加努力。但是三行代码几乎满足了我所需的一切。
英文:
I'm lazy. I'd just used Python's already existing @cache and live with the minor inefficiency:
@cache
def f(a, b):
if a > b:
return f(b, a)
.... calculate f(a, b) ...
If this turned out to be utterly time critical, I'd try harder. But three lines of code gives me almost everything I need.
答案2
得分: 1
If you call the function "irregularly," then caching/flipping like Frank showed is good. Or if you want to roll your own, then:
key = str(sorted((a, b)))
if key in cache:
or
key = tuple(sorted((a, b)))
if key in cache:
or
key = (a, b) if a < b else (b, a)
if key in cache:
would likely be better than your:
key = str(list({a, b}))
if key in cache.keys():
because:
- Sorting a pair really does make
(a, b)
and(b, a)
equal. Unlike your set, as for examplelist({0, 8})
andlist({8, 0})
become the different keys[0, 8]
and[8, 0]
(implementation detail, but has been like this for many years) while sorting leads to[0, 8]
for both. - Creating a keys view for each membership check is pointless and wasteful.
- Tuple might be faster than string, at least it should be at creation, although it might be slower at hashing. Would have to measure, with the actual data.
But if you do something with all pairs, each pair once, like your demo shows and like your comments suggest, then it might be better to simply only produce one of (a, b)
and (b, a)
, compute its function value, and incorporate it into your result twice:
from itertools import combinations
total = sum(
2 * func(a, b)
for a, b in combinations(i, 2)
)
Or even:
from itertools import combinations, starmap
total = 2 * sum(starmap(func, combinations(i, 2)))
英文:
If you call the function "irregularly", then caching/flipping like Frank showed is good. Or if you want to roll your own, then
key = str(sorted((a, b)))
if key in cache:
or
key = tuple(sorted((a, b)))
if key in cache:
or
key = (a, b) if a < b else (b, a)
if key in cache:
would likely be better than your
key = str(list({a, b}))
if key in cache.keys():
because:
- Sorting a pair really does make
(a, b)
and(b, a)
equal. Unlike your set, as for examplelist({0, 8})
andlist({8, 0})
become the different keys[0, 8]
and[8, 0]
(implementation detail, but has been like this for many years) while sorting leads to[0, 8]
for both. - Creating a keys view for each membership check is pointless and wasteful.
- Tuple might be faster than string, at least it should be at creation, although it might be slower at hashing. Would have to measure, with the actual data.
But if you do something with all pairs, each pair once, like your demo shows and like your comments suggest, then it might be better to simply only produce one of (a, b)
and (b, a)
, compute its function value, and incorporate it into your result twice:
from itertools import combinations
total = sum(
2 * func(a, b)
for a, b in combinations(i, 2)
)
Or even:
from itertools import combinations, starmap
total = 2 * sum(starmap(func, combinations(i, 2)))
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论