Efficient caching strategy for a symmetric function in Python

huangapple go评论67阅读模式
英文:

Efficient caching strategy for a symmetric function in Python

问题

Here's the translation of the code portion:

我有以下形式的代码这个示例要简单得多但可以传达要点):
```python
i = range(10)   # 实际上要大得多

def func(a, b): # 我的实际函数比这复杂得多,但重要的是 func(a, b) = func(b, a)
    return a + b

total = 0
for i_prime in i:
    for i_prime_prime in i:
        if i_prime != i_prime_prime:
            total += func(i_prime, i_prime_prime)

如何减少调用 func 的次数,利用了 func(a, b) = func(b, a) 这一事实。我想到了以下方法:

cache = {}
total = 0
for i_prime in i:
    for i_prime_prime in i:
        if i_prime != i_prime_prime:
            key = str(list({i_prime, i_prime_prime}))
            if key in cache.keys():
                total += cache[key]
            else:
                temp       = func(i_prime, i_prime_prime)
                cache[key] = temp
                total     += temp

但我觉得执行 key = str(list({i_prime, i_prime_prime})) 不是最高效的方法。


<details>
<summary>英文:</summary>

I have code of the following form (this example is much simpler but it gets the point across):
```python
i = range(10)   # in reality much larger

def func(a, b): # my actual function is much more complicated than this, but the
                # important thing is that func(a, b) = func(b, a)
    return a + b

total = 0
for i_prime in i:
    for i_prime_prime in i:
        if i_prime != i_prime_prime:
            total += func(i_prime, i_prime_prime)

How can I reduce the number of times I call func, taking advantage of the fact that func(a, b) = func(b, a). I came up with the following:

cache = {}
total = 0
for i_prime in i:
    for i_prime_prime in i:
        if i_prime != i_prime_prime:
            key = str(list({i_prime, i_prime_prime}))
            if key in cache.keys():
                total += cache[key]
            else:
                temp       = func(i_prime, i_prime_prime)
                cache[key] = temp
                total     += temp

But I feel like doing key = str(list({i_prime, i_prime_prime})) is not the most efficient way to do it.

答案1

得分: 1

我很懒。我只是使用Python已经存在的@cache,并接受轻微的效率损失:

@cache
def f(a, b):
    if a > b:
        return f(b, a)
    .... 计算f(a, b) ...

如果这变得非常时间关键,我会更加努力。但是三行代码几乎满足了我所需的一切。

英文:

I'm lazy. I'd just used Python's already existing @cache and live with the minor inefficiency:

@cache
def f(a, b):
    if a &gt; b:
        return f(b, a)
    .... calculate f(a, b) ...

If this turned out to be utterly time critical, I'd try harder. But three lines of code gives me almost everything I need.

答案2

得分: 1

If you call the function "irregularly," then caching/flipping like Frank showed is good. Or if you want to roll your own, then:

key = str(sorted((a, b)))
if key in cache:

or

key = tuple(sorted((a, b)))
if key in cache:

or

key = (a, b) if a < b else (b, a)
if key in cache:

would likely be better than your:

key = str(list({a, b}))
if key in cache.keys():

because:

  1. Sorting a pair really does make (a, b) and (b, a) equal. Unlike your set, as for example list({0, 8}) and list({8, 0}) become the different keys [0, 8] and [8, 0] (implementation detail, but has been like this for many years) while sorting leads to [0, 8] for both.
  2. Creating a keys view for each membership check is pointless and wasteful.
  3. Tuple might be faster than string, at least it should be at creation, although it might be slower at hashing. Would have to measure, with the actual data.

But if you do something with all pairs, each pair once, like your demo shows and like your comments suggest, then it might be better to simply only produce one of (a, b) and (b, a), compute its function value, and incorporate it into your result twice:

from itertools import combinations

total = sum(
    2 * func(a, b)
    for a, b in combinations(i, 2)
)

Or even:

from itertools import combinations, starmap

total = 2 * sum(starmap(func, combinations(i, 2)))

Attempt This Online!

英文:

If you call the function "irregularly", then caching/flipping like Frank showed is good. Or if you want to roll your own, then

key = str(sorted((a, b)))
if key in cache:

or

key = tuple(sorted((a, b)))
if key in cache:

or

key = (a, b) if a &lt; b else (b, a)
if key in cache:

would likely be better than your

key = str(list({a, b}))
if key in cache.keys():

because:

  1. Sorting a pair really does make (a, b) and (b, a) equal. Unlike your set, as for example list({0, 8}) and list({8, 0}) become the different keys [0, 8] and [8, 0] (implementation detail, but has been like this for many years) while sorting leads to [0, 8] for both.
  2. Creating a keys view for each membership check is pointless and wasteful.
  3. Tuple might be faster than string, at least it should be at creation, although it might be slower at hashing. Would have to measure, with the actual data.

But if you do something with all pairs, each pair once, like your demo shows and like your comments suggest, then it might be better to simply only produce one of (a, b) and (b, a), compute its function value, and incorporate it into your result twice:

from itertools import combinations

total = sum(
    2 * func(a, b)
    for a, b in combinations(i, 2)
)

Or even:

from itertools import combinations, starmap

total = 2 * sum(starmap(func, combinations(i, 2)))

Attempt This Online!

huangapple
  • 本文由 发表于 2023年6月15日 07:25:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76478174.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定