How to optimize Pascal's Triangle in Python?

Question

I have implemented [Pascal's Triangle](https://en.wikipedia.org/wiki/Pascal%27s_triangle) in Python. It is fairly efficient, but not efficient enough, and there are a few things I don't like about it.

The Pascal's Triangle is like the following:

![Pascal's triangle](https://wikimedia.org/api/rest_v1/media/math/render/svg/23050fcb53d6083d9e42043bebf2863fa9746043)

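I have read this useless [tutorial](https://www.geeksforgeeks.org/python-program-to-print-pascals-triangle/) and this [question](https://stackoverflow.com/questions/24093387/pascals-triangle-for-python). The solutions there are extremely inefficient: they involve factorials and don't use caching.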

Instead, I implemented a different algorithm I created myself. My mathematics isn't that good, but I have spotted the following simple recursive relationships:

The triangle starts with a row with only 1 number in it, and that number is 1.

For each subsequent row, the length of the row increases by 1, and the first and last numbers of the row are 1.

Each number that isn't the first or last is the sum of two numbers in the row above it: the one whose index is this number's index minus 1, and the one with the same index.

And the rows of the triangle are symmetric.

In other words, if we use zero-based indexing:

p(r, 0) = p(r, r) = 1
p(r, c) = p(r - 1, c - 1) + p(r - 1, c)
p(r, c) = p(r, r - c)
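
As a quick sanity check, the three relations can be tested against `math.comb` from the standard library (Python 3.8+). This is only a sketch: the helper name `check_relations` and the row limit of 30 are illustrative choices, not part of the original code.

```python
from math import comb

def check_relations(max_row: int) -> bool:
    """Return True if p(r, c) = comb(r, c) satisfies all three relations up to max_row."""
    for r in range(max_row + 1):
        # The first and last entries of every row are 1.
        if comb(r, 0) != 1 or comb(r, r) != 1:
            return False
        for c in range(r + 1):
            # Rows are symmetric.
            if comb(r, c) != comb(r, r - c):
                return False
        for c in range(1, r):
            # Interior entries are the sum of the two entries above.
            if comb(r, c) != comb(r - 1, c - 1) + comb(r - 1, c):
                return False
    return True

print(check_relations(30))  # True
```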

Below is my code:

from typing import List

class Pascal_Triangle:
    def __init__(self, rows: int = 0, fill: bool = True):
        self.data = []
        self.length = 0
        if rows:
            self.fill_rows(rows)
        if fill:
            self.fill_values()
    
    def add_row(self, length: int):
        row = [0] * length
        row[0] = row[-1] = 1
        self.data.append(row)
    
    def fill_rows(self, rows: int):
        for length in range(self.length + 1, rows + 1):
            self.add_row(length)
        self.length = rows
    
    def comb(self, a: int, b: int) -> int:
        if not 0 <= b <= a:
            raise ValueError(f'cannot choose {b} elements from a population of {a}')
        
        if self.length < (length := a + 1):
            self.fill_rows(length)
        
        return self.at(a, b)
    
    def at(self, row: int, col: int) -> int:
        if val := self.data[row][row - col]:
            self.data[row][col] = val
            return val
        
        if val := self.data[row][col]:
            return val
        
        self.data[row][col] = val = self.at(row - 1, col - 1) + self.at(row - 1, col)
        return val
    
    def fill_values(self):
        for row in range(2, self.length):
            for col in range(1, row):
                self.at(row, col)
    
    def get_row(self, row: int) -> List[int]:
        if self.length < (length := row + 1):
            self.fill_rows(length)
        
        self.fill_values()
        return self.data[row]
    
    def pretty_print(self):
        print('\n'.join(f"{' ' * (self.length - i)}{' '.join(map(str, row))}" for i, row in enumerate(self.data)))

First, the output of tri = Pascal_Triangle(12); tri.pretty_print() is extremely ugly:

            1
           1 1
          1 2 1
         1 3 3 1
        1 4 6 4 1
       1 5 10 10 5 1
      1 6 15 20 15 6 1
     1 7 21 35 35 21 7 1
    1 8 28 56 70 56 28 8 1
   1 9 36 84 126 126 84 36 9 1
  1 10 45 120 210 252 210 120 45 10 1
 1 11 55 165 330 462 462 330 165 55 11 1

How can I dynamically adjust the spacing between the elements so that the output looks more like an equilateral triangle?

Second, I don't like the recursive function. Is there a way to get rid of it and compute the values from the recurrence by iteration, while still remembering already computed numbers?

Third, is there a data structure more efficient than my nested lists for the same data? I have thought of numpy.array but arrays need each row to have the same length and arrays can't grow.

Finally, can my algorithm be optimized further?

The data after calling tri.at(16, 5) is:

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 0, 1],
 [1, 8, 28, 56, 70, 56, 0, 0, 1],
 [1, 9, 36, 84, 126, 126, 0, 0, 0, 1],
 [1, 10, 45, 120, 210, 252, 0, 0, 0, 0, 1],
 [1, 11, 55, 165, 330, 462, 0, 0, 0, 0, 0, 1],
 [1, 12, 66, 220, 495, 792, 0, 0, 0, 0, 0, 0, 1],
 [1, 0, 78, 286, 715, 1287, 0, 0, 0, 0, 0, 0, 0, 1],
 [1, 0, 0, 364, 1001, 2002, 0, 0, 0, 0, 0, 0, 0, 0, 1],
 [1, 0, 0, 0, 1365, 3003, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
 [1, 0, 0, 0, 0, 4368, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]

I know I am already doing memoization, and that is not what I meant. I want to calculate the unfilled values without ever calling a recursive function. Instead of following the recursive definition backwards, we could iterate: start from the lowest already-filled values that the query needs, walk forwards through all the needed numbers (each number is used twice, once for each of the two numbers below it), and stop when the requested index is reached.

The needed numbers can be computed using indexing and mathematics.

In this way there is no recursive function call at all.
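
One way to sketch this forward iteration, assuming only a single entry is wanted and the rest of the triangle need not be stored: keep one rolling list holding the first `col + 1` entries of the current row and update it in place, row by row. The function name `entry` is hypothetical, and the sketch does not reuse values already cached in `self.data`.

```python
def entry(row: int, col: int) -> int:
    """Compute p(row, col) with the additive recurrence only, without recursion."""
    if not 0 <= col <= row:
        raise ValueError(f"column position {col} is invalid for row {row}")
    col = min(col, row - col)      # symmetry: work on the shorter side
    line = [1] + [0] * col         # entries 0..col of row 0
    for r in range(1, row + 1):
        # Go right to left so line[c - 1] still holds the previous row's value.
        for c in range(min(r, col), 0, -1):
            line[c] += line[c - 1]
    return line[col]

print(entry(16, 5))  # 4368
```

This uses O(col) memory and no function calls inside the loops, at the cost of recomputing everything for each new query.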


Update

I have rewritten my code as follows:

class Pascal_Triangle:
    def __init__(self, end_row: int = 2, opt: int = 0):
        self.data = [[1], [1, 1]]
        self.length = 2
        self.opt = [self.add_rows_o0, self.add_rows_o1]
        if end_row > 2:
            self.opt[opt](end_row)

    def add_rows_o0(self, end_row: int):
        last_row = self.data[-1]
        for _ in range(self.length, end_row):
            self.data.append(
                last_row := [1] + [a + b for a, b in zip(last_row, last_row[1:])] + [1]
            )
        self.length = end_row

    def add_rows_o1(self, end_row: int):
        last_row = self.data[-1]
        for n in range(self.length, end_row):
            mid = n // 2 + 1
            row = [0] * (n - 1)
            m = n - 2
            for i, (a, b) in enumerate(zip(last_row, last_row[1:mid])):
                row[i] = row[m - i] = a + b
            self.data.append(last_row := [1] + row + [1])
        self.length = end_row

    def pretty_print(self):
        longest = len(str(self.data[-1][self.length // 2]))
        line_length = (longest + 1) * self.length
        for row in self.data:
            print(" ".join(f"{n:{longest}}" for n in row).center(line_length))

I used a list comprehension to generate new rows and got rid of the expensive recursive function calls. The code is much faster as a result.

However, I tried to exploit the symmetric nature of the rows and only calculate half of the row and mirror it to get the other half. In this way the number of calculations would be halved.

But it is actually slower:

In [257]: %timeit Pascal_Triangle(64, 1)
237 µs ± 7.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [258]: %timeit Pascal_Triangle(64, 0)
224 µs ± 9.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [259]: Pascal_Triangle(64, 1).data == Pascal_Triangle(64, 0).data
Out[259]: True

Why is it slower? And how can I actually skip the unnecessary calculations and make it faster?
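
A plausible explanation, not verified by profiling here, is that `add_rows_o1` does its work in an explicit Python loop with two indexed assignments per element, while `add_rows_o0` runs a single list comprehension, so halving the number of additions may not offset the extra interpreter overhead. The sketch below keeps the comprehension and mirrors the half row with slicing instead. The names `next_row_mirrored` and `build` are hypothetical, and whether this is actually faster would need to be timed.

```python
from typing import List

def next_row_mirrored(prev: List[int]) -> List[int]:
    """Build the row after `prev` by computing only its first half."""
    size = len(prev) + 1
    half_len = (size + 1) // 2     # first half, including the middle if size is odd
    half = [1] + [prev[i - 1] + prev[i] for i in range(1, half_len)]
    # Mirror the first half; the middle element is skipped when size is odd.
    return half + half[:size // 2][::-1]

def build(end_row: int) -> List[List[int]]:
    """Return the first `end_row` rows (end_row >= 1)."""
    rows = [[1]]
    while len(rows) < end_row:
        rows.append(next_row_mirrored(rows[-1]))
    return rows

print(build(5)[-1])  # [1, 4, 6, 4, 1]
```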

Answer 1

Score: 2

  1. You can improve pretty_print by taking the length (as a string) of the longest number and using it as the width for all the numbers; using str.center may also be easier.

     def pretty_print(self):
         longest = max(len(str(n)) for row in self.data for n in row)
         line_length = (longest + 1) * self.length
         for row in self.data:
             print(' '.join(f'{n:{longest}}' for n in row).center(line_length))

  2. With the check "if val := self.data[row][col]: return val" you are already remembering computed values, and each value is calculated exactly once. You could still make it purely iterative directly in fill_values and drop the at method entirely:

     def fill_values(self):
         for row in range(2, self.length):
             for col in range(1, row):
                 self.data[row][col] = self.data[row - 1][col - 1] + self.data[row - 1][col]

  3. I'd say a nested list-of-lists is a good choice here, and your algorithm (even before 2.) should already be as efficient as possible.

Having said that, I noticed you have a comb function, so maybe your goal is not really to print the triangle but to calculate individual values. In that case, there are two possible ways to make your code faster (although I did not actually time them).

First, you could use a dict as the data structure and then only calculate the values that are actually needed to find the value at a given row and col. In the worst case (centre of the bottom row) that will be 50% of the entire triangle, and on average much less than that.

class Pascal_Triangle:
    def __init__(self):
        self.data = {(0, 0): 1}

    def fill_rows(self, rows: int):
        # actually, just the last row would be enough here...
        for row in range(rows + 1):
            for col in range(row + 1):
                self.at(row, col)

    def at(self, row: int, col: int) -> int:
        if not 0 <= col <= row:
            raise ValueError(f'column position {col} is invalid for row {row}')
        if (row, col) not in self.data:
            self.data[row, col] = 1 if col in (0, row) else self.at(row - 1, col - 1) + self.at(row - 1, col)
        return self.data[row, col]

    def pretty_print(self):
        longest = max(len(str(n)) for n in self.data.values())
        max_row = max(row for (row, col) in self.data)
        line_length = (longest + 1) * max_row
        for row in range(max_row+1):
            print(' '.join(str(self.data.get((row,col), "")).center(longest) for col in range(row + 1)).center(line_length))

This version still has the fill_rows and pretty_print functions (the latter nicely shows which values were actually calculated). If you don't need those, you could also just make at a plain function and use functools.cache to cache the values...

from functools import cache

@cache
def at(row: int, col: int) -> int:
    if not 0 <= col <= row:
        raise ValueError(f'column position {col} is invalid for row {row}')
    return 1 if col in (0, row) else at(row - 1, col - 1) + at(row - 1, col)

... or calculate the binomial coefficient directly using factorials:

from math import factorial as fac

def comb(n, k):
    return fac(n) // (fac(k) * fac(n - k))
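
Since Python 3.8 the standard library also provides `math.comb`, which returns the binomial coefficient directly. A minimal sketch comparing it with the factorial formula above (the name `comb_factorial` is only used here to keep the two apart):

```python
from math import comb, factorial

def comb_factorial(n: int, k: int) -> int:
    """Binomial coefficient via the factorial formula."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Both approaches agree on every entry of the first 20 rows.
assert all(comb(n, k) == comb_factorial(n, k)
           for n in range(20) for k in range(n + 1))

print(comb(16, 5))  # 4368
```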

Answer 2

Score: 1

I tried to simplify the creation of rows, and to make the pretty-printing (arguably) better:

class Pascal_Triangle:
    def __init__(self, end_row: int = 1):
        self.rows = [[1], [1, 1]]
        if end_row > 1:
            self.add_rows(end_row)

    def add_rows(self, end_row: int):
        last_row = self.rows[-1]
        for i in range(len(self.rows), end_row + 1):
            last_row = [1] + [last_row[i] + last_row[i+1] for i in range(len(last_row) - 1)]  + [1]
            self.rows.append(last_row)

    def pretty_print(self):
        width = len(str(self.rows[-1][len(self.rows)//2]))
        print('\n'.join(f"{' ' * width * (len(self.rows) - i)}{(' '*width).join(map(lambda n:f'{n:{width}}', row))}" for i, row in enumerate(self.rows)))

tri = Pascal_Triangle(6)
tri.pretty_print()

#               1
#             1   1
#           1   2   1
#         1   3   3   1
#       1   4   6   4   1
#     1   5  10  10   5   1
#   1   6  15  20  15   6   1

tri.add_rows(9)
tri.pretty_print()

#                                1
#                             1     1
#                          1     2     1
#                       1     3     3     1
#                    1     4     6     4     1
#                 1     5    10    10     5     1
#              1     6    15    20    15     6     1
#           1     7    21    35    35    21     7     1
#        1     8    28    56    70    56    28     8     1
#     1     9    36    84   126   126    84    36     9     1
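
The width rule in `pretty_print` relies on the entry at index `len(self.rows) // 2` of the last row being the largest value in that row, and therefore in the whole triangle. A small check of that assumption, using `math.comb` (Python 3.8+) to build rows independently of the class; the function name `middle_is_largest` is hypothetical:

```python
from math import comb

def middle_is_largest(n: int) -> bool:
    """True if the entry at index len(row) // 2 is the largest in row n."""
    row = [comb(n, k) for k in range(n + 1)]
    # The triangle has n + 1 rows when row n is the last one, and row n has
    # n + 1 entries, so len(row) // 2 is the index pretty_print looks at.
    return row[len(row) // 2] == max(row)

print(all(middle_is_largest(n) for n in range(60)))  # True
```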

Answer 3

Score: 0

There is a way to obtain a line of Pascal's triangle without recursion:

def pascalLine(N):
    r = [1]
    for p in range(N):
        r.append(r[-1]*(N-p)//(p+1))
    return r

To format the output nicely, you can compute the size of the largest number and use it to print every number with a fixed width (which makes the alignment much easier to compute):

N     = 10
width = len(str(max(pascalLine(N-1))))*2
for i in range(N):
    line = "".join(f"{n:^{width}}" for n in pascalLine(i)).center(width*N)
    print(line)

                             1
                          1     1
                       1     2     1
                    1     3     3     1
                 1     4     6     4     1
              1     5     10    10    5     1
           1     6     15    20    15    6     1
        1     7     21    35    35    21    7     1
     1     8     28    56    70    56    28    8     1
  1     9     36    84   126   126    84    36    9     1

You could also compute the lines as you go, but then you need to estimate the size of the largest value in order to get proper formatting. In Pascal's triangle the numbers on line N sum to 2^N, so for N ≥ 1 the maximum value of a line is at most half of that.

# Pascal's triangle size
N = 10

# estimate fixed value printing width
width  = len(str(2**(N-1))) + 2 # extra for spacing

# print each line centred, with centred numbers
P = [1]
for i in range(N):
    print("".join(str(n).center(width) for n in P).center(width*N))
    P = [1,*map(sum,zip(P,P[1:])),1] # compute next line
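
The multiplicative step in `pascalLine` uses the identity C(N, p+1) = C(N, p)·(N−p)/(p+1); the integer division is exact because `r[-1]*(N-p)` equals (p+1)·C(N, p+1). A short sketch that checks this against `math.comb` (Python 3.8+) and also checks the 2^N bound used for the width estimate; `pascal_line` is a renamed copy of the function so that the block is self-contained:

```python
from math import comb

def pascal_line(n: int) -> list:
    """Row n of Pascal's triangle via the multiplicative recurrence."""
    r = [1]
    for p in range(n):
        r.append(r[-1] * (n - p) // (p + 1))
    return r

for n in range(1, 40):
    line = pascal_line(n)
    assert line == [comb(n, k) for k in range(n + 1)]
    assert sum(line) == 2 ** n
    assert max(line) <= 2 ** (n - 1)   # largest entry is at most half the row sum

print(pascal_line(5))  # [1, 5, 10, 10, 5, 1]
```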

huangapple
  • Posted on July 23, 2023, 17:14:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76747462.html