July 23, 2023, 21:56:32

Efficient search for longest ascending subsequence in large lists of integers with Python

Question

How can I develop an efficient and scalable algorithm implementation in Python to find the longest ascending subsequence in a large list of integers? I have a list with millions of elements, and my current solution is too slow. Here is my current Python code as a starting point:

def longest_increasing_subsequence(nums):
    n = len(nums)
    # lis[i] is the length of the longest increasing subsequence ending at nums[i]
    lis = [1] * n
    for i in range(1, n):
        for j in range(0, i):
            if nums[i] > nums[j] and lis[i] < lis[j] + 1:
                lis[i] = lis[j] + 1
    max_length = max(lis)
    # Walk backwards, picking one element for each remaining length
    result = []
    for i in range(n - 1, -1, -1):
        if lis[i] == max_length:
            result.append(nums[i])
            max_length -= 1
    return result[::-1]

I have noticed that this code is very slow for large input lists. Is there any way to optimize this algorithm or even use another algorithm that is better suited to find the longest ascending subsequence in a large list of integers? I am open to any ideas to improve the running time and memory requirements. Thanks in advance!

Answer 1

Score: 0

You can find the longest increasing subsequence in O(n log n) time with an approach that combines dynamic programming (your current approach) with binary search (in this case via the bisect.bisect_left function):

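import bisect


def longest_increasing_subsequence(nums: list[int]) -> list[int]:
    if not nums:
        return []
    # piles[k] is the smallest tail value of any increasing subsequence
    # of length k + 1 seen so far; piles stays sorted
    piles = []
    # indices[k] is the position in nums of the value currently in piles[k]
    indices = [0 for _ in range(len(nums))]
    # predecessors[i] is the position of the element before nums[i]
    # in the best subsequence ending at nums[i]
    predecessors = [None for _ in range(len(nums))]
    for i, num in enumerate(nums):
        # bisect_left keeps the subsequence strictly increasing
        pile = bisect.bisect_left(piles, num)
        if pile == len(piles):
            piles.append(num)
        else:
            piles[pile] = num
        indices[pile] = i
        predecessors[i] = indices[pile - 1] if pile else None
    # Follow the predecessor links back from the tail of the longest pile
    last_index = indices[len(piles) - 1]
    longest_subsequence = []
    while last_index is not None:
        longest_subsequence.append(nums[last_index])
        last_index = predecessors[last_index]
    return longest_subsequence[::-1]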

Example usage:

>>> longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18])
[2, 3, 7, 18]
>>> longest_increasing_subsequence([0, 1, 0, 3, 2, 3])
[0, 1, 2, 3]
>>> longest_increasing_subsequence([7, 7, 7, 7, 7, 7])
[7]
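
Since the question also mentions memory, here is a minimal sketch of a length-only variant, assuming you do not need the subsequence itself. It keeps just the sorted list of pile tops, so the extra memory is proportional to the length of the answer rather than to the input. The name lis_length is hypothetical and not part of the answer above.

```python
import bisect


def lis_length(nums):
    """Return only the length of the longest strictly increasing subsequence."""
    # tails[k] holds the smallest tail value of any increasing
    # subsequence of length k + 1 seen so far (hypothetical helper list)
    tails = []
    for num in nums:
        pos = bisect.bisect_left(tails, num)
        if pos == len(tails):
            tails.append(num)   # num extends the longest subsequence so far
        else:
            tails[pos] = num    # num is a smaller tail for length pos + 1
    return len(tails)
```

Because nothing is stored per input element, this version can also consume any iterable, such as a generator that reads numbers from a file.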

Answer 2

Score: 0

As said above, your solution has quadratic time complexity. However, according to Wikipedia, the optimal solution can be found in O(n log n) time.
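
To put those two bounds in perspective for a list of a million elements, here is a rough back-of-the-envelope count. It is only an estimate: comparison_estimates is a hypothetical helper, the quadratic figure counts inner-loop iterations of a double loop over all pairs, and n * ceil(log2 n) is an approximate ceiling on binary-search steps, not a measurement.

```python
import math


def comparison_estimates(n):
    """Rough operation counts for the two approaches (estimates, not timings)."""
    # A double loop over all pairs (j < i) runs n * (n - 1) / 2 times
    quadratic = n * (n - 1) // 2
    # One binary search per element, each taking about log2(n) steps
    loglinear = n * math.ceil(math.log2(n))
    return quadratic, loglinear


quadratic, loglinear = comparison_estimates(1_000_000)
print(quadratic)   # 499999500000
print(loglinear)   # 20000000
```

That is a gap of roughly four orders of magnitude, which is why the quadratic version becomes impractical at this size.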

I fixed a piece of code I found to get a function that runs in optimal time complexity, and I tested that its results are consistent with your original function:

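# Binary search: return the smallest position r in (l, r] such that
# arr[T[r]] >= key
def GetCeilIndex(arr, T, l, r, key):
    while r - l > 1:
        m = l + (r - l) // 2
        if arr[T[m]] >= key:
            r = m
        else:
            l = m
    return r


def LongestIncreasingSubsequence(arr):
    n = len(arr)
    if n == 0:
        # Nothing to search; avoids indexing arr[0] below
        return []
    # tailIndices[k] is the index of the smallest known tail of an
    # increasing subsequence of length k + 1 (initialized with 0)
    tailIndices = [0 for i in range(n + 1)]
    # prevIndices[i] is the index of the element before arr[i] in its
    # subsequence (initialized with -1, meaning no predecessor)
    prevIndices = [-1 for i in range(n + 1)]
    # length always points to the first empty slot of tailIndices
    length = 1
    for i in range(1, n):
        if arr[i] < arr[tailIndices[0]]:
            # new smallest value
            tailIndices[0] = i
        elif arr[i] > arr[tailIndices[length - 1]]:
            # arr[i] extends the longest subsequence found so far
            prevIndices[i] = tailIndices[length - 1]
            tailIndices[length] = i
            length += 1
        else:
            # arr[i] is a potential candidate for a future subsequence:
            # it replaces the ceiling value in tailIndices
            pos = GetCeilIndex(arr, tailIndices, -1, length - 1, arr[i])
            # pos == 0 means arr[i] starts a subsequence, so no predecessor
            prevIndices[i] = tailIndices[pos - 1] if pos > 0 else -1
            tailIndices[pos] = i
    # Walk the predecessor links back from the tail of the longest subsequence
    res = []
    i = tailIndices[length - 1]
    while len(res) < length:
        res.append(arr[i])
        i = prevIndices[i]
    res.reverse()
    return res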
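
A consistency test like the one mentioned in this answer could be sketched as follows. The answer does not show the actual test, so this is an assumption about how one might do it: lis_quadratic restates the question's approach, lis_fast is a compact bisect-based stand-in rather than the function above, and check_consistency and is_subsequence are hypothetical helpers. The check compares lengths and validates the result instead of requiring identical output.

```python
import bisect
import random


def lis_quadratic(nums):
    """Reference O(n^2) version, restating the question's approach."""
    n = len(nums)
    if n == 0:
        return []
    lis = [1] * n
    for i in range(1, n):
        for j in range(i):
            if nums[i] > nums[j] and lis[i] < lis[j] + 1:
                lis[i] = lis[j] + 1
    max_length = max(lis)
    result = []
    for i in range(n - 1, -1, -1):
        if lis[i] == max_length:
            result.append(nums[i])
            max_length -= 1
    return result[::-1]


def lis_fast(nums):
    """Compact O(n log n) stand-in used as the candidate under test."""
    tails, tail_idx = [], []      # pile tops and their positions in nums
    prev = [-1] * len(nums)       # predecessor links, -1 means none
    for i, x in enumerate(nums):
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tail_idx.append(i)
        else:
            tails[k] = x
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(nums[i])
        i = prev[i]
    return out[::-1]


def is_subsequence(candidate, nums):
    """True if candidate appears in nums in order (not necessarily adjacent)."""
    it = iter(nums)
    return all(x in it for x in candidate)


def check_consistency(fast, reference, trials=200, max_len=30, seed=0):
    """Compare fast against reference on small random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(0, 20) for _ in range(rng.randint(1, max_len))]
        got = fast(nums)
        assert len(got) == len(reference(nums)), nums
        assert all(a < b for a, b in zip(got, got[1:])), nums
        assert is_subsequence(got, nums), nums
    return True
```

To test a different implementation, pass it as the first argument, for example check_consistency(LongestIncreasingSubsequence, lis_quadratic). Small value ranges are used on purpose so that duplicates, which are the usual source of off-by-one mistakes here, occur often.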
