用Python在大型整数列表中高效搜索最长递增子序列

huangapple go评论90阅读模式
英文:

Efficient search for longest ascending subsequence in large lists of integers with Python

问题

如何在Python中开发一个高效可扩展的算法实现来查找大型整数列表中的最长递增子序列?我有一个包含数百万个元素的列表,而我的当前解决方案速度太慢了。以下是我当前的Python代码作为起点:

def longest_increasing_subsequence(nums):
    n = len(nums)
    lis = [1] * n
    for i in range(1, n):
        for j in range(0, i):
            if nums[i] > nums[j] and lis[i] < lis[j] + 1:
                lis[i] = lis[j] + 1

    max_length = max(lis)
    result = []

    for i in range(n - 1, -1, -1):
        if lis[i] == max_length:
            result.append(nums[i])
            max_length -= 1

    return result[::-1]

我注意到这段代码对于大型输入列表非常慢。是否有优化此算法的方法,甚至使用其他更适合查找大型整数列表中的最长递增子序列的算法?我愿意听取任何想法来提高运行时间和内存要求。提前感谢!

英文:

How can I develop an efficient and scalable algorithm implementation in Python to find the longest ascending subsequence in a large list of integers? I have a list with millions of elements, and my current solution is too slow. Here is my current Python code as a starting point:

def longest_increasing_subsequence(nums):
    n = len(nums)
    lis = [1] * n
    for i in range(1, n):
        for j in range(0, i):
            if nums[i] > nums[j] and lis[i] < lis[j] + 1:
                lis[i] = lis[j] + 1

    max_length = max(lis)
    result = []

    for i in range(n - 1, -1, -1):
        if lis[i] == max_length:
            result.append(nums[i])
            max_length -= 1

    return result[::-1]

I have noticed that this code is very slow for large input lists. Is there any way to optimize this algorithm or even use another algorithm that is better suited to find the longest ascending subsequence in a large list of integers? I am open to any ideas to improve the running time and memory requirements. Thanks in advance!

答案1

得分: 0

import bisect

def longest_increasing_subsequence(nums: list[int]) -> list[int]:
    if not nums:
        return []
    piles = []
    indices = [0 for _ in range(len(nums))]
    predecessors = [None for _ in range(len(nums))]
    for i, num in enumerate(nums):
        pile = bisect.bisect_left(piles, num)
        if pile == len(piles):
            piles.append(num)
        else:
            piles[pile] = num
        indices[pile] = i
        predecessors[i] = indices[pile - 1] if pile else None
    last_index = indices[len(piles) - 1]
    longest_subsequence = []
    while last_index is not None:
        longest_subsequence.append(nums[last_index])
        last_index = predecessors[last_index]
    return longest_subsequence[::-1]

示例用法:

>>> longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18])
[2, 3, 7, 18]
>>> longest_increasing_subsequence([0, 1, 0, 3, 2, 3])
[0, 1, 2, 3]
>>> longest_increasing_subsequence([7, 7, 7, 7, 7, 7])
[7]
英文:

You can find the longest increasing subsequence in O(n log n) time with an approach that combines dynamic programming (your current approach) with binary search (in this case via the <code>bisect.<b>bisect_left</b></code> method):

import bisect

def longest_increasing_subsequence(nums: list[int]) -&gt; list[int]:
    if not nums:
        return []
    piles = []
    indices = [0 for _ in range(len(nums))]
    predecessors = [None for _ in range(len(nums))]
    for i, num in enumerate(nums):
        pile = bisect.bisect_left(piles, num)
        if pile == len(piles):
            piles.append(num)
        else:
            piles[pile] = num
        indices[pile] = i
        predecessors[i] = indices[pile - 1] if pile else None
    last_index = indices[len(piles) - 1]
    longest_subsequence = []
    while last_index is not None:
        longest_subsequence.append(nums[last_index])
        last_index = predecessors[last_index]
    return longest_subsequence[::-1]

Example Usage:

&gt;&gt;&gt; longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18])
[2, 3, 7, 18]
&gt;&gt;&gt; longest_increasing_subsequence([0, 1, 0, 3, 2, 3])
[0, 1, 2, 3]
&gt;&gt;&gt; longest_increasing_subsequence([7, 7, 7, 7, 7, 7])
[7]

答案2

得分: 0

如上所述,您的解决方案具有二次时间复杂度。然而,根据维基百科的1,最优解可以在O(n log(n))的时间复杂度内找到。

我修复了我在2中找到的代码,以获得一个在最优时间复杂度内运行的函数,并测试结果与您的原始函数一致:

# 二分查找
def GetCeilIndex(arr, T, l, r, key):
    while r - l > 1:
        m = l + (r - l) // 2
        if arr[T[m]] >= key:
            r = m
        else:
            l = m

    return r


def LongestIncreasingSubsequence(arr):
    n = len(arr)

    # 初始化为0
    tailIndices = [0 for i in range(n + 1)]

    # 初始化为-1
    prevIndices = [-1 for i in range(n + 1)]

    # 它将始终指向空位置
    length = 1
    for i in range(1, n):
        if arr[i] < arr[tailIndices[0]]:
            # 新的最小值
            tailIndices[0] = i

        elif arr[i] > arr[tailIndices[length - 1]]:
            # arr[i] 想要扩展
            # 最大子序列
            prevIndices[i] = tailIndices[length - 1]
            tailIndices[length] = i
            length += 1

        else:
            # arr[i] 想要成为
            # 未来子序列的潜在候选
            # 它将替换tailIndices中的ceil值
            pos = GetCeilIndex(arr, tailIndices, -1, length - 1, arr[i])

            prevIndices[i] = tailIndices[pos - 1]
            tailIndices[pos] = i

    res = []
    i = tailIndices[length - 1]
    while len(res) < length:
        res.append(arr[i])
        i = prevIndices[i]
    res.reverse()

    return res
英文:

As said above, your solution is in quadratic time complexity. However according to wikipedia the optimal solution can be found in O(n log(n)) time complexity.

I fixed a code I found here to get a function that runs in optimal time complexity and I tested that the results are consistent with your original function:

# Binary search
def GetCeilIndex(arr, T, l, r, key):
    while r - l &gt; 1:
        m = l + (r - l) // 2
        if arr[T[m]] &gt;= key:
            r = m
        else:
            l = m

    return r


def LongestIncreasingSubsequence(arr):
    n = len(arr)

    # Initialized with 0
    tailIndices = [0 for i in range(n + 1)]

    # Initialized with -1
    prevIndices = [-1 for i in range(n + 1)]

    # it will always point
    # to empty location
    length = 1
    for i in range(1, n):
        if arr[i] &lt; arr[tailIndices[0]]:
            # new smallest value
            tailIndices[0] = i

        elif arr[i] &gt; arr[tailIndices[length - 1]]:
            # arr[i] wants to extend
            # largest subsequence
            prevIndices[i] = tailIndices[length - 1]
            tailIndices[length] = i
            length += 1

        else:
            # arr[i] wants to be a
            # potential condidate of
            # future subsequence
            # It will replace ceil
            # value in tailIndices
            pos = GetCeilIndex(arr, tailIndices, -1, length - 1, arr[i])

            prevIndices[i] = tailIndices[pos - 1]
            tailIndices[pos] = i

    res = []
    i = tailIndices[length - 1]
    while len(res) &lt; length:
        res.append(arr[i])
        i = prevIndices[i]
    res.reverse()

    return res

huangapple
  • 本文由 发表于 2023年7月23日 21:56:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76748606.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定