Python: linear __getitem__ for a pair of lists of lists

Question

I currently have a class that stores a list of lists, and the inner lists are not all the same length. I made the class subscriptable with the following code (possibly not the best way of doing this, and perhaps overly fancy):

class MyClass:
    def __init__(self):
        #
        self.instructions = []

        # for demo purposes
        self.instructions.append([0, 1, 2])
        self.instructions.append([3, 4, 5, 6])
        self.instructions.append([7, 8])

    def __getitem__(self, ind):
        if ind >= 0:
            iterator = self.instructions.__iter__()
            compare = int.__gt__
            inc = int.__add__
        else:
            iterator = reversed(self.instructions)
            compare = int.__le__
            inc = int.__sub__

        s = 0
        for tp in iterator:
            L = len(tp)
            if compare(inc(s, L), ind):
                return tp[ind-s]
            else:
                s = inc(s, L)
        else:
            raise IndexError('index out of range')

This works. For instance:

>>> x = MyClass()
>>> x[5]
5
>>> x[-5]
4

Now I need to modify the class so that it stores two lists of lists, instructions and annotations, which both have the same length. However, len(instructions[i]) does not have to equal len(annotations[i]).

class NewClass:
    def __init__(self):
        #
        self.instructions = []
        self.annotations = []

        # for demo purposes
        self.instructions.append([0, 1, 2])
        self.instructions.append([5, 6, 7, 8])
        self.instructions.append([12, 13])
        
        self.annotations.append([3, 4])
        self.annotations.append([9, 10, 11])
        self.annotations.append([14, 15, 16])

    def __getitem__(self, ind):
        pass

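I want to make this subscriptable, with the order of elements alternating between the instructions sublists and the annotations sublists: instructions[0], annotations[0], instructions[1], annotations[1], and so on. The demo data shows the subscripting order, since each value equals its position. I want: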

>>> y = NewClass()
>>> y[9]
9
>>> y[-4]
13
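The following naive reference materializes the interleaved sequence from the demo data, which makes the requested order concrete. It is a sketch for checking faster implementations, not a proposed solution. The helper name interleaved_reference is hypothetical, and each call costs O(n).

```python
from itertools import chain


def interleaved_reference(instructions, annotations):
    """Flatten instructions[0], annotations[0], instructions[1], ... into one list."""
    # zip pairs up the i-th instructions and annotations sublists;
    # the double chain flattens pairs -> sublists -> elements.
    return list(chain.from_iterable(chain.from_iterable(zip(instructions, annotations))))


instructions = [[0, 1, 2], [5, 6, 7, 8], [12, 13]]
annotations = [[3, 4], [9, 10, 11], [14, 15, 16]]

flat = interleaved_reference(instructions, annotations)
print(flat)      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
print(flat[9])   # 9
print(flat[-4])  # 13
```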

What's an efficient way of doing this?

I could write a solution that alternately iterates through the two sublists, but I feel like I would be straying far from the correct solution. I am also looking for a solution without a for loop, as I am dealing with long lists.

Answer 1

Score: 3

The standard balance between storage cost and runtime cost for random access into the (non-stored) concatenation of several arrays is to store a table of the offsets at which each list begins (i.e., the sum of the lengths of all the lists before it) and use binary search on that table:

import itertools
import bisect

class Index:
    def __init__(self, ll):
        self.ll = ll
        self.oo = list(itertools.accumulate(map(len, ll), initial=0))

    def __getitem__(self, i):
        if i < 0:
            i += self.oo[-1]
        j = bisect.bisect(self.oo, i)
        if not 0 < j <= len(self.ll):
            raise IndexError
        return self.ll[j-1][i-self.oo[j-1]]

    def __iter__(self):
        return itertools.chain.from_iterable(self.ll)

# Example:
i = Index(
  [[9, 1, 7],
   [3, 0],
   [],
   [4, 4, 4, 2]]
)
assert i[4] == 0 and i[8] == 2

The j-1 is because the initial 0 causes an i of 0 to be assigned the insertion point of 1. You can omit the ,initial=0 (and in fact the last element of self.oo as well) at the cost of more complicated code for the edge/error cases. __iter__ is provided because it is asymptotically faster than indexing with successive integers, each of which must be subjected to the binary search even though usually the same sublist will be found.
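To see where the j-1 comes from, this small illustration (not part of the answer) builds the offset table for the example above. It then records the insertion points that bisect.bisect returns for a few flat indices. Index 5 skips past the empty sublist, because bisect.bisect (that is, bisect_right) places a value after any equal offsets.

```python
import bisect
import itertools

ll = [[9, 1, 7], [3, 0], [], [4, 4, 4, 2]]
# Offsets at which each sublist starts, plus the total length at the end.
oo = list(itertools.accumulate(map(len, ll), initial=0))
print(oo)  # [0, 3, 5, 5, 9]

rows = []
for i in (0, 2, 3, 5, 8):
    j = bisect.bisect(oo, i)  # insertion point to the right of equal offsets
    sub = j - 1               # the sublist that actually contains flat index i
    rows.append((i, j, sub, ll[sub][i - oo[sub]]))
print(rows)
# [(0, 1, 0, 9), (2, 1, 0, 7), (3, 2, 1, 3), (5, 4, 3, 4), (8, 4, 3, 2)]
```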

Extending this to support interleaving two lists (of equal length) is then straightforward: build the offset table from the interleaved sublist lengths, and use divmod(j-1, 2) to obtain the index into a list and the choice between the two lists, respectively.
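Below is a minimal sketch of that extension. It assumes both outer lists have the same length and are not modified after construction; otherwise the offset table would need rebuilding. The class name InterleavedIndex and its constructor signature are illustrative, not from the answer. Also note that itertools.accumulate's initial argument requires Python 3.8 or later.

```python
import bisect
import itertools


class InterleavedIndex:
    """Random access into instructions[0] + annotations[0] + instructions[1] + ..."""

    def __init__(self, instructions, annotations):
        if len(instructions) != len(annotations):
            raise ValueError("outer lists must have the same length")
        self.lists = (instructions, annotations)
        # Interleave the sublist lengths: len(ins[0]), len(ann[0]), len(ins[1]), ...
        lengths = (len(s) for pair in zip(instructions, annotations) for s in pair)
        self.oo = list(itertools.accumulate(lengths, initial=0))

    def __len__(self):
        return self.oo[-1]

    def __getitem__(self, i):
        if i < 0:
            i += self.oo[-1]
        j = bisect.bisect(self.oo, i)
        # len(self.oo) is one more than the number of interleaved sublists.
        if not 0 < j < len(self.oo):
            raise IndexError("index out of range")
        # k is the position in the outer lists; which is 0 (instructions) or 1 (annotations).
        k, which = divmod(j - 1, 2)
        return self.lists[which][k][i - self.oo[j - 1]]


instructions = [[0, 1, 2], [5, 6, 7, 8], [12, 13]]
annotations = [[3, 4], [9, 10, 11], [14, 15, 16]]
y = InterleavedIndex(instructions, annotations)
print(y[9], y[-4])  # 9 13
```

Each lookup is a binary search over the 2n offsets, so it is O(log n) in the number of sublists and involves no Python-level scan. Iterating with successive integers still pays that search per element, as the answer notes for __iter__.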

Answer 2

Score: 2

Your implementation is nice, but I'd like to share my own approach using chain.from_iterable, because essentially we are just chaining the items, either from the beginning or from the end.

For one list:

The only part that needs explanation is map(reversed, reversed(self.instructions)). We not only need to reverse the whole list, but also the individual sublists.

from itertools import chain

class MyClass:
    def __init__(self):
        self.instructions = [
            [0, 1, 2],
            [3, 4, 5, 6],
            [7, 8],
        ]

    def __getitem__(self, ind):
        if ind >= 0:
            chunks = self.instructions
            range_parameter = ind + 1
        else:
            chunks = map(reversed, reversed(self.instructions))
            range_parameter = abs(ind)

        iterator = chain.from_iterable(chunks)

        try:
            for _ in range(range_parameter):
                n = next(iterator)
        except StopIteration:
            raise IndexError("index out of range")

        return n

x = MyClass()
print(x[5])
print(x[-5])
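The small illustration below shows why both reversals are needed. It compares reversing only the outer list with reversing the sublists too, using the demo data.

```python
from itertools import chain

instructions = [[0, 1, 2], [3, 4, 5, 6], [7, 8]]

# Reversing only the outer list keeps each sublist in forward order.
outer_only = list(chain.from_iterable(reversed(instructions)))
print(outer_only)  # [7, 8, 3, 4, 5, 6, 0, 1, 2]

# Reversing every sublist as well gives the true back-to-front order.
both = list(chain.from_iterable(map(reversed, reversed(instructions))))
print(both)  # [8, 7, 6, 5, 4, 3, 2, 1, 0]
```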

For two lists:

Since you said the order should alternate, zip is the right tool. When ind is non-negative it's straightforward: we zip the two lists and apply chain.from_iterable twice. The first pass turns the zipped pairs into sublists, and the second turns the sublists into individual elements.

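If ind is negative, we need two reversals before zipping: reversed() for the outer lists, and map(reversed, ...) for the sublists. The annotations also go first in the zip, because when walking backwards each annotations sublist comes before its matching instructions sublist.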

from itertools import chain

class MyClass:
    def __init__(self):
        self.instructions = [
            [0, 1, 2],
            [5, 6, 7, 8],
            [12, 13],
        ]

        self.annotations = [
            [3, 4],
            [9, 10, 11],
            [14, 15, 16],
        ]

    def __getitem__(self, ind):
        if ind >= 0:
            chunks = zip(self.instructions, self.annotations)
            range_parameter = ind + 1
        else:
            chunks = zip(
                map(reversed, reversed(self.annotations)),
                map(reversed, reversed(self.instructions)),
            )
            range_parameter = abs(ind)

        iterator = chain.from_iterable(chain.from_iterable(chunks))

        try:
            for _ in range(range_parameter):
                n = next(iterator)
        except StopIteration:
            raise IndexError("index out of range")

        return n

x = MyClass()
print(x[9])
print(x[-4])
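The manual next() loop can also be written with itertools.islice, which skips items inside itertools rather than in a Python-level for loop. Below is a sketch with the same ordering logic. It is still linear in the index, so the gain is only a constant factor. The function name nth_interleaved is a hypothetical helper, not part of the answer.

```python
from itertools import chain, islice


def nth_interleaved(instructions, annotations, ind):
    if ind >= 0:
        chunks = zip(instructions, annotations)
        skip = ind
    else:
        # Walking backwards: annotations come first and every sublist is reversed.
        chunks = zip(
            map(reversed, reversed(annotations)),
            map(reversed, reversed(instructions)),
        )
        skip = -ind - 1
    # Flatten pairs -> sublists -> elements, then jump over `skip` items.
    iterator = chain.from_iterable(chain.from_iterable(chunks))
    sentinel = object()
    item = next(islice(iterator, skip, None), sentinel)
    if item is sentinel:
        raise IndexError("index out of range")
    return item


instructions = [[0, 1, 2], [5, 6, 7, 8], [12, 13]]
annotations = [[3, 4], [9, 10, 11], [14, 15, 16]]
print(nth_interleaved(instructions, annotations, 9))   # 9
print(nth_interleaved(instructions, annotations, -4))  # 13
```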

Answer 3

Score: 0

Here is my idea for the two-list version.

First, we define a function that alternates between the lists:

from itertools import zip_longest

def alternate(*iterables: "list[list[Any]]") -> "Iterator[list[Any]]":
    for group in zip_longest(*iterables):
        for it in group:
            if it is not None:
                yield it

That way it can alternate between any number of your lists.

Now for the class:
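For example, take three outer lists of different lengths (made-up data for illustration). The helper takes one sublist from each list in turn and skips a list once it has run out, because zip_longest pads it with None.

```python
from itertools import zip_longest


def alternate(*iterables):
    # Same logic as the helper above: one sublist from each list in turn,
    # skipping the None padding that zip_longest adds for shorter lists.
    for group in zip_longest(*iterables):
        for it in group:
            if it is not None:
                yield it


result = list(alternate([[1], [2]], [["a"]], [["x"], ["y"], ["z"]]))
print(result)  # [[1], ['a'], ['x'], [2], ['y'], ['z']]
```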

class MyClass:
    def __init__(self):
        self.instructions = [
            [0, 1, 2],
            [5, 6, 7, 8],
            [12, 13],
        ]
        self.annotations = [
            [3, 4],
            [9, 10, 11],
            [14, 15, 16],
        ]

    def __len__(self):
        return sum(map(len, alternate(self.instructions, self.annotations)))

    def __getitem__(self, index: int):
        size = len(self)
        if index >= 0:
            if index >= size:
                raise IndexError(index)
        else:
            new_index = size + index
            if 0 <= new_index < size:
                index = new_index
            else:
                raise IndexError(index)
        for item in alternate(self.instructions, self.annotations):
            n = len(item)
            if 0 <= index < n:
                return item[index]
            index -= n
        raise RuntimeError("this should not happen")

test = MyClass()
print(test[9])
print(test[-4])

I defined a __len__ that returns the sum of the lengths of all the sublists. __getitem__ uses it to check first whether the input index is valid; for a negative index, it computes the equivalent positive index and checks that one.

With that done, and given that the sublists aren't all the same size, it simply loops over the sublists. If the index falls within the current sublist's range, it returns that element; otherwise it subtracts that sublist's size and moves on to the next one.

As a bonus, Python makes your class iterable for free (just by having a __getitem__ that takes integer values), and reversible too (by also having a __len__):

>>> list(test)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> list(reversed(test))
[16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Alternatively, you can use the nth recipe from the itertools documentation:

from itertools import islice

def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(islice(iterable, n, None), default)

(or nth from the third-party more_itertools library)

Then, together with itertools.chain, you can replace the final loop in __getitem__ with return nth(chain.from_iterable(alternate(self.instructions, self.annotations)), index). This is safe because we have already checked that the index falls in range.
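Here is a sketch of that suggestion as a full class, reusing the answer's alternate helper and the nth recipe. The range check is condensed into one comparison but accepts the same indices as the original. Each lookup is still linear in the index.

```python
from itertools import chain, islice, zip_longest


def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(islice(iterable, n, None), default)


def alternate(*iterables):
    for group in zip_longest(*iterables):
        for it in group:
            if it is not None:
                yield it


class MyClass:
    def __init__(self):
        self.instructions = [[0, 1, 2], [5, 6, 7, 8], [12, 13]]
        self.annotations = [[3, 4], [9, 10, 11], [14, 15, 16]]

    def __len__(self):
        return sum(map(len, alternate(self.instructions, self.annotations)))

    def __getitem__(self, index):
        size = len(self)
        pos = index + size if index < 0 else index
        if not 0 <= pos < size:
            raise IndexError(index)
        # The range check above means nth never falls back to its default.
        flat = chain.from_iterable(alternate(self.instructions, self.annotations))
        return nth(flat, pos)


test = MyClass()
print(test[9], test[-4])  # 9 13
```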

Answer 4

Score: -1

Here is my approach:

import itertools


class NewClass:
    def __init__(self):
        #
        self.instructions = []
        self.annotations = []

        # for demo purposes
        self.instructions.append([0, 1, 2])
        self.instructions.append([5, 6, 7, 8])
        self.instructions.append([12, 13])
        
        self.annotations.append([3, 4])
        self.annotations.append([9, 10, 11])
        self.annotations.append([14, 15, 16])
        
    def __iter__(self):
        zipped = itertools.zip_longest(self.instructions, self.annotations, fillvalue=[])
        for sub_lists in zipped:
            yield from itertools.chain.from_iterable(sub_lists)

    def __getitem__(self, key):
        flat = list(self)
        return flat[key]

Notes

  • I created the __iter__ method, which lets the caller iterate through the object like this:

    x = NewClass()
    for e in x:
        print(e)
    
  • The __getitem__ is built upon __iter__

  • About the zipped data: Conceptually, you can view this as

    [
        [[0, 1, 2], [3, 4]],          # this is one sub_lists
        [[5, 6, 7, 8], [9, 10, 11]],  # another sub_lists
        ...
    ]
    
  • The expression itertools.chain.from_iterable(sub_lists) flattens one sub_lists, turning [[0, 1, 2], [3, 4]] into [0, 1, 2, 3, 4].

  • This solution works for an arbitrary number of lists, not just 2.
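Since __iter__ only relies on zip_longest, the same pattern extends to any number of parallel lists. Below is a minimal sketch of that. The class name Interleaved and its *lists constructor are illustrative assumptions, and the data is made up.

```python
import itertools


class Interleaved:
    def __init__(self, *lists):
        self.lists = lists

    def __iter__(self):
        # fillvalue=[] pads shorter outer lists with empty sublists.
        zipped = itertools.zip_longest(*self.lists, fillvalue=[])
        for sub_lists in zipped:
            yield from itertools.chain.from_iterable(sub_lists)

    def __getitem__(self, key):
        # Same trade-off as the answer: the whole sequence is built on every lookup.
        return list(self)[key]


z = Interleaved([[0], [4]], [[1, 2]], [[3], [5, 6]])
print(list(z))  # [0, 1, 2, 3, 4, 5, 6]
print(z[-1])    # 6
```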

Update

I fixed __getitem__ to handle negative indices, at the cost of performance: it now builds the full flattened list on every lookup. I'm too lazy to write a more efficient solution.

Answer 5

Score: -1

If the two concatenated lists are the same size, you can use something like this:

# get_item(lst, k) stands for a lookup of element k in one flattened list of lists
div, mod = divmod(ind, 2)
if mod:
    return get_item(second_list, div)
else:
    return get_item(first_list, div)
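Below is a runnable version of this idea. It assumes get_item is a plain flat-index lookup into a list of lists; the answer does not define it, so the helper here is hypothetical, as is the wrapper name interleaved_get. Note that divmod(ind, 2) alternates element by element rather than sublist by sublist. It also assumes both flattened lists have the same length, and the data is made up.

```python
from itertools import chain


def get_item(list_of_lists, ind):
    # Hypothetical helper: index into the concatenation of the sublists.
    return list(chain.from_iterable(list_of_lists))[ind]


def interleaved_get(first_list, second_list, ind):
    # Even positions come from first_list, odd positions from second_list.
    div, mod = divmod(ind, 2)
    if mod:
        return get_item(second_list, div)
    return get_item(first_list, div)


first = [[0, 2], [4]]
second = [[1], [3, 5]]
print([interleaved_get(first, second, i) for i in range(6)])  # [0, 1, 2, 3, 4, 5]
```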

huangapple
  • Published on April 11, 2023, 04:17:22
  • If you repost this article, please keep the link: https://go.coder-hub.com/75980399.html