Python: Chaining iterators without changing the original instance

Question

This question goes in the same direction as this one:
https://stackoverflow.com/questions/3211041/how-to-join-two-generators-or-other-iterables-in-python

But I'm searching for a solution that keeps the original instance of the iterator.

Something that does the following:

    iterator1 = range(10)
    iterator2 = range(10)

    iterator_chained = keep_instance_chain_left(iterator1, iterator2)
    assert iterator2 is iterator_chained  # !!!!

The contents of iterator1 should be prepended (extendleft-style) to the original iterator2 instance.

Can anybody give a replacement for keep_instance_chain_left()?

The known solution,

    iterator_chained = itertools.chain(iterator1, iterator2)

delivers just a new iterator instance:

    assert iterator_chained is not iterator2

# Answer 1

**Score:** 0

Thank you for your comments and ideas. I created the following solution, which fits my use case.

    import itertools

    class MutableIterator:

        __slots__ = ('_chain', '_iterator', '_depth')

        def __init__(self, *iterators):
            self._chain = itertools.chain
            self._depth = 0
            s = len(iterators)
            if s == 0:
                self._iterator = iter(())  # empty iterator
            elif s == 1:
                self._iterator = iter(iterators[0])
            else:
                self._iterator = self._chain(*iterators)

        def __next__(self):
            # 'return' (not 'yield') so this is a real __next__, not a generator factory
            return next(self._iterator)

        def __iter__(self):
            try:
                while 1:
                    yield next(self._iterator)
            except StopIteration:
                pass

        def append(self, iterator):
            if self._depth > 20000:
                # maximum depth of C-level recursion reached; we must consume the iterator here
                self._iterator = self._chain(list(self._iterator), iter(iterator))
                self._depth = 0
            else:
                self._iterator = self._chain(self._iterator, iter(iterator))
                self._depth += 1

        def appendleft(self, iterator):
            if self._depth > 20000:
                # maximum depth of C-level recursion reached; we must consume the iterator here
                self._iterator = self._chain(iterator, list(self._iterator))
                self._depth = 0
            else:
                self._iterator = self._chain(iterator, self._iterator)
                self._depth += 1

For example, it delivers the following output:

    a = [[1, 2, 3], [4, 5, 6]]
    my_iterator = MutableIterator(a)
    for i in my_iterator:
        if type(i) is list:
            my_iterator.appendleft(iter(i))
        print(i, end=', ')

Output:

    [1, 2, 3], 1, 2, 3, [4, 5, 6], 4, 5, 6, 

Even though it works, I'm not 100% satisfied:

 1. It would be nice to have a built-in solution for this problem.
 2. In fact, in `__iter__()` I replace the for loop with a while loop, which is not so nice from my point of view.
 3. From time to time the iterator must be consumed internally to avoid recursion errors at the C level of Python (I must remark that in my use case a depth > 1000 will not come up). But with this code the depth is unlimited.

I made a comparison against a solution based on `collections.deque`:

    from collections import deque

    class MutableIteratorDeque:

        __slots__ = ('_iterator',)

        def __init__(self, *iterators):
            self._iterator = deque(iter(i) for i in iterators)

        def __next__(self):
            # return-based, so this is a real __next__ (not a generator factory)
            while self._iterator:
                try:
                    return next(self._iterator[0])
                except StopIteration:
                    self._iterator.popleft()
            raise StopIteration

        def __iter__(self):
            while 1:
                try:
                    yield next(self._iterator[0])
                except StopIteration:
                    self._iterator.popleft()
                except IndexError:
                    break

        def append(self, iterator):
            self._iterator.append(iter(iterator))

        def appendleft(self, iterator):
            self._iterator.appendleft(iter(iterator))

Here are my testing functions:

    import timeit

    item_number = 10000000
    deeplist = []
    sub = deeplist
    for i in range(item_number):
        sub.append([1])
        sub = sub[-1]

    flatlist = list(range(item_number))

    def iter1():
        global deeplist
        my_iterator = MutableIterator(deeplist)
        for i in my_iterator:
            if type(i) is list:
                my_iterator.appendleft(iter(i))

    def iter1b():
        global flatlist
        my_iterator = MutableIterator(flatlist)
        for i in my_iterator:
            if type(i) is list:
                my_iterator.appendleft(iter(i))

    def iter2():
        global deeplist
        my_iterator = MutableIteratorDeque(deeplist)
        for i in my_iterator:
            if type(i) is list:
                my_iterator.appendleft(iter(i))

    def iter2b():
        global flatlist
        my_iterator = MutableIteratorDeque(flatlist)
        for i in my_iterator:
            if type(i) is list:
                my_iterator.appendleft(iter(i))

    print('Used size of list: %i' % item_number)
    print('MutableIterator via chain() deeplist: %fs' % timeit.timeit(iter1, number=1))
    print('MutableIterator via deque() deeplist: %fs' % timeit.timeit(iter2, number=1))

    print('MutableIterator via chain() flatlist: %fs' % timeit.timeit(iter1b, number=1))
    print('MutableIterator via deque() flatlist: %fs' % timeit.timeit(iter2b, number=1))

I got the following results with Python 3.9 (64-bit):

    Used size of list: 10000000
    MutableIterator via chain() deeplist: 4.338570s
    MutableIterator via deque() deeplist: 4.664340s
    MutableIterator via chain() flatlist: 0.802791s
    MutableIterator via deque() flatlist: 0.912932s

This means that, yes, the iteration time increases for deeper structures. Here `itertools.chain` behaves a bit faster than the `collections.deque`-based solution, even though the iterator must be consumed internally from time to time to avoid recursion errors.

But we can also say the difference between the two solutions is not really large.

After the input from @Kelly Bundy we can see that there are nested structures where creating the `chain()` objects is too "costly" — each `appendleft` wraps the existing iterator in yet another `chain` layer that every subsequent `next()` call has to pass through — so overall the `deque` solution might be recommended:

    deeplist = []
    for i in range(10000):
        deeplist = [deeplist]
    deeplist += [0] * 10000

    print('MutableIterator via chain() deeplist: %fs' % timeit.timeit(iter1, number=1))
    print('MutableIterator via deque() deeplist: %fs' % timeit.timeit(iter2, number=1))

Result:

    MutableIterator via chain() deeplist: 0.907174s
    MutableIterator via deque() deeplist: 0.004194s




# Answer 2

**Score:** 0

I'd like to add the following improved solution based on `collections.deque`. In `__iter__()` it uses an internal for loop, which improves speed as long as the set of active iterators is unchanged.

Besides this, I have now adapted the `__next__()` method so that it works correctly (I did not focus on it before). It should be clear that the class mimics an iterator and is not a real iterator.

```python
from collections import deque

class MutableIteratorDeque:

    __slots__ = ('__iterator',)

    def __init__(self, *iterators):
        self.__iterator = deque(iter(i) for i in iterators)

    def __next__(self):
        iterator = self.__iterator
        while iterator:
            try:
                return next(iterator[0])
            except StopIteration:
                iterator.popleft()
        raise StopIteration

    def __iter__(self):
        iterator = self.__iterator  # make local
        while iterator:
            it = iterator[0]
            for i in it:
                yield i
                if it is not iterator[0]:
                    break  # a new iterator was pushed to the front; restart there
            else:
                iterator.popleft()

    def append(self, iterator):
        self.__iterator.append(iter(iterator))

    def appendleft(self, iterator):
        self.__iterator.appendleft(iter(iterator))
```

Compared to the "chained" solution, this code delivers the following speed results (see my previous answer):

    Used size of list: 10000000
    MutableIterator via chain() deeplist: 4.588935s
    MutableIterator via deque() deeplist: 3.647782s
    MutableIterator via chain() flatlist: 0.806678s
    MutableIterator via deque() flatlist: 0.756402s

And with the last nested structure:

    MutableIterator via chain() deeplist: 0.889283s
    MutableIterator via deque() deeplist: 0.003150s

For me this is the best solution I have found so far. I would still like to see a built-in solution (maybe in the future). Thank you again for your help and the discussion.


# Answer 3

**Score:** -1

This example takes two iterators and keeps track of both of them. When the iterators are exhausted, iteration stops.

    class ChainedIterator:
        def __init__(self, iter1, iter2):
            self.iter1 = iter1
            self.iter2 = iter2
            self.current_iter = iter1

        def __iter__(self):
            # iterate over both underlying iterators in order
            for it in (self.iter1, self.iter2):
                yield from it

        def __next__(self):
            try:
                return next(self.current_iter)
            except StopIteration:
                if self.current_iter is self.iter1:
                    self.current_iter = self.iter2
                    return next(self.current_iter)
                else:
                    raise StopIteration

    # range() objects are iterables, not iterators, so wrap them with iter()
    # to make next() work on them
    iterator1 = iter(range(10))
    iterator2 = iter(range(10))

    iterator_chained = ChainedIterator(iterator2, iterator1)
    assert iterator_chained.current_iter is iterator2



# Answer 4

**Score:** -2

Unfortunately, it's not possible to achieve what you're looking for directly in Python. Once an iterator is exhausted, there's no way to "rewind" it to its original state.

However, there is a workaround that involves creating a custom iterator class that keeps track of the original iterator and the new iterator created by `itertools.chain()`. Here's an example implementation:

    import itertools

    class ChainedIterator:
        def __init__(self, iterable1, iterable2):
            # iter() is needed here: range() objects are iterables, not
            # iterators, and next() only works on iterators
            self.original_iterable = iter(iterable1)
            # chain over the *same* iterator object, so values already
            # consumed via original_iterable are not repeated after the switch
            self.chained_iterable = itertools.chain(self.original_iterable, iterable2)
            self.current_iterable = self.original_iterable

        def __iter__(self):
            return self

        def __next__(self):
            try:
                return next(self.current_iterable)
            except StopIteration:
                if self.current_iterable is self.original_iterable:
                    self.current_iterable = self.chained_iterable
                    return next(self.current_iterable)
                else:
                    raise StopIteration

        def __getattr__(self, attr):
            return getattr(self.current_iterable, attr)

With this implementation, you can create a `ChainedIterator` instance that wraps around your original iterators:

    iterator1 = range(10)
    iterator2 = range(10)
    iterator_chained = ChainedIterator(iterator1, iterator2)

Now, `iterator_chained` behaves just like a regular iterator, except that it switches to the chained iterator once the original iterator is exhausted:

    assert iterator_chained is not iterator2  # This will pass
    for i in iterator_chained:
        print(i)

This will output:

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9

huangapple
  • Published on 2023-03-31 16:06:50
  • Please keep the original link when reposting: https://go.coder-hub.com/75896214.html