Avoiding extra `next` call after `yield from` in Python generator

Question

Please see the below snippet, run with Python 3.10:

from collections.abc import Generator

DUMP_DATA = 5, 6, 7

class DumpData(Exception):
    """Exception used to indicate to yield from DUMP_DATA."""

def sample_gen() -> Generator[int | None, int, None]:
    out_value: int | None = None
    while True:
        try:
            in_value = yield out_value
        except DumpData:
            yield len(DUMP_DATA)
            yield from DUMP_DATA
            out_value = None
            continue
        out_value = in_value

My question pertains to the DumpData path where there is a yield from. After that yield from, there needs to be a next(g) call, to bring the generator back to the main yield statement so we can send:

def main() -> None:
    g = sample_gen()
    next(g)  # Initialize
    assert g.send(1) == 1
    assert g.send(2) == 2

    # Okay let's dump the data
    num_data = g.throw(DumpData)
    data = tuple(next(g) for _ in range(num_data))
    assert data == DUMP_DATA

    # How can one avoid this `next` call, before it works again?
    next(g)
    assert g.send(3) == 3

How can this extra next call be avoided?

Answer 1

Score: 3

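When you yield from a tuple directly, sample_gen delegates to the built-in tuple_iterator. After the last item has been yielded, sample_gen is still suspended inside the yield from: one more resumption is needed before the iterator reports exhaustion and the yield from expression finishes. The tuple_iterator does not have a send method (unlike generators in general), so it cannot accept a sent value, and it hands the final value None back to sample_gen.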

The behavior:

yield from DUMP_DATA  # is equivalent to:
yield from tuple_iterator(DUMP_DATA)

def tuple_iterator(t):
    for item in t:
        yield item
    return None
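
As a small sketch of this behavior (the generator name `delegating` is hypothetical and not part of the question), the snippet below shows that a generator is still suspended inside `yield from` after the last tuple item, and that sending a non-None value at that point fails because the tuple iterator has no `send` method:

```python
def delegating():
    # Delegate to a plain tuple, then report what `yield from` evaluated to.
    result = yield from (5, 6, 7)
    yield ("done", result)

g = delegating()
items = [next(g) for _ in range(3)]  # the three tuple items
# After the last item, the generator is still suspended inside `yield from`.
still_delegating = g.gi_yieldfrom is not None
# This resumption lets the tuple iterator signal exhaustion.
after = next(g)

# Sending a non-None value while delegating to a plain iterator fails,
# because the tuple iterator has no `send` method.
g2 = delegating()
next(g2)
try:
    g2.send(3)
    send_error = None
except AttributeError as exc:
    send_error = exc
```

This is why the question's code needs the extra `next(g)` before `g.send(3)` can reach the main `yield`.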

You can implement tuple_iterator_generator, with usage:

try:
    in_value = yield out_value
except DumpData:
    yield len(DUMP_DATA)
    in_value = yield from tuple_iterator_generator(DUMP_DATA)
out_value = in_value

def tuple_iterator_generator(t):
    in_value = None
    for item in t:
        in_value = yield item
    return in_value

Or just not use yield from if you don't want that behavior:

try:
    in_value = yield out_value
except DumpData:
    yield len(DUMP_DATA)
    for out_value in DUMP_DATA:
        in_value = yield out_value
out_value = in_value

See https://docs.python.org/3/whatsnew/3.3.html#pep-380-syntax-for-delegating-to-a-subgenerator for a use case of that behavior.

Answer 2

Score: 2

This is a lot of effort to go to, to remove a single line of code.

Note that this is a bad solution if DUMP_DATA is a large object or doesn't support slicing: the slice copies nearly all of DUMP_DATA into memory before anything is yielded, which defeats the point of using a generator.

As stated in https://stackoverflow.com/a/26109157/15081390, yield from "establishes a transparent bidirectional connection between the caller and the sub-generator". Calling next(g) terminates this and allows the loop to continue. In doing so it receives (and discards) out_value, which is set to None after yield from DUMP_DATA. In the OP's code, we can substitute this call to next for the last one when defining the variable data:

data = tuple(next(g) for _ in range(num_data))
             ^^^^^^^

All that is needed is to be able to detect the end of DUMP_DATA. If DUMP_DATA is a Sequence (supports subscripting), then we can use yield from DUMP_DATA[:i-1] to yield all but the last element. The last element is then yielded normally, by assigning DUMP_DATA[-1] to out_value and re-entering sample_gen's normal loop. Thus, when the final line in main is called, the generator will respond normally.

from collections.abc import Generator

DUMP_DATA = 5, 6, 7

class DumpData(Exception):
    """Exception used to indicate to yield from DUMP_DATA."""

def sample_gen() -> Generator[int | None, int, None]:
    out_value: int | None = None
    while True:
        try:
            in_value = yield out_value
        except DumpData:
            # yield length, but not before storing in var i
            yield (i := len(DUMP_DATA))

            # if length is more than one item, then yield the first n - 1 elements
            if i > 1:
                yield from DUMP_DATA[:i-1]

            # in case DUMP_DATA is of length 0, don't try to yield it
            if i:
                out_value = DUMP_DATA[-1]

            continue

        out_value = in_value

def main() -> None:
    g = sample_gen()
    next(g)  # Initialize

    assert g.send(1) == 1
    assert g.send(2) == 2

    # Okay let's dump the data
    num_data = g.throw(DumpData)

    # the last call of next(g) exits the yield from state
    data = tuple(next(g) for _ in range(num_data))

    assert data == DUMP_DATA

    # no need to call next(g)
    assert g.send(3) == 3

if __name__ == "__main__":
    main() # executes fine

Answer 3

Score: 1

You have to wrap your inner generator in a "plain" generator which has the send method. This will remove, for this level, the small optimizations of using yield from, since you are back to Python code iterating a generator and yielding a value. But that is the only way to accept the value sent by the next iteration after the inner generator is exhausted.

That said, it is straightforward:

...

def inner_gen(gen):
    for item in gen:
        incoming = yield item
    return incoming

def sample_gen() -> Generator[int | None, int, None]:
    out_value: int | None = None
    while True:
        try:
            in_value = yield out_value
        except DumpData:
            yield len(DUMP_DATA)
            out_value = yield from inner_gen(DUMP_DATA)
            continue
        out_value = in_value
...

def main() -> None:
    g = sample_gen()
    next(g)  # Initialize
    assert g.send(1) == 1
    assert g.send(2) == 2

    # Okay let's dump the data
    num_data = g.throw(DumpData)
    data = tuple(next(g) for _ in range(num_data))
    assert data == DUMP_DATA

    # This `send` value will be taken into the "inner_gen" ,
    # and used as the return value of the `yield from` expression.
    assert g.send(3) == 3

Of course, this only works because you know beforehand the number of elements the yield from will produce, and you call send after receiving the last value and before the inner generator stops with StopIteration. After a generator is exhausted, its return value (usually None) has already been produced, and there is no way to pre-emptively "ask for a value" from the code driving a generator without yielding to it.

huangapple
  • Posted on June 26, 2023, 10:12:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76553171.html