Avoiding extra `next` call after `yield from` in Python generator
Question
Please see the below snippet, run with Python 3.10:

    from collections.abc import Generator

    DUMP_DATA = 5, 6, 7


    class DumpData(Exception):
        """Exception used to indicate to yield from DUMP_DATA."""


    def sample_gen() -> Generator[int | None, int, None]:
        out_value: int | None = None
        while True:
            try:
                in_value = yield out_value
            except DumpData:
                yield len(DUMP_DATA)
                yield from DUMP_DATA
                out_value = None
                continue
            out_value = in_value

My question pertains to the `DumpData` path, where there is a `yield from`. After that `yield from`, there needs to be a `next(g)` call to bring the generator back to the main `yield` statement so we can `send`:

    def main() -> None:
        g = sample_gen()
        next(g)  # Initialize
        assert g.send(1) == 1
        assert g.send(2) == 2

        # Okay let's dump the data
        num_data = g.throw(DumpData)
        data = tuple(next(g) for _ in range(num_data))
        assert data == DUMP_DATA

        # How can one avoid this `next` call, before it works again?
        next(g)
        assert g.send(3) == 3

How can this extra `next` call be avoided?
Answer 1
Score: 3
When you `yield from` a tuple directly, `sample_gen` delegates to the built-in `tuple_iterator`. After it has yielded its last item, the iterator must be resumed once more before it terminates, which is what the extra `next` call does. Unlike a generator, it does not have a `send` method, and the `yield from` expression evaluates to `None` in `sample_gen` when it finishes.
The behavior:

    yield from DUMP_DATA  # is equivalent to:
    yield from tuple_iterator(DUMP_DATA)

    def tuple_iterator(t):
        for item in t:
            yield item
        return None

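
A minimal sketch of that behavior, using a hypothetical `delegating` generator that is not part of the original code. It suggests that once the tuple iterator has handed out its last item, one more plain `next` is needed to discover that it is exhausted, and that sending a non-`None` value while the delegation is still active fails because the iterator has no `send` method:

```python
def delegating():
    # Hypothetical helper: delegate to a plain tuple.
    # `yield from` evaluates to None once the tuple iterator is exhausted.
    result = yield from (5, 6, 7)
    yield result


g = delegating()
items = [next(g) for _ in range(3)]  # the three tuple items
after = next(g)  # this resume exhausts the tuple iterator, then yields `result`

h = delegating()
next(h)  # now suspended inside the delegation
try:
    h.send(99)  # a non-None value needs a `send` method on the delegate
    send_error = None
except AttributeError as exc:
    send_error = exc
```

Here `after` is the `None` that the question's extra `next(g)` call receives and discards.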
You can implement `tuple_iterator_generator`, with usage:

    try:
        in_value = yield out_value
    except DumpData:
        yield len(DUMP_DATA)
        in_value = yield from tuple_iterator_generator(DUMP_DATA)
    out_value = in_value

    def tuple_iterator_generator(t):
        in_value = None
        for item in t:
            in_value = yield item
        return in_value

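
For reference, here is how the fragment above might fit into a complete script. This is a sketch assembled from the question's code; the `run_demo` wrapper is an assumption added only to make the result easy to inspect, and type hints are omitted for brevity:

```python
DUMP_DATA = 5, 6, 7


class DumpData(Exception):
    """Exception used to indicate to yield from DUMP_DATA."""


def tuple_iterator_generator(t):
    in_value = None
    for item in t:
        in_value = yield item
    return in_value  # whatever was sent after the last item


def sample_gen():
    out_value = None
    while True:
        try:
            in_value = yield out_value
        except DumpData:
            yield len(DUMP_DATA)
            in_value = yield from tuple_iterator_generator(DUMP_DATA)
        out_value = in_value


def run_demo():
    # Hypothetical driver mirroring the question's main().
    g = sample_gen()
    next(g)  # Initialize
    echoed = g.send(1)
    num_data = g.throw(DumpData)
    data = tuple(next(g) for _ in range(num_data))
    resumed = g.send(3)  # no extra next(g) beforehand
    return echoed, num_data, data, resumed
```

The value sent after the last dumped item is received inside the helper, returned through `yield from`, and echoed back by the main `yield`.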
Or just not use `yield from` if you don't want that behavior:

    try:
        in_value = yield out_value
    except DumpData:
        yield len(DUMP_DATA)
        for out_value in DUMP_DATA:
            in_value = yield out_value
    out_value = in_value

See https://docs.python.org/3/whatsnew/3.3.html#pep-380-syntax-for-delegating-to-a-subgenerator for a use case of that behavior.
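
As a rough illustration of the kind of use case that section describes, here is a sketch in which a sub-generator's `return` value becomes the value of the `yield from` expression. The names `averager` and `collect` are made up for this example and do not come from the question:

```python
def averager():
    # Hypothetical sub-generator: accumulates sent values until None arrives.
    total, count = 0, 0
    while True:
        value = yield
        if value is None:  # None acts as the "finish" signal
            return total / count if count else None
        total += value
        count += 1


def collect(results):
    while True:
        # The sub-generator's return value is the value of `yield from`.
        results.append((yield from averager()))


results = []
g = collect(results)
next(g)  # prime: advance to the first bare `yield` in averager
for v in (10, 20, 30):
    g.send(v)
g.send(None)  # finishes the first averager; collect then starts a new one
```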
Answer 2
Score: 2
This is a lot of effort to go to in order to remove a single line of code.
Note that this is a bad solution if `DUMP_DATA` is a large object or does not support slicing: slicing copies the sliced part of `DUMP_DATA` into memory before anything is yielded, which defeats the point of using a generator.
As stated in https://stackoverflow.com/a/26109157/15081390, `yield from` "establishes a transparent bidirectional connection between the caller and the sub-generator". Calling `next(g)` terminates this connection and allows the loop to continue. In doing so it receives (and discards) `out_value`, which is set to `None` after `yield from DUMP_DATA`. In the OP's code, we can let the last `next(g)` call made while defining the variable `data` do this job instead:

    data = tuple(next(g) for _ in range(num_data))
                 ^^^^^^^

All that is needed is to be able to detect the end of `DUMP_DATA`. If `DUMP_DATA` is a `Sequence` (supports subscripting), then we can use `yield from DUMP_DATA[:i-1]` to `yield from` all but the last element, which will be yielded normally (by assigning `DUMP_DATA[-1]` to `out_value` and re-entering `sample_gen`'s normal loop). Thus, when the final line in `main` is called, the generator will respond normally.

    from collections.abc import Generator

    DUMP_DATA = 5, 6, 7


    class DumpData(Exception):
        """Exception used to indicate to yield from DUMP_DATA."""


    def sample_gen() -> Generator[int | None, int, None]:
        out_value: int | None = None
        while True:
            try:
                in_value = yield out_value
            except DumpData:
                # yield length, but not before storing in var i
                yield (i := len(DUMP_DATA))
                # if length is more than one item, then yield the first n - 1 elements
                if i > 1:
                    yield from DUMP_DATA[:i-1]
                # in case DUMP_DATA is of length 0, don't try to yield it
                if i:
                    out_value = DUMP_DATA[-1]
                continue
            out_value = in_value


    def main() -> None:
        g = sample_gen()
        next(g)  # Initialize
        assert g.send(1) == 1
        assert g.send(2) == 2

        # Okay let's dump the data
        num_data = g.throw(DumpData)
        # the last call of next(g) exits the yield from state
        data = tuple(next(g) for _ in range(num_data))
        assert data == DUMP_DATA

        # no need to call next(g)
        assert g.send(3) == 3


    if __name__ == "__main__":
        main()  # executes fine

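
If slicing is the concern, one possible variation is to hold back the last item with a one-element lookahead instead of a slice. This is only a sketch: `hold_back_last` is a hypothetical helper, not part of the answer above, and the code still assumes `len(DUMP_DATA)` is available, so it removes the slicing requirement but not the need to know the length:

```python
DUMP_DATA = 5, 6, 7


class DumpData(Exception):
    """Exception used to indicate to yield from DUMP_DATA."""


def hold_back_last(iterable):
    """Hypothetical helper: yield every item except the last, which is returned."""
    it = iter(iterable)
    try:
        previous = next(it)
    except StopIteration:
        return None  # empty input: nothing to hold back
    for item in it:
        yield previous
        previous = item
    return previous


def sample_gen():
    out_value = None
    while True:
        try:
            in_value = yield out_value
        except DumpData:
            yield len(DUMP_DATA)
            # The held-back last item is yielded by the main `yield` above.
            out_value = yield from hold_back_last(DUMP_DATA)
            continue
        out_value = in_value


g = sample_gen()
next(g)  # Initialize
num_data = g.throw(DumpData)
data = tuple(next(g) for _ in range(num_data))
resumed = g.send(3)  # works without an extra next(g)
```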
Answer 3
Score: 1
You have to wrap your inner generator in a "plain" generator, which has the `send` method.
This removes, at this level, the small optimizations of using `yield from`, since you are back to Python code iterating a generator and yielding a value, but that is the only way to accept the value sent by the next iteration after the inner generator is exhausted.
That said, it is straightforward:

    ...

    def inner_gen(gen):
        incoming = None  # keeps the return well-defined if `gen` is empty
        for item in gen:
            incoming = yield item
        return incoming


    def sample_gen() -> Generator[int | None, int, None]:
        out_value: int | None = None
        while True:
            try:
                in_value = yield out_value
            except DumpData:
                yield len(DUMP_DATA)
                out_value = yield from inner_gen(DUMP_DATA)
                continue
            out_value = in_value

    ...

    def main() -> None:
        g = sample_gen()
        next(g)  # Initialize
        assert g.send(1) == 1
        assert g.send(2) == 2

        # Okay let's dump the data
        num_data = g.throw(DumpData)
        data = tuple(next(g) for _ in range(num_data))
        assert data == DUMP_DATA

        # This `send` value will be taken into the "inner_gen",
        # and used as the return value of the `yield from` expression.
        assert g.send(3) == 3

Of course, this only works because you know beforehand the number of elements the `yield from` will produce, and call `send` after receiving the last value and before it stops with `StopIteration`. After a generator is exhausted, its return value (usually `None`) is already produced, and there is no way to pre-emptively "ask for a value" from the code driving a generator without yielding to it.
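
To make the timing point concrete, here is a small sketch. The `outer` generator is made up for this illustration, and `inner_gen` follows the shape shown above. The value returned through `yield from` is whatever resumes the inner generator right after its last item, so a plain `next` at that moment delivers `None`:

```python
def inner_gen(items):
    incoming = None
    for item in items:
        incoming = yield item
    return incoming


def outer():
    # Hypothetical wrapper that exposes the return value of the delegation.
    result = yield from inner_gen("ab")
    yield ("returned", result)


g = outer()
first, second = next(g), next(g)
with_send = g.send(42)  # sent right after the last item was received

h = outer()
next(h), next(h)
with_next = next(h)  # a plain next() at the same point delivers None
```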