In the Hypothesis library for Python, why does the text() strategy cause custom strategies to retry?
Question
I have a custom strategy built using composite that draws from the text strategy internally.

While debugging another error (FailedHealthCheck.data_too_large), I realized that drawing from the text strategy can cause my composite strategy to be invoked roughly twice as often as expected.

I was able to reproduce it with the following minimal example:
import hypothesis
from hypothesis import given

def trace(label):
    # Stand-in for the asker's logging helper, which is not shown in the question.
    print(label)

@hypothesis.strategies.composite
def my_custom_strategy(draw, n):
    """Strategy to generate lists of N strings."""
    trace("a")
    value = [draw(hypothesis.strategies.text(max_size=256)) for _ in range(n)]
    trace("b")
    return value

@given(my_custom_strategy(100))
def test_my_custom_strategy(value):
    assert len(value) == 100
    assert all(isinstance(v, str) for v in value)
In this scenario, trace("a")
was invoked 206 times, whereas trace("b")
was only invoked 100 times. These numbers are consistent across runs.
More problematic, the gap increases the more times I call text(), and super-linearly. When n=200
, trace("a")
is called 305 times. n=400
, 984 times. n=500
or greater, the test reliably pauses and then completes after the 11th iteration (with only 11 iterations, instead of 100!)
What's happening here?
Answer 1
Score: 1
I suspect it's because you're running into the maximum entropy (about 8K) used to generate Hypothesis examples, if some of the strings you generate happen to be quite long. Setting a reasonable max_size in the text strategy would help, if I'm right.
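For illustration, here is a minimal sketch of that first suggestion, reusing the shape of the strategy from the question; the cap of 32 characters is an arbitrary value chosen for the example, not a number from the answer:

import hypothesis.strategies as st

@st.composite
def my_custom_strategy(draw, n):
    """Same shape as the question's strategy, with a tighter cap on string length."""
    # Shorter strings consume less of the entropy buffer per draw,
    # so fewer generation attempts should hit the size limit and be retried.
    return [draw(st.text(max_size=32)) for _ in range(n)]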
As a more general tip, shrinking can be more efficient if you use the lists() strategy (or another collections strategy) rather than picking an integer and then drawing that many elements. This is not a subtle problem, though: if you haven't already noticed it, you don't need to do anything!
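A rough sketch of the lists() suggestion, assuming the test really needs exactly 100 strings; the strategy and test names here are illustrative, not from the answer:

import hypothesis.strategies as st
from hypothesis import given

# Let Hypothesis manage the collection itself; a single lists() strategy
# shrinks more effectively than a Python loop of independent draws.
fixed_length_lists = st.lists(st.text(max_size=256), min_size=100, max_size=100)

@given(fixed_length_lists)
def test_with_lists_strategy(value):
    assert len(value) == 100
    assert all(isinstance(v, str) for v in value)

Setting min_size equal to max_size pins the list length at 100, matching the original assertion.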