In the Hypothesis library for Python, why does the text() strategy cause custom strategies to retry?

Question

I have a custom strategy built with composite that draws from the text strategy internally.

While debugging another error (FailedHealthCheck.data_too_large), I realized that drawing from the text strategy can cause my composite strategy to be invoked roughly twice as often as expected.

I was able to reproduce this with the following minimal example:

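from collections import Counter

import hypothesis.strategies
from hypothesis import given

calls = Counter()


def trace(label):
    """Assumed helper (not shown in the original): count hits per label."""
    calls[label] += 1


@hypothesis.strategies.composite
def my_custom_strategy(draw, n):
    """Strategy to generate lists of N strings"""

    trace("a")
    value = [draw(hypothesis.strategies.text(max_size=256)) for _ in range(n)]
    trace("b")
    return value


@given(my_custom_strategy(100))
def test_my_custom_strategy(value):
    assert len(value) == 100
    assert all(isinstance(v, str) for v in value)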

In this scenario, trace("a") was invoked 206 times, whereas trace("b") was only invoked 100 times. These numbers are consistent across runs.

More problematically, the gap grows super-linearly with the number of text() calls. With n=200, trace("a") is called 305 times; with n=400, 984 times. With n=500 or greater, the test reliably pauses and then completes after the 11th iteration (only 11 iterations instead of 100!).

What's happening here?

Answer 1

Score: 1

I suspect you're running into the maximum amount of entropy (about 8K) that Hypothesis uses to generate a single example, which can happen if some of the generated strings are quite long. If I'm right, setting a reasonable max_size on the text strategy should help.
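One way to check this suspicion is to rerun the question's experiment with a much smaller max_size and compare how often the composite function starts with how often it finishes. The following is only a sketch: the names short_strings, check_short_strings and calls, the cap of 8 characters, and the settings values are illustrative assumptions, not taken from the question.

```python
from collections import Counter

from hypothesis import given, settings, strategies as st

# Counts how often the composite function starts and finishes.
calls = Counter()


@st.composite
def short_strings(draw, n, max_size=8):
    """Draw n strings, each capped at max_size characters (assumed cap)."""
    calls["start"] += 1
    value = [draw(st.text(max_size=max_size)) for _ in range(n)]
    calls["finish"] += 1
    return value


# database=None and deadline=None only keep this sketch self-contained.
@settings(max_examples=50, database=None, deadline=None)
@given(short_strings(100))
def check_short_strings(value):
    assert len(value) == 100
    assert all(len(v) <= 8 for v in value)


check_short_strings()
print(dict(calls))
```

If the two counters end up close to each other with the smaller cap but far apart with max_size=256, that would be consistent with generation attempts being abandoned partway through because of a size limit.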

As a more general tip, shrinking can be more efficient if you use the lists() strategy (or another collection strategy) rather than picking an integer and then drawing that many elements. This is not a subtle problem, though: if you haven't already noticed it, you don't need to do anything!
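A minimal sketch of the lists() approach might look like the following. The bounds (at most 20 strings of at most 16 characters), the names strings and check_lists_of_strings, and the seen_lengths list are assumptions chosen for illustration.

```python
from hypothesis import given, settings, strategies as st

# Let Hypothesis choose the list length itself instead of drawing an
# integer first and then drawing that many elements by hand.
strings = st.lists(st.text(max_size=16), min_size=0, max_size=20)

# Records the length of every generated list, for inspection only.
seen_lengths = []


@settings(max_examples=50, database=None, deadline=None)
@given(strings)
def check_lists_of_strings(value):
    seen_lengths.append(len(value))
    assert len(value) <= 20
    assert all(isinstance(v, str) for v in value)


check_lists_of_strings()
```

For a fixed length, as in the question, passing the same number as both min_size and max_size should give lists of exactly that many elements.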

huangapple
  • Published on April 11, 2023 at 10:45:22
  • Source: https://go.coder-hub.com/75982050.html