Proper way to use Python Unittest on a random 2D array

Question

Suppose I have a function that returns a 3-by-3 2D array with random entries within given bounds:

```python
import random


def random3by3Matrix(smallest_num, largest_num):
    # Start from a 3x3 grid of zeros, then fill each cell.
    matrix = [[0 for x in range(3)] for y in range(3)]
    for i in range(3):
        for j in range(3):
            # randrange excludes its upper limit, hence the + 1.
            matrix[i][j] = random.randrange(int(smallest_num),
                                            int(largest_num + 1)) if smallest_num != largest_num else smallest_num
    return matrix

print(random3by3Matrix(-10, 10))
```

Code above returns something like this:

[[-6, 10, -4], [-10, -9, 8], [10, 1, 1]]

How would I write a unittest for a function like this? I thought of using a helper function:

```python
import unittest


def isEveryEntryGreaterEqual(list1, list2):
    # True if every entry of list2 is >= the entry of list1 at the same position.
    for i in range(len(list1)):
        for j in range(len(list1[0])):
            if not (list1[i][j] <= list2[i][j]):
                return False
    return True


class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = [[-10 for x in range(3)] for y in range(3)]
        upper_bound = [[10 for x in range(3)] for y in range(3)]

        self.assertEqual(True, isEveryEntryGreaterEqual(lower_bound, random3by3Matrix(-10,10)))
        self.assertEqual(True, isEveryEntryGreaterEqual(random3by3Matrix(-10,10), upper_bound))
```

But is there a cleaner way to do this?
Furthermore, how would you test that all of your values are not only between the boundaries, but also distributed randomly?

Answer 1

Score: 1

# Test matrix bounds
It looks like you want to test whether every element in the matrix lies within some bounds, regardless of where in the matrix the element sits. You can make this code shorter and more readable by extracting all the elements from the matrix and checking them in one go, instead of using the double for loop. `numpy.ravel()` flattens any nested list or array into a 1D array, and Python's built-in `all()` function then tests the resulting 1D array in one go. This way, you avoid looping over all the elements yourself:
```python
import unittest

import numpy as np


def is_matrix_in_bounds(matrix, low, high):
    flat_list = np.ravel(matrix)  # flatten the nested list into a 1D array
    # all() returns True only if every element is within bounds;
    # with a generator it stops at the first out-of-bounds element
    return all(low <= e <= high for e in flat_list)


class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = -10
        upper_bound = 10
        matrix = random3by3Matrix(lower_bound, upper_bound)
        self.assertTrue(is_matrix_in_bounds(matrix, lower_bound, upper_bound))
```

If you will be using things like the matrix and the bounds in multiple tests, it may be beneficial to make them class attributes, so you don't have to define them in each test function.
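
As a rough sketch of that idea (not part of the original answer), the bounds can live on the test class and the matrix can be built in `setUp`, which `unittest` runs before every test method. The class name `TestRandomMatrix` and the attribute names are hypothetical, and `random3by3Matrix` is repeated here in condensed form only so the snippet runs on its own:

```python
import random
import unittest


def random3by3Matrix(smallest_num, largest_num):
    # Condensed stand-in for the function under test from the question.
    if smallest_num == largest_num:
        return [[smallest_num] * 3 for _ in range(3)]
    return [[random.randrange(int(smallest_num), int(largest_num + 1))
             for _ in range(3)] for _ in range(3)]


class TestRandomMatrix(unittest.TestCase):
    # Class attributes (hypothetical names): shared by every test method.
    LOWER_BOUND = -10
    UPPER_BOUND = 10

    def setUp(self):
        # Runs before each test method, so each test gets a fresh matrix.
        self.matrix = random3by3Matrix(self.LOWER_BOUND, self.UPPER_BOUND)

    def test_shape(self):
        self.assertEqual(len(self.matrix), 3)
        for row in self.matrix:
            self.assertEqual(len(row), 3)

    def test_entries_in_bounds(self):
        for row in self.matrix:
            for entry in row:
                self.assertGreaterEqual(entry, self.LOWER_BOUND)
                self.assertLessEqual(entry, self.UPPER_BOUND)
```

Using `assertGreaterEqual` and `assertLessEqual` on each entry, rather than `assertEqual(True, ...)`, means a failure message reports the offending value.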

# Test matrix randomness

Testing if some matrix is truly randomly distributed is a bit harder, since it will involve a statistical test to check if the variables are randomly distributed or not. The best you can do here is calculate the odds that they are indeed randomly distributed, and put a threshold on how low these odds are allowed to be. Since the matrix is random and the values in the matrix do not depend on each other, you're in luck, because you can again test them as if they were a 1D distribution.

To test this, you can create a second sample from a random uniform distribution and test the goodness of fit between your matrix and that sample with a Kolmogorov-Smirnov (KS) test. The test treats both inputs as random samples and asks how plausible it is that they were drawn from the same underlying distribution, in your case a uniform distribution over the integers in the given range. If the samples are vastly different, the p-value will be very low (i.e. a difference this large would rarely occur if both samples really came from the same underlying distribution). If they are similar, the p-value will be high. You want a random matrix, so you want a high p-value. The usual cutoff is 0.05 (which means that about 1 in 20 truly random matrices will be considered non-random, because they look non-random by happenstance).

Python provides such a test in the scipy module. You can either pass two samples (a two-sample KS test), or pass the name of a distribution and specify its parameters (a one-sample KS test). For the latter, the distribution name should be the name of a distribution in `scipy.stats`, and you pass the arguments for that distribution via the keyword `args=()`. Note that `scipy.stats.randint` excludes its upper bound, so integers from -10 to 10 inclusive correspond to `args=(-10, 11)`. The KS test is designed for continuous distributions, so on integer data the p-value should be treated as approximate.

```python
import unittest

import numpy as np
from scipy import stats


class TestRandomness(unittest.TestCase):
    def test_matrix_randomness(self):
        lower_bound = -10
        upper_bound = 10
        sample = np.ravel(random3by3Matrix(lower_bound, upper_bound))
        # two-sample test: compare against a second uniform integer sample
        # (np.random.randint excludes its upper limit, hence the + 1)
        random_dist = np.random.randint(lower_bound, upper_bound + 1, size=3 * 3)
        statistic, p_value = stats.kstest(sample, random_dist)
        # one-sample test, equivalent, but neater:
        # doesn't require creating a second distribution
        # (stats.randint also excludes its upper limit)
        statistic, p_value = stats.kstest(sample, "randint", args=(lower_bound, upper_bound + 1))
        self.assertGreater(p_value, 0.05)
```
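
Nine values give any statistical test very little to work with. One alternative, which is not part of the original answer, is to pool the entries of many generated matrices and compute a Pearson chi-square statistic against a uniform distribution over the integers. The sketch below uses only the standard library. The helper names `chi_square_uniform`, `pooled_entries` and `constant_matrix` are hypothetical, `random3by3Matrix` is a condensed stand-in for the function under test, and the threshold 45.31 is the tabulated chi-square critical value for 20 degrees of freedom at significance level 0.001:

```python
import random
from collections import Counter


def chi_square_uniform(values, low, high):
    """Pearson chi-square statistic of values against uniform on low..high."""
    categories = high - low + 1
    expected = len(values) / categories  # expected count per integer
    counts = Counter(values)
    # Assumes all values already lie in low..high (check bounds separately).
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(low, high + 1))


def pooled_entries(make_matrix, low, high, n_matrices):
    """Collect the entries of n_matrices generated matrices in one flat list."""
    return [entry
            for _ in range(n_matrices)
            for row in make_matrix(low, high)
            for entry in row]


def random3by3Matrix(low, high):
    # Condensed stand-in for the function under test.
    return [[random.randrange(low, high + 1) for _ in range(3)] for _ in range(3)]


def constant_matrix(low, high):
    # Deliberately non-random generator, to show what a failure looks like.
    return [[0] * 3 for _ in range(3)]


# 21 possible integers give 20 degrees of freedom.
CRITICAL_VALUE_DF20 = 45.31

good = chi_square_uniform(pooled_entries(random3by3Matrix, -10, 10, 200), -10, 10)
bad = chi_square_uniform(pooled_entries(constant_matrix, -10, 10, 200), -10, 10)
print(good, bad)
```

A test would assert that the statistic stays below the critical value. The constant generator lands far above it, while a uniform generator should exceed it only about once in a thousand runs.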

Note that unittests with a random aspect will sometimes fail. Such is the nature of randomness.
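
If occasional failures are a problem, one common option is to seed the random number generator in the test, so that every run sees the same matrix. This is only a sketch under that assumption: the seed value 12345 is arbitrary, the class name is hypothetical, and `random3by3Matrix` is again a condensed stand-in for the function under test.

```python
import random
import unittest


def random3by3Matrix(low, high):
    # Condensed stand-in for the function under test.
    return [[random.randrange(low, high + 1) for _ in range(3)] for _ in range(3)]


class TestSeededMatrix(unittest.TestCase):
    def setUp(self):
        # Arbitrary fixed seed: every test starts from the same generator state.
        random.seed(12345)

    def test_same_seed_gives_same_matrix(self):
        first = random3by3Matrix(-10, 10)
        random.seed(12345)  # reset the generator to the same state
        second = random3by3Matrix(-10, 10)
        self.assertEqual(first, second)
```

The trade-off is that a seeded test only ever checks one particular draw, so it guards against regressions rather than demonstrating randomness.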

see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.randint.html#scipy.stats.randint

Posted by huangapple on February 16, 2023, 17:48:06
Source: https://go.coder-hub.com/75470456.html