Apply fuzzing to a function that parses some string

Question

Recently the Go team released a fuzzer: https://blog.golang.org/fuzz-beta

Can you help describe what I can expect from the fuzzer in terms of test goals?

How do I apply the fuzzer?

Can you give some insight into how long we should run it before considering the result good enough?

How do I correlate execution failures with the code? (I expect to have GB of results, and I am wondering how overwhelming that could be and how to handle it.)

See this intentionally terrible piece of code, which would definitely need to be fuzzed:

package main

import (
	"fmt"
	"log"
)

func main() {
	type expectation struct {
		input  string
		output []string
	}

	expectations := []expectation{
		expectation{
			input: "foo=bar baz baz foo:1 baz ",
			output: []string{
				"foo=bar baz baz",
				"foo:1 baz",
			},
		},
		expectation{
			input: "foo=bar baz baz foo:1 baz   foo:234.mds32",
			output: []string{
				"foo=bar baz baz",
				"foo:1 baz",
				"foo:234.mds32",
			},
		},
		expectation{
			input: "foo=bar baz baz foo:1 baz   foo:234.mds32  notfoo:baz  foo:bak foo=bar baz foo:nospace foo:bar",
			output: []string{
				"foo=bar baz baz",
				"foo:1 baz",
				"foo:234.mds32",
				"notfoo:baz",
				"foo:bak",
				"foo=bar baz",
				"foo:nospace",
				"foo:bar",
			},
		},
		expectation{
			input: "foo=bar",
			output: []string{
				"foo=bar",
			},
		},
		expectation{
			input: "foo",
			output: []string{
				"foo",
			},
		},
		expectation{
			input: "=bar",
			output: []string{
				"=bar",
			},
		},
		expectation{
			input: "foo=bar baz baz foo:::1 baz  ",
			output: []string{
				"foo=bar baz baz",
				"foo:::1 baz",
			},
		},
		expectation{
			input: "foo=bar baz baz   foo:::1 baz  ",
			output: []string{
				"foo=bar baz baz",
				"foo:::1 baz",
			},
		},
	}

	for i, expectation := range expectations {
		fmt.Println("  ==== TEST ", i)
		success := true
		res := parse(expectation.input)
		if len(res) != len(expectation.output) {
			log.Printf("invalid length of results for test %v\nwanted %#v\ngot    %#v", i, expectation.output, res)
			success = false
		}
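		for e, r := range res {
			if e >= len(expectation.output) {
				break // length mismatch already reported above; avoid indexing out of range
			}
			if expectation.output[e] != r {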
				log.Printf("invalid result for test %v at index %v\nwanted %#v\ngot    %#v", i, e, expectation.output, res)
				success = false
			}
		}
		if success {
			fmt.Println("  ==== SUCCESS")
		} else {
			fmt.Println("  ==== FAILURE")
			break
		}
		fmt.Println()
	}
}

func parse(input string) (kvs []string) {
	var lastSpace int
	var nextLastSpace int
	var n int
	var since int
	for i, r := range input {
		if r == ' ' {
			nextLastSpace = i + 1
			if i > 0 && input[i-1] == ' ' {
				continue
			}
			lastSpace = i
		} else if r == '=' || r == ':' {
			if n == 0 {
				n++
				continue
			}
			n++
			if since < lastSpace {
				kvs = append(kvs, string(input[since:lastSpace]))
			}
			if lastSpace < nextLastSpace { // there was multiple in between spaces.
				since = nextLastSpace
			} else {
				since = lastSpace + 1
			}
		}
	}
	if since < len(input) { // still one entry
		var begin int
		var end int
		begin = since
		end = len(input)
		if lastSpace > since { // rm trailing spaces it ends with 'foo:whatever    '
			end = lastSpace
		} else if since < nextLastSpace { // rm starting spaces it ends with '   foo:whatever'
			begin = nextLastSpace
		}
		kvs = append(kvs, string(input[begin:end]))
	}
	return
}

Answer 1

Score: 2

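So, I dug a bit into the fuzz draft design. Here are some insights.

First, as recommended in the blog post, you have to run Go tip (this was true at the time of writing; fuzzing has since shipped in the standard toolchain, starting with Go 1.18):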

go install golang.org/dl/gotip@latest
gotip download

The gotip command acts as a "drop-in replacement for the go command" without messing up your current installation.

Expectations

The fuzzer basically generates a corpus of variations on some function's input parameters and runs the test with them in order to discover bugs.

Instead of writing an arbitrary number of test cases yourself, you provide example inputs to the engine, and the engine mutates them and calls your function with the new parameters automatically. The corpus is then cached so that it can serve as a basis for regression tests.

How to apply the fuzzer?

This is covered fairly well by the blog post, the draft design, and the documentation at tip.

The testing package now has a new type, testing.F, which is passed to fuzz targets. As with unit tests and benchmarks, the fuzz target name must start with the Fuzz prefix. So the signature will look like:

func FuzzBlah(f *testing.F) {
    // ...
}

The body of a fuzz target essentially uses the testing.F API to:

Provide a seed corpus with F.Add

> The seed corpus is the user-specified set of inputs to a fuzz target which will be run by default with go test. These should be composed of meaningful inputs to test the behavior of the package, as well as a set of regression inputs for any newly discovered bugs identified by the fuzzing engine.

So these are real test inputs for your parse function, the kind you would write yourself.

func FuzzBlah(f *testing.F) {
    f.Add("foo=bar")
    f.Add("foo=bar baz baz foo:1 baz ")
    // and so on
}

Run the function with fuzzed inputs with F.Fuzz

Each fuzz target calls f.Fuzz once. The argument to Fuzz is a function that takes a *testing.T followed by N parameters of the same types, in the same order, as the arguments passed to f.Add. If the function under test takes only one string, as in your example, it would be:

func FuzzBlah(f *testing.F) {
    f.Add("foo=bar")
    f.Add("foo=bar baz baz foo:1 baz ")

    f.Fuzz(func(t *testing.T, input string) {

    })
}
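
To make the "same types as f.Add" rule concrete, here is a minimal sketch with two fuzzed parameters. The truncate function and the FuzzTruncate name are hypothetical and exist only for this illustration; in a real project the fuzz target would sit in a file ending in _test.go, while main is included here only so the snippet runs on its own.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// truncate is a hypothetical function under test:
// it returns at most the first n bytes of s.
func truncate(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// FuzzTruncate shows how the seed arguments and the fuzz function line up.
func FuzzTruncate(f *testing.F) {
	// Every f.Add call passes a (string, int) pair...
	f.Add("foo=bar", 3)
	f.Add("", 0)

	// ...so the fuzz function takes *testing.T followed by (string, int).
	f.Fuzz(func(t *testing.T, s string, n int) {
		out := truncate(s, n)
		if !strings.HasPrefix(s, out) {
			t.Errorf("truncate(%q, %d) = %q, which is not a prefix of the input", s, n, out)
		}
	})
}

func main() {
	fmt.Println(truncate("foo=bar", 3))
}
```

If the seed types and the fuzz function's parameter types disagree, the testing package reports the mismatch when the target runs, so it is worth keeping the two side by side as above.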

The body of the fuzz function is then just whatever you want to test, for example your parse function.

What is critical, I think, in understanding and using the fuzzer is that you don't test pairs of inputs and expected outputs. You do that with unit tests.

With fuzzing you instead test that the code doesn't break for a given input. The inputs are randomized enough to cover corner cases. That's why the official examples:

  • call t.Skip() in case of expected failures
  • run countercheck functions, e.g. Marshal then Unmarshal, or url.ParseQuery then query.Encode

Cases where the input marshals correctly but then doesn't unmarshal back to the original value are unexpected failures, and a fuzzer is better at finding those than you would be by writing tests manually.
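
As a sketch of the countercheck idea using only the standard library, the snippet below round-trips a query string through url.ParseQuery and Values.Encode, along the lines of the blog post's example. The roundTrip helper and its return values are my own naming, not part of any API. Note that it compares the parsed values rather than the raw strings, because Encode sorts keys and normalizes escaping, so the encoded string need not equal the input.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
	"testing"
)

// roundTrip parses s, re-encodes the result, and parses it again.
// skipped is true when s is not a valid query in the first place.
func roundTrip(s string) (skipped bool, err error) {
	q1, err := url.ParseQuery(s)
	if err != nil {
		return true, nil // expected failure: invalid input is not a bug
	}
	q2, err := url.ParseQuery(q1.Encode())
	if err != nil {
		return false, fmt.Errorf("re-parsing encoded query: %w", err)
	}
	if !reflect.DeepEqual(q1, q2) {
		return false, fmt.Errorf("round trip changed values: %v != %v", q1, q2)
	}
	return false, nil
}

// FuzzParseQuery would live in a _test.go file in a real project.
func FuzzParseQuery(f *testing.F) {
	f.Add("x=1&y=2")
	f.Fuzz(func(t *testing.T, s string) {
		skipped, err := roundTrip(s)
		if skipped {
			t.Skip()
		}
		if err != nil {
			t.Error(err)
		}
	})
}

func main() {
	fmt.Println(roundTrip("x=1&y=2"))
	fmt.Println(roundTrip("%zz")) // invalid escape, so this input is skipped
}
```

Keeping the property in a plain function like roundTrip also makes it easy to call from an ordinary unit test with inputs the fuzzer has found.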

So to put it all together, the fuzz target could be:

func FuzzBlah(f *testing.F) {
    f.Add("foo=bar")
    f.Add("foo=bar baz baz foo:1 baz ")
    // and so on

    f.Fuzz(func(t *testing.T, input string) {
        out := parse(input)
        // bad output, skip
        if len(out) == 0 {
            t.Skip()
        }

        // countercheck; encode is a hypothetical inverse of parse
        // that you would have to write yourself
        enc := encode(out)
        if enc != input {
            t.Errorf("countercheck failed: got %q, want %q", enc, input)
        }
    })
}

Inputs that cause a test failure are added to the corpus, so you can fix the code and rerun them as regression tests.
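
Since the question does not come with an inverse of parse, another option is to fuzz for properties that every result should satisfy. The sketch below is one possible take, not the only way to do it. It assumes two properties that I am inventing for illustration: no returned entry is empty, and every entry is a substring of the input. The parse function is copied from the question (comments dropped); checkParse and FuzzParse are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// parse is the function from the question, logic unchanged.
func parse(input string) (kvs []string) {
	var lastSpace int
	var nextLastSpace int
	var n int
	var since int
	for i, r := range input {
		if r == ' ' {
			nextLastSpace = i + 1
			if i > 0 && input[i-1] == ' ' {
				continue
			}
			lastSpace = i
		} else if r == '=' || r == ':' {
			if n == 0 {
				n++
				continue
			}
			n++
			if since < lastSpace {
				kvs = append(kvs, string(input[since:lastSpace]))
			}
			if lastSpace < nextLastSpace {
				since = nextLastSpace
			} else {
				since = lastSpace + 1
			}
		}
	}
	if since < len(input) {
		begin := since
		end := len(input)
		if lastSpace > since {
			end = lastSpace
		} else if since < nextLastSpace {
			begin = nextLastSpace
		}
		kvs = append(kvs, string(input[begin:end]))
	}
	return
}

// checkParse returns the first violated property, or nil.
// Both properties are assumptions made for this sketch.
func checkParse(input string) error {
	for i, kv := range parse(input) {
		if kv == "" {
			return fmt.Errorf("entry %d is empty for input %q", i, input)
		}
		if !strings.Contains(input, kv) {
			return fmt.Errorf("entry %d (%q) is not part of input %q", i, kv, input)
		}
	}
	return nil
}

// FuzzParse would live in a _test.go file in a real project.
func FuzzParse(f *testing.F) {
	f.Add("foo=bar")
	f.Add("foo=bar baz baz foo:1 baz ")
	f.Fuzz(func(t *testing.T, input string) {
		if err := checkParse(input); err != nil {
			t.Error(err)
		}
	})
}

func main() {
	fmt.Println(checkParse("foo=bar"))
	fmt.Println(checkParse(" ")) // a single space yields one empty entry
}
```

A single space is one input that violates the first property: tracing parse by hand, it returns a slice holding one empty string. On the question of correlating failures with code, in released Go versions (1.18 and later) each failing input is written as a small file under testdata/fuzz/FuzzParse/ in the package directory and is replayed by a plain go test, so you end up with a handful of named reproducers rather than gigabytes of output.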

Running it

You just run go test with the -fuzz <regex> flag. You can limit how long the fuzzer runs, either by duration or by number of iterations:

  • -fuzztime <duration> where duration is a time.Duration string, e.g. 10m
  • -fuzztime Nx where N is the number of iterations, e.g. 20x
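
To illustrate the two accepted forms, here is a small sketch that classifies a -fuzztime style value as either an iteration count or a duration. The budget type and parseBudget function are hypothetical and are not the go tool's own code; the only real API it leans on is time.ParseDuration, which defines what counts as a valid duration string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// budget is a hypothetical type; exactly one field is set after parsing.
type budget struct {
	d time.Duration // wall-clock limit, e.g. "10m"
	n int           // iteration count, e.g. "20x"
}

// parseBudget is illustrative only: values ending in "x" are counts,
// everything else must be a valid time.Duration string.
func parseBudget(s string) (budget, error) {
	if strings.HasSuffix(s, "x") {
		n, err := strconv.Atoi(strings.TrimSuffix(s, "x"))
		if err != nil || n < 0 {
			return budget{}, fmt.Errorf("invalid count %q", s)
		}
		return budget{n: n}, nil
	}
	d, err := time.ParseDuration(s)
	if err != nil {
		return budget{}, fmt.Errorf("invalid duration %q: %w", s, err)
	}
	return budget{d: d}, nil
}

func main() {
	for _, s := range []string{"10m", "20x", "1h30m", "soon"} {
		b, err := parseBudget(s)
		fmt.Println(s, b.d, b.n, err)
	}
}
```

An iteration count such as 20x gives reproducible, short runs that suit CI, while a duration such as 10m is the more natural choice for a longer local or scheduled fuzzing session.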

How long or how many is enough for your tests is going to depend on what code you are testing. I trust the Go team will provide some more recommendations about that in due time.

So to wrap up:

  • gotip test -fuzz . -fuzztime 20x

which will also generate the corpus in the appropriate subdirs of $GOCACHE/fuzz/.

This should be enough to get you started. As I said in the comments, the feature is early in development, so there might be bugs and the documentation might be lacking. I'll probably update this answer as more info comes out.

  • Published by huangapple on 2021-08-07 16:01:40
  • Link to this article: https://go.coder-hub.com/68690476.html