Does `-fsanitize=bounds-strict` check anything that `-fsanitize=address` does not?

Question

The GCC manual says:

> -fsanitize=bounds-strict
>
> This option enables strict instrumentation of array bounds. Most out
> of bounds accesses are detected, including flexible array members and
> flexible array member-like arrays. Initializers of variables with
> static storage are not instrumented.

However, here, sum can access its argument out-of-bounds, despite the use of -fsanitize=address,undefined,bounds-strict (when offset is chosen to reach into the wrong array):

#include <stdio.h>
#include <stdlib.h>

#define n 1000

typedef double array[n];

double sum(array a, long offset)
{
    double acc = 0;

    for(int i = 0; i < n; ++i)
        acc += a[i + offset];

    return acc;
}

int main()
{
    array a;
    array b;

    for(int i = 0; i < n; ++i) a[i] = 1;
    for(int i = 0; i < n; ++i) b[i] = 2;

    long offset = b - a;
    printf("%ld\n", offset);

    printf("%f\n", sum(a, offset));
}

Compiling this with GCC 10.2 gives me:

$ gcc -pedantic -Wall -std=c17 -fsanitize=address,undefined,bounds-strict memory_errors_two_arrays.c && ./a.out
1032
2000.000000

So the code indexed a double[1000] with values up to 2031 and didn't even blink.

Does -fsanitize=bounds-strict check anything that -fsanitize=address does not?

Answer 1

Score: 1

> Does -fsanitize=bounds-strict check anything that -fsanitize=address does not?

Yes.

-fsanitize=bounds-strict is about checking array bounds, not the bounds of general objects. It uses compile-time information about array lengths, drawn largely from the types of the lvalues used for access, to generate instrumentation. That is orthogonal to the allocation size of dynamically-allocated objects, and the kind of information used is unavailable from many of the common idioms for creating and accessing dynamically allocated objects, such as the one in the original version of this question.

Consider this, for example:

#include <stdio.h>
int main(void) {
    int a[2][256] = {0};

    printf("%d\n", a[0][256]);
}

If I compile with -fsanitize=address and run the result, it just prints "0".

If I compile with -fsanitize=bounds-strict and run it, I get this report:

santest.c:7:24: runtime error: index 256 out of bounds for type 'int [256]'

(and it also prints "0").

Note also that although the diagnostics emitted by the bounds-strict instrumentation are styled as error messages, by default they are not fatal errors. The objective is for such messages to be in addition to the normal behavior of the program, not to replace or subvert it. In particular, the undefined-behavior family of checks, to which bounds-strict belongs, reports the problem and lets the program continue unless recovery is disabled with -fno-sanitize-recover. AddressSanitizer differs in this respect: by default it stops the program at the first error it reports.


After the original version of this answer was posted, the question was modified to present a different situation. With respect to the version that is current as I write this, it is relevant that it is relatively easy to hide or moot the type information on which -fsanitize's bounds-strict mode relies. The example code currently presented in the question does this by interposing a function call interface between a declared array and the array access.

In this context, I observe that in the scope of

> #define n 1000
>
> typedef double array[n];

this function declaration

> double sum(array a, long offset)

is 100% equivalent to

double sum(double *a, long offset)

The typedef notwithstanding, the parameter type conveys no information about how many more elements, if any, follow *a in the array of which it is a member. This function must in fact accept pointers into arrays of arbitrary length, so not only is there no information for bounds-strict to use for generating instrumentation, it would be incorrect to instrument the function to assume a particular array length.
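The equivalence can be checked with the compiler itself. The following is a minimal sketch, not part of the question's program: it declares sum with the array typedef and then defines it with a plain pointer, which only compiles because both spell the same function type. The helper param_is_pointer is a hypothetical name introduced here for illustration; it uses C11 _Generic to report the adjusted type of the parameter.

```c
#define n 1000

typedef double array[n];

/* Declared with the array typedef as the parameter type ... */
double sum(array a, long offset);

/* ... and defined with a plain pointer. The compiler accepts this pair
   because the parameter type double[1000] is adjusted to double *,
   so both lines name the same function type. */
double sum(double *a, long offset)
{
    double acc = 0;

    for (int i = 0; i < n; ++i)
        acc += a[i + offset];

    return acc;
}

/* Hypothetical helper (not from the question): returns 1 if the
   parameter declared as 'array a' actually has type double *. */
int param_is_pointer(array a)
{
    return _Generic(a, double *: 1, default: 0);
}
```

Calling param_is_pointer with any array of type array returns 1, because the length 1000 is no longer part of the parameter's type by the time the function body is compiled.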

Contrast with this variation:

#include <stdio.h>
#include <stdlib.h>

double access_flat(double *p, int i) {
    return p[i];
}

double access_dim200(double (*p)[200], int i) {
    return (*p)[i];
}

double access_dim100(double (*p)[100], int i) {
    return (*p)[i];
}

int main(void) {
    double *p = calloc(200, sizeof *p);
    double q[2][100] = {0};

    printf("p, flat:   %lf\n", access_flat(p, 100));
    printf("p, dim100: %lf\n", access_dim100((double (*)[100]) p, 100));
    printf("p, dim200: %lf\n\n", access_dim200((double (*)[200]) p, 100));

    printf("q, flat:   %lf\n", access_flat((double *) q, 100));
    printf("q, dim100: %lf\n", access_dim100(q, 100));
    printf("q, dim200: %lf\n", access_dim200((double (*)[200]) q, 100));
}

When compiled with -fsanitize=bounds-strict, the output of this program is:

p, flat:   0.000000
santest.c:14:16: runtime error: index 100 out of bounds for type 'double [100]'
p, dim100: 0.000000
p, dim200: 0.000000

q, flat:   0.000000
q, dim100: 0.000000
q, dim200: 0.000000

This shows several things, among them that

  • bounds-strict is indeed using the type of the lvalue used for access to generate instrumentation, and
  • the version of -fsanitize in my GCC (v8.5.0) treats automatically allocated (and statically allocated; not shown) arrays differently than dynamically allocated ones, and
  • the version of -fsanitize=bounds-strict in my GCC is buggy for automatically allocated (and statically allocated; not shown) arrays, failing to report some array-bounds overruns.

Bugs notwithstanding, again yes, bounds-strict mode does check some things that address mode does not. For me, bounds-strict also requires an additional library, libubsan, that address mode by itself does not. The two modes have overlapping areas of application, but their designs are different, and each detects some issues that the other does not.
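Returning to the question's sum function, one way to keep the array length in the type is to pass a pointer to the whole array, in the style of access_dim100 above. This is a sketch under that assumption: sum_checked is a hypothetical name, and whether a given GCC version actually reports the overrun is not guaranteed, given the inconsistencies observed above.

```c
#define n 1000

typedef double array[n];

/* Hypothetical variant of the question's sum(): the parameter is a
   pointer to the whole array, so the indexed lvalue (*a) has type
   double[1000] and the length is visible at the point of access. */
double sum_checked(array *a, long offset)
{
    double acc = 0;

    for (int i = 0; i < n; ++i)
        acc += (*a)[i + offset];

    return acc;
}
```

The call site becomes sum_checked(&a, offset). With offset 0 the function sums the 1000 elements of a as before; with a nonzero offset the index leaves the range of double[1000], which is the kind of access bounds-strict is designed to instrument.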

huangapple
  • Posted on February 24, 2023 at 05:22:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75550442.html