August 5, 2023, 16:07:35


How can I get a warning when comparing unsigned integers of different sizes in C and C++?

Question


A common source of bugs in C or C++ is code like this:

size_t n = // ...
for (unsigned int i = 0; i < n; i++) // ...

which can infinite-loop when the unsigned int overflows.

For example, on Linux, unsigned int is 32-bit, while size_t is 64-bit, so if n = 5000000000, we get an infinite loop.
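
The same wraparound can be reproduced quickly with narrower types. The sketch below is only an illustration: the helper name `terminates` and its step cap are hypothetical, and `uint8_t`/`uint16_t` stand in for `unsigned int`/`size_t` so that the effect appears after 256 iterations rather than 2^32. It assumes a C++17 compiler.

```cpp
#include <cstdint>

// Hypothetical helper: runs the loop shape from the question with an 8-bit
// counter and a 16-bit bound, but gives up after max_steps iterations so the
// non-terminating case can be observed safely.
constexpr bool terminates(std::uint16_t n, unsigned max_steps)
{
    unsigned steps = 0;
    for (std::uint8_t i = 0; i < n; ++i) {   // i is widened before the comparison
        if (++steps > max_steps) {
            return false;                    // cap hit: i wrapped from 255 to 0
        }
    }
    return true;
}

static_assert(terminates(200, 1000));    // 200 fits in uint8_t: the loop ends
static_assert(!terminates(300, 1000));   // 300 does not: i can never reach it
```

No diagnostic is needed for the code to be wrong here; the comparison itself is well defined, which is part of why compilers stay silent.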

How can I get a warning about this with GCC or Clang?

GCC's -Wall -Wextra doesn't do it:

#include <stdint.h>
void f(uint64_t n)
{
    for (uint32_t i = 0; i < n; ++i) {
    }
}

gcc-13 -std=c17 \
       -Wall -Wextra -Wpedantic \
       -Warray-bounds -Wconversion \
       -fanalyzer \
       -c -o 76840686.o 76840686.c

(no output)

I am looking for a solution that does not require n to be a compile-time constant.
Ideally the solution would work on existing C/C++ projects without having to rewrite them entirely.
Suggesting tools other than compiler warnings would also be useful, but compiler warnings themselves would be better.

Edit: Upstream compiler feature requests

Answers have indicated no such warnings exist in current compilers. I have started to file upstream feature requests:

GCC: Add warning flags to check against integer overflow

Answer 1

Score: 24


There does not appear to be a warning option built in to gcc or
clang that does what is requested. However, we can use
clang-query
instead.

Below is a clang-query command that will report comparison of
32-bit and 64-bit integers, on the assumption that int is 32 bits and
long is 64 bits. (More about that below.)

#!/bin/sh
PATH=$HOME/opt/clang+llvm-14.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH
# In this query, the comments are ignored because clang-query (not the
# shell) recognizes and discards them.
query='m
  binaryOperator(                            # Find a binary operator expression
    anyOf(                                   #  such that any of:
      hasOperatorName("<"),                  #   is operator <, or
      hasOperatorName("<="),                 #   is operator <=, or
      hasOperatorName(">"),                  #   is operator >, or
      hasOperatorName(">="),                 #   is operator >=, or
      hasOperatorName("=="),                 #   is operator ==, or
      hasOperatorName("!=")                  #   is operator !=;
    ),
    hasEitherOperand(                        #  and where either operand
      implicitCastExpr(                      #   is an implicit cast
        has(                                 #    from
          expr(                              #     an expression
            hasType(                         #      whose type
              hasCanonicalType(              #       after resolving typedefs
                anyOf(                       #        is either
                  asString("int"),           #         int or
                  asString("unsigned int")   #         unsigned int,
                )
              )
            ),
            unless(                          #      unless that expression
              integerLiteral()               #       is an integer literal,
            )
          )
        ),
        hasImplicitDestinationType(          #    and to a type
          hasCanonicalType(                  #     that after typedefs
            anyOf(                           #      is either
              asString("long"),              #       long or
              asString("unsigned long")      #       unsigned long.
            )
          )
        )
      ).bind("operand")
    )
  )
'
# Run the query on test.c.
clang-query \
  -c="set bind-root false" \
  -c="$query" \
  test.c -- -w
# EOF

When run on the following test.c it reports all of the indicated cases:

// test.c
// Demonstrate reporting comparisons of different-size operands.
#include <stddef.h>          // size_t
#include <stdint.h>          // int32_t, etc.
void test(int32_t i32, int64_t i64, uint32_t u32, uint64_t u64)
{
  i32 < i32;                 // Not reported: same sizes.
  i32 < i64;                 // reported
  i64 < i64;
  u32 < u32;
  u32 < u64;                 // reported
  u64 < u64;
  i32 < u64;                 // reported
  u32 < i64;                 // reported
  i32 <= i64;                // reported
  i64 > i32;                 // reported
  i64 >= i32;                // reported
  i32 == i64;                // reported
  u64 != u32;                // reported
  i32 + i64;                 // Not reported: not a comparison operator.
  ((int64_t)i32) < i64;      // Not reported: explicit cast.
  u64 < 3;                   // Not reported: comparison with integer literal.
  // Example #1 in question.
  size_t n = 0;
  for (unsigned int i = 0; i < n; i++) {}        // reported
}
// Example #2 in question.
void f(uint64_t n)
{
  for (uint32_t i = 0; i < n; ++i) {             // reported
  }
}
// EOF

Some details about the clang-query command:

The command passes -w to clang-query to suppress other warnings.
That's just because I wrote the test in a way that provokes warnings
about unused values, and is not necessary with normal code.
It passes set bind-root false so the only reported site is the
operand of interest rather than also reporting the entire expression.
Unfortunately it is not possible to have the query also print the
names of the types involved. Attempting to do so with a binding
causes clang-query to complain, "Matcher does not support binding."

The unsatisfying aspect of the query is that it explicitly lists the source
and destination types. Unfortunately, clang-query does not have a
matcher to, say, report any 32-bit type, so they have to be listed
individually. You might want to add [unsigned] long long on the
destination side. You might also need to remove [unsigned] long if running this
code with compiler options that target a platform like Windows, where both
int and long are 32 bits.

Relatedly, note that clang-query accepts compiler options after
the --, or alternatively in a
compile_commands.json
file.
Unfortunately there isn't dedicated documentation of the clang-query
command line, and even its --help does not mention the -- command
line option. The best I can link is the
documentation for libtooling,
as clang-query uses that library internally for command line
processing.

Finally, I'll note that I haven't done any "tuning" of this query on
real code. It is likely to produce a lot of noise, and will need
further tweaking. For a tutorial on how to work with clang-query,
I recommend the blog post
Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query
by Stephen Kelly. There is also the
AST Matcher Reference,
but the documentation there is quite terse.

Answer 2

Score: 9


This does not directly answer the question (provide a warning), but would you consider an alternative which avoids the problem entirely?

    size_t n = // ...
    for (typeof(n) i = 0; i < n; i++) // ...

It now doesn't matter what type n is: since i will always be the same type as n, you should never have trouble with infinite loops resulting from i being a smaller type or having a smaller range than n.
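
Note that typeof is a GNU extension that was standardized for C in C23; in standard C++ the equivalent spelling is decltype. A minimal C++17 sketch of the same idea follows, where the helper name `count_iterations` is hypothetical and exists only to make the loop checkable at compile time:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts loop iterations using a counter whose type is
// taken from the bound, so the counter can always reach n.
template <typename N>
constexpr N count_iterations(N n)
{
    N count = 0;
    for (decltype(n) i = 0; i < n; ++i) {   // i has exactly the type of n
        ++count;
    }
    return count;
}

// A 16-bit bound above 255 is reached, because the counter is 16-bit too.
static_assert(count_iterations<std::uint16_t>(300) == 300);
```

The trade-off is that the counter's type is no longer visible at the loop, so later uses of i (for example as an array index passed to a narrower parameter) may need their own review.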

Answer 3

Score: 8


PVS Studio can issue such a warning (and many more); here is an almost identical example from their docs:

https://pvs-studio.com/en/docs/warnings/v104/

It is a paid tool, but they give free licenses to open source projects.

I did not find such a warning in Clang-tidy, a free linter tool from the LLVM project, but it would be very simple to add a check for comparison of integers of different sizes (a later reply by Scott McPeak with an excellent clang-query did most of the work - the remaining part is just plugging this query into clang-tidy). It would be a very noisy check though. One can reduce the noise by limiting the check to loop conditions; that can be done with Clang-tidy too, but it takes a bit more work with AST matchers.

Answer 4

Score: 5


Recent versions of gcc seem to support -Warith-conversion for this purpose:

> -Warith-conversion
>
> Do warn about implicit conversions from arithmetic operations even when conversion of the operands to the same type cannot change their values. This affects warnings from -Wconversion, -Wfloat-conversion, and -Wsign-conversion.
>
> void f (char c, int i)
> {
> c = c + i; // warns with -Wconversion
> c = c + 1; // only warns with -Warith-conversion
> }

Yet it does not work for your example, probably because i < n is not an arithmetic expression. There does not seem to be a variant of this warning for generic binary expressions.
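
One way to see why the -Wconversion family has nothing to report: GCC documents -Wconversion as warning about implicit conversions that may alter a value, and in i < n the narrower operand is only widened, which cannot change it. The sketch below checks this with static_assert. It is an illustration using the fixed-width types from the question (C++17 assumed), not a statement about GCC's internals.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// The usual arithmetic conversions pick the wider type for mixed operands.
static_assert(std::is_same_v<
    decltype(std::uint32_t{} + std::uint64_t{}), std::uint64_t>);

// Every uint32_t value survives the widening unchanged ...
static_assert(std::uint64_t{std::numeric_limits<std::uint32_t>::max()}
              == 0xFFFFFFFFu);

// ... but a 32-bit counter can never reach a bound above that maximum.
constexpr std::uint64_t n = 5'000'000'000;
static_assert(n > std::numeric_limits<std::uint32_t>::max());
```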

Answer 5

Score: 2


For C++ you may be able to do even better than a compiler warning, assuming n is a compile time constant. This also works for non-gcc compilers. This logic is not available for C code though.

The basic idea is to encode the value information in the variable's type instead of in its value.

#include <concepts>     // std::integral (C++20)
#include <cstddef>      // size_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::integral_constant, std::is_signed_v

template<std::integral T, auto N>
constexpr bool operator<(T value, std::integral_constant<decltype(N), N>)
{
    static_assert(std::is_signed_v<T> == std::is_signed_v<decltype(N)>, "the types have different signs");
    static_assert((std::numeric_limits<T>::max)() >= N, "the maximum of type T is smaller than N");
    return value < N;
}
// todo: overload with swapped operator parameter types
int main()
{
    constexpr std::integral_constant<size_t, 500'000'000> n; // go with 5'000'000'000, and you'll get a compiler error
    for (unsigned int i = 0; i < n; i++)
    {
    }
}

If the value is not a compile time constant, you could still create a wrapper template type for the integer and overload the < operator for comparisons with integral values, adding the static_asserts into the body of this operator.

#include <concepts>     // std::integral (C++20)
#include <cstdint>      // uint32_t, uint64_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::is_signed_v

template<std::integral T>
class IntWrapper
{
    T m_value;
public:
    constexpr IntWrapper(T value)
        : m_value(value)
    {}
    template<std::integral U>
    friend constexpr bool operator<(U o1, IntWrapper o2)
    {
        static_assert(std::is_signed_v<U> == std::is_signed_v<T>, "types have different signedness");
        static_assert((std::numeric_limits<U>::max)() >= (std::numeric_limits<T>::max)(),
            "the comparison may never yield false because of the maxima of the types involved");
        return o1 < o2.m_value;
    }
};
void f(IntWrapper<uint64_t> n)
{
    // Intentionally rejected: the second static_assert fires for uint32_t vs. uint64_t.
    for (uint32_t i = 0; i < n; ++i) {
    }
}

Note that the necessity of changing the type of one of the operands of the comparison operator can be both a benefit and a drawback: it requires you to modify the code, but it also allows you to apply the check on a per-variable basis...
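
A lighter-weight variant of the same idea, as a sketch only: instead of changing the type of n, the comparison site calls a checked function. The names `loop_less` and `count_to` are hypothetical and not part of the answer above. The sketch is restricted to unsigned types and assumes C++17.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: compares like `i < n`, but refuses to compile when the
// counter type cannot represent every value of the bound type.
template <typename I, typename N>
constexpr bool loop_less(I i, N n)
{
    static_assert(std::is_unsigned_v<I> && std::is_unsigned_v<N>,
                  "this sketch only handles unsigned integer types");
    static_assert((std::numeric_limits<I>::max)() >= (std::numeric_limits<N>::max)(),
                  "counter type is narrower than the bound type");
    return i < n;
}

// Hypothetical usage: counts iterations of a loop guarded by loop_less.
constexpr std::uint64_t count_to(std::uint64_t n)
{
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; loop_less(i, n); ++i) {   // OK: same width
        ++count;
    }
    // for (std::uint32_t i = 0; loop_less(i, n); ++i) {}  // would not compile
    return count;
}

static_assert(count_to(10) == 10);
```

Compared with the wrapper type, this leaves function signatures untouched but requires editing every comparison that should be checked.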

Answer 6

Score: 2


Follow a coding standard which stops it at source

Most coding standards for embedded software prohibit the use of "int", for exactly the reasons you say. The C standard requires the existence of equivalent data types of fixed lengths (8, 16 and 32 bits have been universally available for years; 64-bit is newer but still supported basically everywhere). It is good practise to use them everywhere, but in safety-related software it is normally mandatory. Many tools exist for popular coding standards such as MISRA-C which will catch these issues for you.

Your example apparently seems obscure to many commenters above, because it needs 2^32 iterations to overflow. What many modern coders forget is that int can also be 16-bit, and this made overflows much easier in the past. The Ariane-5 disaster was caused by a 64-bit value overflowing when truncated to 16-bit. The Therac-25 disaster was also partly caused by an integer overflow which cleared the error status.

Examples do exist in 32-bit too, though. Windows 95 and 98 famously crashed after 49.7 days due to the overflow of a 32-bit millisecond timer. And in 15 years' time we have the Y2038 problem to look forward to. Most modern systems are Y2038-ready, but it's likely we'll get there and have some surprises!
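
The 49.7-day and 2038 figures follow directly from the counter widths. The arithmetic can be checked at compile time, as in the C++17 sketch below. The constant names are illustrative, and the Y2038 part assumes a signed 32-bit count of seconds since 1970-01-01.

```cpp
#include <cstdint>
#include <limits>

// Number of distinct values of a 32-bit millisecond counter.
constexpr std::uint64_t ms_range   = std::uint64_t{1} << 32;     // 4'294'967'296
constexpr std::uint64_t ms_per_day = 24ull * 60 * 60 * 1000;     // 86'400'000

// Whole days and leftover whole hours before the counter wraps.
constexpr std::uint64_t days  = ms_range / ms_per_day;
constexpr std::uint64_t hours = (ms_range % ms_per_day) / (60ull * 60 * 1000);
static_assert(days == 49 && hours == 17);    // 49 days 17 hours, about 49.7 days

// Y2038: the largest signed 32-bit second count, expressed in whole days.
constexpr std::int64_t max_seconds = std::numeric_limits<std::int32_t>::max();
constexpr std::int64_t epoch_days  = max_seconds / 86'400;
static_assert(epoch_days == 24'855);         // roughly 68 years after 1970
```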

All of these form part of the institutional history of software engineering as an engineering discipline, in the same way as the Tay Bridge and Tacoma Narrows Bridge form part of the institutional history of civil engineering. I'm somewhat shocked to see someone above saying they'd never heard of this in 40 years of C coding. It is certainly possible to be just a coder and be unaware of this, just as it is possible to be just a builder and be unaware of civil engineering principles. An engineer must be aware of the deeper principles behind their designs though. This issue, as trivial as it may seem, is one thing which clearly differentiates expertise levels between software engineers and mere coders. Given Ariane-5 and Therac-25, I make no apology for being judgemental about this.
