gdb step instruction won't go through `gettimeofday`
Question
When trying to disassemble every instruction of a program, as this SO answer suggests, I found that gdb never finishes once it steps into gettimeofday. Here is a minimal reproducible example:
In main.c:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return 0;
}
Compile with gcc main.c, then run it under gdb:
$ gdb -silent a.out
Reading symbols from a.out...
(gdb) set height 0
(gdb) b main
Breakpoint 1 at 0x1175: file main.c, line 7.
(gdb) set logging file ./log.txt
(gdb) set logging redirect on
(gdb) set logging on
Redirecting output to ./log.txt.
Copying debug output to ./log.txt.
(gdb) r
(gdb) while 1
>si
>end
And log.txt shows:
Starting program: /home/user/tst-gdb-step/a.out
Breakpoint 1, main () at main.c:7
7 {
0x000055555555517e 7 {
0x0000555555555182 7 {
10 gettimeofday(&tv, NULL);
0x0000555555555188 10 gettimeofday(&tv, NULL);
0x000055555555518d 10 gettimeofday(&tv, NULL);
0x0000555555555190 10 gettimeofday(&tv, NULL);
0x0000555555555070 in gettimeofday@plt ()
0x0000555555555074 in gettimeofday@plt ()
0x00007ffff7fcd690 in gettimeofday ()
0x00007ffff7fcd691 in gettimeofday ()
0x00007ffff7fcd698 in gettimeofday ()
0x00007ffff7fcd69b in gettimeofday ()
0x00007ffff7fcd69d in gettimeofday ()
0x00007ffff7fcd69e in gettimeofday ()
0x00007ffff7fcd6a1 in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
0x00007ffff7fcd6e6 in gettimeofday ()
0x00007ffff7fcd6ed in gettimeofday ()
0x00007ffff7fcd6f0 in gettimeofday ()
0x00007ffff7fcd6f2 in gettimeofday ()
0x00007ffff7fcd6f5 in gettimeofday ()
0x00007ffff7fcd6f9 in gettimeofday ()
0x00007ffff7fcd6fc in gettimeofday ()
0x00007ffff7fcd702 in gettimeofday ()
0x00007ffff7fcd709 in gettimeofday ()
0x00007ffff7fcd70c in gettimeofday ()
0x00007ffff7fcd70f in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
...
After 3 whole minutes of execution, gdb has written nearly 500,000 lines to log.txt (with no sign of stopping), which is obviously not normal, since the vDSO is meant to be fast. The log also shows an infinite loop: the same addresses, starting at 0x00007ffff7fcd6a7, repeat over and over.
But if n is used instead of si, the program exits without any problem.
Tools I'm using:
$ uname -a
Linux 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ gdb --version
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
I'm wondering why this happens. Thanks!
Answer 1
Score: 3
This is a guess based on source diving: I believe single-stepping through the guts of gettimeofday slows those guts down so much that they get stuck in an endless loop, trying and failing to meet their internal accuracy requirement for the result.
The code for gettimeofday on x86-64/Linux is in linux/lib/vdso/gettimeofday.c. The "guts" I'm talking about are the do_hres function. (I've linked to kernel 5.15.0 because that's what you have, but this file doesn't change often.) Notice the loop in this function. Inside the loop, it calculates the time as, essentially, (coarse counter maintained by the kernel) + (hardware high-precision clock that overflows often). Every time the kernel updates the coarse counter, it also updates the "sequence number" that is read in the loop condition. The designer expected this loop to cycle at most twice: if you enter the loop at just the wrong moment and read inconsistent data about the coarse counter, thus computing a nonsense time, you retry and get it right the second time around.
But stepping through this code instruction by instruction, even in an automated way, makes it so slow that the kernel always changes the coarse counter and the sequence number while the loop body is executing, and so you get stuck in the loop.
(Do not underestimate just how slow it is to step through machine code instruction by instruction, compared to normal execution. You're adding the time cost of two context switches and several commands in GDB's not particularly speedy scripting language to every instruction. I wouldn't be surprised if the context switches and the attendant cache and TLB thrashing were enough to trigger this livelock all by themselves.)
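The retry loop described above can be modelled in a few lines of C. This is a toy, single-threaded simulation and not the kernel's code: struct sim, sim_init, sim_advance, read_time, the figure of 30 instructions per pass, and the abstract time units are all made up for illustration. It only shows the shape of the argument: a reader that snapshots a sequence number, does some work, and retries if the number changed will never finish once one pass takes longer than the interval between updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the data page the kernel shares with the vDSO. */
struct fake_vdso_data {
    uint32_t seq;     /* bumped by the simulated kernel on every update */
    uint64_t coarse;  /* coarse time counter maintained by the "kernel" */
};

/* Simulated world: time only moves when the reader spends it. */
struct sim {
    struct fake_vdso_data vd;
    uint64_t now;            /* elapsed time, in arbitrary units */
    uint64_t update_period;  /* the "kernel" updates vd this often */
    uint64_t next_update;    /* time of the next pending update */
};

static struct sim sim_init(uint64_t update_period, uint64_t start)
{
    struct sim s = { { 0, 0 }, start, update_period, 0 };
    assert(update_period > 0);
    s.next_update = (start / update_period + 1) * update_period;
    return s;
}

/* Spend `cost` units of time and apply every kernel update that falls due. */
static void sim_advance(struct sim *s, uint64_t cost)
{
    s->now += cost;
    while (s->now >= s->next_update) {
        s->vd.seq += 2;   /* stays even: each update is complete */
        s->vd.coarse += 1;
        s->next_update += s->update_period;
    }
}

/*
 * Seqcount-style read. Returns the number of passes the loop needed,
 * or -1 if it still had not seen a stable snapshot after max_passes.
 * cost_per_insn models how slow each instruction is (1 = full speed,
 * large values = being single-stepped by a debugger).
 */
static int read_time(struct sim *s, uint64_t cost_per_insn,
                     int max_passes, uint64_t *out)
{
    enum { INSNS_PER_PASS = 30 };  /* assumed size of the loop body */

    for (int pass = 1; pass <= max_passes; pass++) {
        uint32_t seq = s->vd.seq;       /* snapshot the sequence number */
        uint64_t value = s->vd.coarse;  /* read the protected data */
        sim_advance(s, (uint64_t)INSNS_PER_PASS * cost_per_insn);
        if (s->vd.seq == seq) {         /* nothing changed: result is valid */
            *out = value;
            return pass;
        }
    }
    return -1;  /* gave up; the real loop has no such cap */
}
```

With an update period of 1000 units, a full-speed reader (cost 1) finishes in one pass, or in two if it happens to start just before an update. A reader slowed to cost 100 spends 3000 units per pass, so every pass sees a changed sequence number and the function only returns because of the artificial max_passes cap.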
Answer 2
Score: 1
Here is the gdb script I use to bypass this problem:
# load the program binary, symbol table, etc., and stop at its first instruction
# (starti requires gdb >= 8.1)
starti
# print the instruction at $pc after every step
display/i $pc
# disable pagination so output never pauses at the window height
set height 0
while 1
  # if we are at the entry of a function that causes problems, run it to completion
  if ($pc == gettimeofday)
    fin
  end
  if ($pc == clock_gettime)
    fin
  end
  # optional:
  # newer x86 versions of `memset` use `rep` instructions,
  # which expand to a whole lot of lines, so skip it too.
  if ($pc == memset)
    fin
  end
  # most importantly, step one instruction by default
  si
end
This does not work correctly for all cases (e.g., branching on time-related events) and is obviously slow, but it does the job for me.
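If the goal is only to get a test program like the one in the question through an si loop, another option is to keep the time query out of user-space code altogether. The sketch below is a different technique from the script above and not something either answer proposes: it assumes Linux on an architecture that still defines SYS_gettimeofday (x86-64 does), and raw_gettimeofday is a hypothetical helper name. Because the work happens inside a single syscall instruction, there is no user-space retry loop for the debugger to slow down.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Ask the kernel for the time with a real system call instead of the
 * vDSO fast path. Returns 0 on success, -1 on error (errno is set).
 * Assumption: SYS_gettimeofday exists on the target architecture.
 */
static int raw_gettimeofday(struct timeval *tv)
{
    assert(tv != NULL);
    return (int)syscall(SYS_gettimeofday, tv, NULL);
}
```

Replacing gettimeofday(&tv, NULL) in main.c with raw_gettimeofday(&tv) gives the same result at the cost of a real kernel entry, which is slower in normal runs, so this only makes sense for debugging builds.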