How to hook malloc function on Linux?
Question
I'm reading the Driving Compilers series of articles by Fabien Sanglard. In part 3, about the compiler, there is an example of hooking the malloc function. First, an incorrect solution that falls into infinite recursion is shown:
```c
void* malloc(size_t sz) {
    void *(*libc_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
    printf("malloced %zu bytes\n", sz);
    return libc_malloc(sz);
}
```
The reason given for the recursion is that the dlsym function internally calls malloc.
After that, a supposedly fixed solution is provided:
```c
#include <stdio.h>
#include <dlfcn.h>
static void* (*real_malloc)(size_t) = nullptr;
void *malloc(size_t size) {
    if(!real_malloc) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
    }
    printf("malloc(%d) = ", size);
    return real_malloc(size);
}
```
Except that the solution is not really fixed, because it suffers from the same problem. If I rename the function from malloc to my_malloc, for example, it works, but it is no longer a hook, because other software calls malloc, not my_malloc. Is there any solution to this problem?
Answer 1
Score: 2
First, per 7.1.4 Use of library functions, paragraph 4 of the (draft) C11 standard:
> The functions in the standard library are not guaranteed to be reentrant and may modify objects with static or thread storage duration.
So you can't assume that any function from the C standard library is safe to call here; you need to rely on platform-specific guarantees.
A good starting point is to find which functions your platform allows to be called from within signal handlers, since those pretty much have to be reentrant.
For POSIX-based systems, which I'm assuming you're using because you call the POSIX function dlsym(), you can start with section 2.4 Signal Concepts of the POSIX specification, which has an extensive list.
Note that write() is on the list of async-signal-safe functions, but printf() is not.
So your code
```c
static void* (*real_malloc)(size_t) = nullptr;
void *malloc(size_t size) {
    if(!real_malloc) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
    }
    printf("malloc(%d) = ", size);
    return real_malloc(size);
}
```
can be replaced with
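```c
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#include <dlfcn.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static void* (*real_malloc)(size_t) = NULL;

void *malloc(size_t size) {
    if(!real_malloc) {
        /* Serialize the lookup so only one thread calls dlsym(). */
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_mutex_lock( &mutex );
        if(!real_malloc) {
            real_malloc = dlsym(RTLD_NEXT, "malloc");
        }
        pthread_mutex_unlock( &mutex );
    }
    /* write() is async-signal-safe; printf() is not and can call malloc(). */
    write( STDOUT_FILENO, "malloc()\n", strlen( "malloc()\n" ) );
    return real_malloc(size);
}
```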
Note that I omitted the size and the address; those would need to be converted to a string. That's not hard to do, and examples are easy to find, such as in the answers to How can I convert an int to a string in C?. Note that you can't use any answer there that relies on a standard library function that isn't async-signal-safe.
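As a rough sketch of that conversion, assuming a 64-bit Linux target: the hypothetical helpers `format_uint()` and `log_malloc()` below build a line of the form `malloc(<size>) = 0x<address>` in a stack buffer using only arithmetic, then emit it with a single write() call, so no standard library formatting function is involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: writes n in base 10 or 16 into buf (no NUL terminator)
   and returns the number of digits. buf needs room for 20 chars, enough for a
   64-bit value in base 10. Uses only arithmetic, no library calls. */
static size_t format_uint(char *buf, uint64_t n, unsigned base) {
    static const char digits[] = "0123456789abcdef";
    char tmp[20];
    size_t len = 0;
    do {
        tmp[len++] = digits[n % base];
        n /= base;
    } while (n != 0);
    /* Digits come out least significant first, so copy them back reversed. */
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    return len;
}

/* Hypothetical helper: logs "malloc(<size>) = 0x<addr>\n" with one write(). */
void log_malloc(size_t size, void *p) {
    static const char prefix[] = "malloc(";
    static const char infix[] = ") = 0x";
    char line[64];   /* at most 7 + 20 + 6 + 16 + 1 = 50 bytes are used */
    size_t pos = 0;
    for (size_t i = 0; i < sizeof prefix - 1; i++)
        line[pos++] = prefix[i];
    pos += format_uint(line + pos, size, 10);
    for (size_t i = 0; i < sizeof infix - 1; i++)
        line[pos++] = infix[i];
    pos += format_uint(line + pos, (uintptr_t)p, 16);
    line[pos++] = '\n';
    (void)write(STDOUT_FILENO, line, pos);
}
```

Inside the malloc() replacement this would be called as `log_malloc(size, p)` after `p = real_malloc(size)`, in place of the fixed "malloc()" message.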
And if you're running on Solaris, s[n]printf() actually is async-signal-safe there.
I also added some protection for multithreaded use: there's a race condition in obtaining the pointer to the actual malloc() that should be protected against, if only because if the pointer value gets corrupted, you will likely never be able to reproduce whatever errors that causes.
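Strictly speaking, the unlocked first `if(!real_malloc)` check still reads a plain pointer that another thread may be writing, which is a data race under the C11 memory model. A minimal sketch of one way around that, assuming a C11 compiler with `<stdatomic.h>` and glibc: make the pointer `_Atomic` and pair an acquire load on the fast path with a release store after dlsym(). The `malloc_calls` counter is a hypothetical stand-in for the write() logging, and like the answer's version this assumes dlsym() does not itself call malloc() (if it did, the non-recursive mutex would deadlock).

```c
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#include <dlfcn.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void *(*malloc_fn)(size_t);

/* Resolved libc malloc; _Atomic so the unlocked fast-path read is not a data race. */
static _Atomic(malloc_fn) real_malloc = NULL;

/* Hypothetical call counter standing in for the write() logging. */
atomic_size_t malloc_calls = 0;

static malloc_fn resolve_real_malloc(void) {
    /* Fast path: this acquire load pairs with the release store below. */
    malloc_fn fn = atomic_load_explicit(&real_malloc, memory_order_acquire);
    if (fn == NULL) {
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_mutex_lock(&mutex);
        /* Re-check under the lock: another thread may have resolved it first. */
        fn = atomic_load_explicit(&real_malloc, memory_order_relaxed);
        if (fn == NULL) {
            /* Assumes dlsym() does not call malloc(); otherwise this deadlocks. */
            fn = (malloc_fn)dlsym(RTLD_NEXT, "malloc");
            atomic_store_explicit(&real_malloc, fn, memory_order_release);
        }
        pthread_mutex_unlock(&mutex);
    }
    return fn;
}

void *malloc(size_t size) {
    atomic_fetch_add_explicit(&malloc_calls, 1, memory_order_relaxed);
    return resolve_real_malloc()(size);
}
```

The mutex is only taken until the pointer has been published; after that, every call costs a single atomic load.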
EDIT
Per the comment from @ChrisDodd, fixed to address concerns about the safety of dlsym():
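```c
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#include <dlfcn.h>
#include <string.h>
#include <unistd.h>
static void* (*real_malloc)(size_t) = NULL;
/* Runs when the library is loaded, before main(), so malloc() never calls dlsym(). */
__attribute__((constructor))
static void initValues(void) {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
}
void *malloc(size_t size) {
    /* Assumes nothing calls malloc() before initValues() has run. */
    write( STDOUT_FILENO, "malloc()\n", strlen( "malloc()\n" ) );
    return real_malloc(size);
}
```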
Note that the code inside the malloc() replacement is now much simpler; no race condition is possible.
Answer 2
Score: 1
A little more investigation shows that what really calls malloc and leads to the infinite recursion is the printf call, not dlsym as the article says. That reduces the question to how to print in a way that doesn't trigger it. I came up with the following solution:
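```c
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#include <stdio.h>
#include <dlfcn.h>

static void* (*real_malloc)(size_t) = NULL;

void* malloc(size_t size) {
    if(!real_malloc) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
    }
    /* Set while we are inside printf(); any malloc() that printf() makes lands here. */
    static char isPrintF = 0;
    if (isPrintF) {
        return real_malloc(size);   /* forward without logging to break the recursion */
    }
    void *p = real_malloc(size);
    isPrintF = 1;
    printf("malloc(%zu) = %p\n", size, p);
    isPrintF = 0;
    return p;
}
```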
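One limitation of this guard is that `isPrintF` is a single flag shared by all threads: while one thread is inside printf(), allocations made by other threads go through unlogged, and the unsynchronized reads and writes of the flag are a data race. Below is a sketch of a per-thread variant, assuming GCC or Clang (for `__thread` and the `tls_model` attribute) and that the library is loaded at startup via LD_PRELOAD, where the initial-exec TLS model is appropriate. The flag name `in_hook` is hypothetical, and the lazy `real_malloc` initialization is still unsynchronized; the constructor approach from the other answer is one way to remove that.

```c
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

static void *(*real_malloc)(size_t) = NULL;

/* One re-entrancy flag per thread. initial-exec turns the TLS access into a
   fixed offset instead of a call through __tls_get_addr. */
static __thread int in_hook __attribute__((tls_model("initial-exec")));

void *malloc(size_t size) {
    if (!real_malloc) {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    }
    /* Re-entered from this thread's own printf() below: forward, don't log. */
    if (in_hook) {
        return real_malloc(size);
    }
    void *p = real_malloc(size);
    in_hook = 1;
    printf("malloc(%zu) = %p\n", size, p);
    in_hook = 0;
    return p;
}
```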