Windows Loader operation on DLLs

Question

I'm reading this paragraph in John Levine's book "Linkers and Loaders", which describes how a DLL is loaded on Windows:

> For each imported DLL, there is an array of import addresses, typically in the program’s text segment, into which the program loader places the resolved addresses.

This surprises me in two ways:

  1. Isn't the text segment read/execute only? (Perhaps the loader changes its permissions only after writing to it?)
  2. Doesn't this make the text segment non-shareable between processes? Linux goes to great lengths to keep the text segment the same for all instances of a library ("Position Independent Code"). Is that somehow not a goal for Windows?

Answer 1

Score: 3

The array of imported addresses is identified by one of the 16 data directories in the optional header of a PE file, namely the Import Address Table (IAT). It is usually located in its own section, .idata, but some linkers may bundle the IAT with the Import Table or with the .text section.
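To make the data-directory lookup concrete, here is a minimal Python sketch that reads the IAT entry (index 12 of the data-directory array) from a PE image and reports which section contains it. The function name `find_iat_section` is made up for this illustration, and the parser skips most validation, so treat it as a sketch rather than a robust PE reader.

```python
import struct

IMAGE_DIRECTORY_ENTRY_IAT = 12  # index of the IAT in the data-directory array


def find_iat_section(image: bytes):
    """Return (rva, size, section_name) for the IAT, or None if there is none.

    Hypothetical helper for illustration; it assumes a well-formed PE image.
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4                       # COFF file header (20 bytes)
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    opt = coff + 20                            # optional header
    (magic,) = struct.unpack_from("<H", image, opt)

    # The data directories start at offset 96 (PE32) or 112 (PE32+).
    if magic == 0x10B:
        dirs = opt + 96
    elif magic == 0x20B:
        dirs = opt + 112
    else:
        raise ValueError("unknown optional-header magic")

    (num_dirs,) = struct.unpack_from("<I", image, dirs - 4)  # NumberOfRvaAndSizes
    if num_dirs <= IMAGE_DIRECTORY_ENTRY_IAT:
        return None
    rva, size = struct.unpack_from("<II", image, dirs + 8 * IMAGE_DIRECTORY_ENTRY_IAT)
    if rva == 0:
        return None

    # Section headers (40 bytes each) follow the optional header.
    sections = opt + opt_size
    for i in range(num_sections):
        off = sections + 40 * i
        name = image[off:off + 8].rstrip(b"\0").decode("ascii", "replace")
        vsize, vaddr = struct.unpack_from("<II", image, off + 8)
        if vaddr <= rva < vaddr + vsize:
            return rva, size, name
    return rva, size, None
```

Calling it on the bytes of an EXE or DLL read from disk should show whether that particular linker placed the IAT in .idata, .rdata, .text, or somewhere else.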

  1. Yes, the .text code section is readable and executable only, but those access rights are applied only after the Windows loader has resolved the addresses of all imported functions and stored them in the IAT.
  2. Each Windows dynamic-link library (DLL) is loaded into memory just once. When multiple instances of the same PE program run, all the required DLLs are mapped at identical virtual addresses, so the contents of their IATs are identical at run time. Nevertheless, most DLLs have distinct .text and .idata sections, so the mapped addresses of imported functions can be randomized (ASLR).
    Microsoft doesn't seem to care much about PIC: each DLL can be loaded at an arbitrary virtual address and relocated whenever needed, using its .reloc section.

Answer 2

Score: 2

> For each imported DLL, there is an array of import addresses, typically in the program’s text segment, into which the program loader places the resolved addresses.

It used to be popular to put imports in a read/write .idata section, then that changed to a read-only .idata. Nowadays (for 64-bit EXEs and DLLs) the imports are usually part of the (read-only) .rdata section. You could put the imports in .text too, but I don't remember that being popular; perhaps it was before my time.

By the way, you can make up an arbitrary section name and put the imports there. The actual name of the section is not important.

> Perhaps the loader changes its permissions only after writing to it?

Yes, and it can easily do that, but it's actually worse: what about delay-loaded DLLs? If there are delay-loaded DLLs, whatever pages the imports are in may be briefly changed to writable, modified, and then turned back to read-only at various points during execution, whenever a delay-loaded DLL is loaded.
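The make-writable, patch, restore sequence can be modelled in a few lines of Python. This is only a toy model: `Page` and `bind_import` are invented names, and a real implementation would change the page protection through the operating system (for example with `VirtualProtect`) rather than by flipping a flag.

```python
class Page:
    """Toy stand-in for one page of memory holding import slots."""

    def __init__(self, slots):
        self.slots = [0] * slots
        self.writable = False  # starts out read-only

    def write(self, index, value):
        if not self.writable:
            raise PermissionError("page is read-only")
        self.slots[index] = value


def bind_import(page, index, address):
    """Make the page writable, patch one slot, then restore read-only."""
    page.writable = True
    try:
        page.write(index, address)
    finally:
        # Restore protection even if the write fails.
        page.writable = False
```

A direct `page.write(...)` from ordinary code raises `PermissionError`, while `bind_import` succeeds and leaves the page read-only again afterwards, which mirrors the brief window described above.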

As a fun side fact, on x86, kernel code can write to read-only pages directly unless it specifically enables CR0.WP. But the loader runs in user mode, so that doesn't end up mattering here.

> Doesn't this make the text segment non-shareable between processes?

If there are imports in .text, then that only affects the pages that actually have imports in them, not the whole section. Imports are not handled by changing the target addresses of various call instructions throughout the code, but by filling a central array of addresses that are then used by indirect-call instructions. So the code itself doesn't change, only a dense array somewhere. If that array is in .text, then that part of .text wouldn't be sharable, but the rest still would be.
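A small Python model may help show why the code stays unchanged. Here the "code" is an immutable tuple of call sites that each name an IAT slot, and only the table is filled in at load time. All names (`CODE`, `run`, `bind`) are invented for this sketch, and callables stand in for function addresses.

```python
# "Code" is immutable: each call site names an IAT slot, not a target address.
CODE = (("call_iat", 0), ("call_iat", 1), ("call_iat", 0))


def bind(exports, imported_names):
    """Loader step: resolve each imported name to a target, in slot order."""
    return [exports[name] for name in imported_names]


def run(code, iat):
    """Execute every call site by looking its target up in the IAT."""
    return [iat[slot]() for _, slot in code]


# Two "processes" resolve the same imports to different targets.
iat_a = bind({"f": lambda: "A.f", "g": lambda: "A.g"}, ["f", "g"])
iat_b = bind({"f": lambda: "B.f", "g": lambda: "B.g"}, ["f", "g"])
```

Running `run(CODE, iat_a)` and `run(CODE, iat_b)` uses the very same `CODE` object with different tables, which is the property that keeps code pages shareable.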

But putting aside the case of imports in .text, what about relocations that affect the text section? Do they make the text section non-sharable? 32-bit code may not commonly have imports in .text, but it does commonly have relocations for .text, which seems worse: those tend to affect most pages.

But even so, code pages end up being mostly sharable anyway. Some trickery is necessary: an attempt is made to load DLLs at consistent addresses across multiple processes, so that their code pages can be shared even though they are relocated. Since the pages were modified though, they cannot be simply memory-mapped.
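The effect of relocations on pages can be sketched as follows, assuming 32-bit absolute fixups and 4 KiB pages. `apply_relocations` is a hypothetical helper, and the list of fixup RVAs is passed in directly instead of being parsed from a real .reloc section.

```python
import struct

PAGE_SIZE = 0x1000


def apply_relocations(image: bytearray, reloc_rvas, delta):
    """Add delta to each 32-bit absolute address; return the set of pages touched."""
    if delta == 0:
        return set()  # loaded at the preferred base: nothing to patch
    touched = set()
    for rva in reloc_rvas:
        (value,) = struct.unpack_from("<I", image, rva)
        struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
        touched.add(rva // PAGE_SIZE)
        touched.add((rva + 3) // PAGE_SIZE)  # a fixup may straddle two pages
    return touched
```

Two copies of an image relocated with the same `delta` end up byte-for-byte identical, which is why loading a DLL at a consistent address across processes lets the patched pages be shared. Different deltas would produce different page contents.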

x64 code is, thanks to RIP-relative addressing, for the most part naturally position-independent.
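The arithmetic behind that claim is short enough to write out. In this sketch (the function names are made up), a RIP-relative operand encodes the distance from the end of the instruction to the target, so moving the whole image by the same amount leaves the encoded displacement unchanged.

```python
def rip_relative_target(instr_addr, instr_len, disp32):
    """Address referenced by a RIP-relative operand: next-instruction address + disp."""
    return instr_addr + instr_len + disp32


def displacement_for(instr_addr, instr_len, target):
    """Displacement an assembler would encode to reach target from this instruction."""
    return target - (instr_addr + instr_len)


# The same 7-byte instruction at RVA 0x1000 referring to data at RVA 0x3000,
# with the image loaded at two different (arbitrary, illustrative) bases.
bases = (0x140000000, 0x7FF600000000)
disps = [displacement_for(b + 0x1000, 7, b + 0x3000) for b in bases]
```

Both entries of `disps` are equal, so the instruction bytes need no fixup when the image moves.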

By the way, it is allowed (though not recommended) to have a non-relocatable EXE or DLL. Being non-relocatable means it can only be loaded at its "preferred" (well, "required" in this case) base address, and it is incompatible with ASLR. The EXE is loaded into its address space first, so that just works. A non-relocatable DLL either gets its preferred address or it doesn't, and if it doesn't, it fails to load.
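That decision can be sketched as a toy function. `IMAGE_FILE_RELOCS_STRIPPED` (0x0001) is the real flag in the COFF header's Characteristics field, but `pick_load_address` and its parameters are invented for illustration, and the sketch deliberately ignores ASLR, which would normally pick a randomized base for a relocatable image.

```python
IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # COFF Characteristics flag: no relocation info


def pick_load_address(characteristics, preferred_base, preferred_is_free, fallback_base):
    """Toy model of choosing a base address (ASLR deliberately ignored)."""
    relocatable = not (characteristics & IMAGE_FILE_RELOCS_STRIPPED)
    if preferred_is_free:
        return preferred_base
    if relocatable:
        return fallback_base  # image can be rebased using its relocations
    raise OSError("non-relocatable image cannot get its required base address")
```

With the flag set and the preferred range already occupied, the function raises, which corresponds to the failed load described above.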

huangapple
  • Posted on June 26, 2023, 23:18:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76558050.html