
memcpy on AARCH64 yielding unaligned Data Abort Exception, ARM GNU Toolchain or newlibc Bug?

Question


I've been using the ARM GCC release aarch64-none-elf-gcc-11.2.1 for some time in a large bare-metal project that has successfully used libc functions (malloc/memcpy) many times without issue, with these options:

-L$AARCH64_GCC_PATH/aarch64-none-elf/lib -lc -lnosys -lg

I recently saw an exception due to an unaligned access during memcpy despite compiling with -mstrict-align.

After isolating the issue and creating a unit test, I believe I've found a bug. Please ignore the addresses in the objdump and the memcpy call; they are made up for this test.

```c
// unit test
#include <stdlib.h>
#include <string.h>
volatile int bssTest;

void swap(int a, int b) {
    memcpy((void*)0x500, (void*)0x1000, 0xc);
}
```
0000000000060040 <memcpy>:
   60040:	f9800020 	prfm	pldl1keep, [x1]
   60044:	8b020024 	add	x4, x1, x2
   60048:	8b020005 	add	x5, x0, x2
   6004c:	f100405f 	cmp	x2, #0x10
   60050:	54000209 	b.ls	60090 <memcpy+0x50>  // b.plast
   60054:	f101805f 	cmp	x2, #0x60
   60058:	54000648 	b.hi	60120 <memcpy+0xe0>  // b.pmore
   6005c:	d1000449 	sub	x9, x2, #0x1
   60060:	a9401c26 	ldp	x6, x7, [x1]
   60064:	37300469 	tbnz	w9, #6, 600f0 <memcpy+0xb0>
   60068:	a97f348c 	ldp	x12, x13, [x4, #-16]
   6006c:	362800a9 	tbz	w9, #5, 60080 <memcpy+0x40>
   60070:	a9412428 	ldp	x8, x9, [x1, #16]
   60074:	a97e2c8a 	ldp	x10, x11, [x4, #-32]
   60078:	a9012408 	stp	x8, x9, [x0, #16]
   6007c:	a93e2caa 	stp	x10, x11, [x5, #-32]
   60080:	a9001c06 	stp	x6, x7, [x0]
   60084:	a93f34ac 	stp	x12, x13, [x5, #-16]
   60088:	d65f03c0 	ret
   6008c:	d503201f 	nop
   60090:	f100205f 	cmp	x2, #0x8
   60094:	540000e3 	b.cc	600b0 <memcpy+0x70>  // b.lo, b.ul, b.last
   60098:	f9400026 	ldr	x6, [x1]
   6009c:	f85f8087 	ldur	x7, [x4, #-8]
   600a0:	f9000006 	str	x6, [x0]
   600a4:	f81f80a7 	stur	x7, [x5, #-8]
   600a8:	d65f03c0 	ret
   600ac:	d503201f 	nop
   600b0:	361000c2 	tbz	w2, #2, 600c8 <memcpy+0x88>
   600b4:	b9400026 	ldr	w6, [x1]
   600b8:	b85fc087 	ldur	w7, [x4, #-4]
   600bc:	b9000006 	str	w6, [x0]
   600c0:	b81fc0a7 	stur	w7, [x5, #-4]
   600c4:	d65f03c0 	ret
   600c8:	b4000102 	cbz	x2, 600e8 <memcpy+0xa8>
   600cc:	d341fc49 	lsr	x9, x2, #1
   600d0:	39400026 	ldrb	w6, [x1]
   600d4:	385ff087 	ldurb	w7, [x4, #-1]
   600d8:	38696828 	ldrb	w8, [x1, x9]
   600dc:	39000006 	strb	w6, [x0]
   600e0:	38296808 	strb	w8, [x0, x9]
   600e4:	381ff0a7 	sturb	w7, [x5, #-1]
   600e8:	d65f03c0 	ret
   600ec:	d503201f 	nop
   600f0:	a9412428 	ldp	x8, x9, [x1, #16]
   600f4:	a9422c2a 	ldp	x10, x11, [x1, #32]
   600f8:	a943342c 	ldp	x12, x13, [x1, #48]
   600fc:	a97e0881 	ldp	x1, x2, [x4, #-32]
   60100:	a97f0c84 	ldp	x4, x3, [x4, #-16]
   60104:	a9001c06 	stp	x6, x7, [x0]
   60108:	a9012408 	stp	x8, x9, [x0, #16]
   6010c:	a9022c0a 	stp	x10, x11, [x0, #32]
   60110:	a903340c 	stp	x12, x13, [x0, #48]
   60114:	a93e08a1 	stp	x1, x2, [x5, #-32]
   60118:	a93f0ca4 	stp	x4, x3, [x5, #-16]
   6011c:	d65f03c0 	ret
   60120:	92400c09 	and	x9, x0, #0xf
   60124:	927cec03 	and	x3, x0, #0xfffffffffffffff0
   60128:	a940342c 	ldp	x12, x13, [x1]
   6012c:	cb090021 	sub	x1, x1, x9
   60130:	8b090042 	add	x2, x2, x9
   60134:	a9411c26 	ldp	x6, x7, [x1, #16]
   60138:	a900340c 	stp	x12, x13, [x0]
   6013c:	a9422428 	ldp	x8, x9, [x1, #32]
   60140:	a9432c2a 	ldp	x10, x11, [x1, #48]
   60144:	a9c4342c 	ldp	x12, x13, [x1, #64]!
   60148:	f1024042 	subs	x2, x2, #0x90
   6014c:	54000169 	b.ls	60178 <memcpy+0x138>  // b.plast
   60150:	a9011c66 	stp	x6, x7, [x3, #16]
   60154:	a9411c26 	ldp	x6, x7, [x1, #16]
   60158:	a9022468 	stp	x8, x9, [x3, #32]
   6015c:	a9422428 	ldp	x8, x9, [x1, #32]
   60160:	a9032c6a 	stp	x10, x11, [x3, #48]
   60164:	a9432c2a 	ldp	x10, x11, [x1, #48]
   60168:	a984346c 	stp	x12, x13, [x3, #64]!
   6016c:	a9c4342c 	ldp	x12, x13, [x1, #64]!
   60170:	f1010042 	subs	x2, x2, #0x40
   60174:	54fffee8 	b.hi	60150 <memcpy+0x110>  // b.pmore
   60178:	a97c0881 	ldp	x1, x2, [x4, #-64]
   6017c:	a9011c66 	stp	x6, x7, [x3, #16]
   60180:	a97d1c86 	ldp	x6, x7, [x4, #-48]
   60184:	a9022468 	stp	x8, x9, [x3, #32]
   60188:	a97e2488 	ldp	x8, x9, [x4, #-32]
   6018c:	a9032c6a 	stp	x10, x11, [x3, #48]
   60190:	a97f2c8a 	ldp	x10, x11, [x4, #-16]
   60194:	a904346c 	stp	x12, x13, [x3, #64]
   60198:	a93c08a1 	stp	x1, x2, [x5, #-64]
   6019c:	a93d1ca6 	stp	x6, x7, [x5, #-48]
   601a0:	a93e24a8 	stp	x8, x9, [x5, #-32]
   601a4:	a93f2caa 	stp	x10, x11, [x5, #-16]
   601a8:	d65f03c0 	ret
   601ac:	00000000 	udf	#0

When performing a memcpy on device-type memory where size = 0x8 + 0x4·n (n any natural number), an exception is thrown even though care may be taken to align the src/dst pointers. The instruction at 6009c in the objdump of memcpy above, `ldur x7, [x4, #-8]`, in the case of a size-0xc copy performs an LDUR from a 32-bit-aligned address ending in 0x4 into a 64-bit x register, which results in a Data Abort on device-type memory.

While I understand that care must be taken when using stdlib functions in a bare-metal application, due to the nature of our codebase it would be very difficult to ensure that every call to memcpy has a 64-bit-aligned size. Shouldn't newlib/the compiler ensure that memcpy uses 32-bit w registers for any 32-bit-aligned memcpy anyway, especially with -mstrict-align?

What are my options for an immediate fix in the meantime? I suppose I could override the definition of memcpy, but what source should I base the replacement implementation on in that case?
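For reference, one interim approach is a replacement routine that never issues an access wider than the alignment of its operands. The sketch below is not newlib source; `memcpy_aligned` is a hypothetical name, and the accesses are volatile-qualified only to keep the compiler from re-widening them. To actually override newlib's `memcpy`, you would define `memcpy` itself so your definition wins at static link time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interim replacement: uses 32-bit accesses while both
 * pointers are 4-byte aligned and at least 4 bytes remain, then falls
 * back to byte copies. The volatile casts prevent the compiler from
 * merging the accesses back into wider (possibly unaligned) ones. */
void *memcpy_aligned(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (n >= 4 && (((uintptr_t)d | (uintptr_t)s) & 3) == 0) {
        *(volatile uint32_t *)d = *(const volatile uint32_t *)s;
        d += 4; s += 4; n -= 4;
    }
    while (n--)                 /* unaligned head/tail: byte copies */
        *d++ = *s++;
    return dst;
}
```

Note this is a sketch under the assumption that 32-bit accesses are safe for your device regions; as the answer below explains, library-style copies on device memory are fragile in general.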

Any help on this is appreciated, thanks.

Answer 1

Score: 3


Actually, I think the larger "bug" is in your expectations. You simply can't use memcpy or any other library function on device memory.

The default assumption of modern optimizing compilers and libraries is that they are operating on normal memory, whose access has no side effects and which is not being concurrently accessed by any other software or hardware (*). So unaligned access (which gcc and newlib assume by default is okay) is the least of your worries. It is totally fair game for memcpy to do its work with any combination of loads or stores whatsoever. Including:

  • Three 4-byte accesses

  • An 8-byte and a 4-byte access

  • Twelve one-byte accesses

  • Two overlapping eight-byte accesses

  • A 16-byte load beyond the bounds of the source buffer, if it can prove that it will not cross a page boundary

  • Multiple loads of the same address

  • Multiple stores to the same address, of which any but the last could be the wrong values
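As a concrete illustration of the "two overlapping eight-byte accesses" case (essentially what the disassembly above does with `ldr x6, [x1]` followed by `ldur x7, [x4, #-8]`), a 12-byte copy can legally be performed like this; on normal memory the result is indistinguishable from a byte-by-byte copy:

```c
#include <stdint.h>
#include <string.h>

/* A 12-byte copy done as two overlapping 8-byte transfers:
 * bytes 0..7 from the start, bytes 4..11 ending at the last byte.
 * Bytes 4..7 are copied twice; harmless on normal memory,
 * potentially disastrous on device memory. */
static void copy12_overlapping(void *dst, const void *src)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);                   /* bytes 0..7  */
    memcpy(&tail, (const char *)src + 4, 8); /* bytes 4..11 */
    memcpy(dst, &head, 8);
    memcpy((char *)dst + 4, &tail, 8);
}
```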

Using -mstrict-align doesn't really help. First, as you already noticed, it only affects the code which you actually compile with it; it does nothing about library code that was already built. You would have to rebuild all of newlib with this option, and then audit all the assembly code in newlib separately. But it doesn't help with any of the other issues above, all of which are potentially disastrous for device memory. (And as amonakov noted, since -mstrict-align is rarely used, it can be prone to compiler bugs.)

With device memory, you need exact control over how many loads and stores are done, to which addresses, of which sizes, and in which order. There is only one mechanism in C/C++ to get that, namely volatile. So all accesses to device memory need to be done explicitly through volatile pointers, or using assembly.

If you need 32-bit accesses done, I think the only safe way to write your example code is:

```c
#include <stdint.h>

volatile uint32_t *dest = (volatile uint32_t *)0x500;
volatile uint32_t *src  = (volatile uint32_t *)0x1000;
for (int i = 0; i < 3; i++)
    dest[i] = src[i];
```

And if you do this for all device memory, then you can safely use compiled code and library functions on your normal memory, without needing -mstrict-align either. (Provided that you properly marked all normal memory as such in the page tables, and that the SCTLR_ELx.A bit is cleared.)
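That loop can be packaged into a reusable helper (a sketch; `device_copy32` is a hypothetical name): each iteration performs exactly one aligned 32-bit load and one aligned 32-bit store, in program order, because both pointers are volatile-qualified.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 32-bit words between (device-)memory regions, one
 * aligned 32-bit access at a time. volatile forbids the compiler
 * from merging, splitting, or reordering the accesses. */
static void device_copy32(volatile uint32_t *dst,
                          const volatile uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```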


(*) The C/C++ data race rules do allow multiple readers to concurrently access the same memory. So you can assume that memory which you do not explicitly write, will not be written at all. Beyond that, the compiler has nearly complete liberty to invent / discard / combine / reorder loads and stores in any fashion.

huangapple
  • Posted on 2023-05-17 16:49:12
  • Please retain the original link when reposting: https://go.coder-hub.com/76270191.html