June 1, 2023 04:07:46

Why does switch to long mode cause VGA text output to behave strangely?

Question

I'm developing a small demo that boots an x86_64 machine. During early init (real mode), I set video mode 3 via int 10h. I then write to the memory-mapped text buffer at 0xb8000. My second stage is already high-level C code. This worked perfectly in 32-bit protected mode with paging.

I changed the bootloader to also enable PAE and set LME, and then jump to the second stage (which is now compiled as x86_64). This is where my display fell apart, and I have no idea what is going on. I've been debugging small samples and have something that works reliably even in 64-bit mode:

	for (uint32_t i = 0xb8000; i < 0xb8000 + (25 * 80 * 2); i += 2) {
		*((volatile uint16_t*)i) = 0x0741;
	}

As expected, this fills the screen with all "A"s. Here's the generated assembly:

000000000000843f <main>:
    843f:	f3 0f 1e fa          	endbr64
    8443:	55                   	push   %rbp
    8444:	48 89 e5             	mov    %rsp,%rbp
    8447:	c7 45 fc 00 80 0b 00 	movl   $0xb8000,-0x4(%rbp)
    844e:	eb 0c                	jmp    845c <main+0x1d>
    8450:	8b 45 fc             	mov    -0x4(%rbp),%eax
    8453:	66 c7 00 41 07       	movw   $0x741,(%rax)
    8458:	83 45 fc 02          	addl   $0x2,-0x4(%rbp)
    845c:	81 7d fc 9f 8f 0b 00 	cmpl   $0xb8f9f,-0x4(%rbp)
    8463:	76 eb                	jbe    8450 <main+0x11>
    8465:	90                   	nop
    8466:	eb fd                	jmp    8465 <main+0x26>

However, when I change my code to this:

	volatile uint16_t *screen_base = (volatile uint16_t*)0xb8000;
	for (uint32_t i = 0; i < 25 * 80; i++) {
		screen_base[i] = 0x0741;
	}

It stops working; it outputs pink control characters (indicating that 0x07 is being treated as the character and 0x41 as the color code), and it does not even fill the whole screen (the last two characters at the lower right are not filled). Here's the generated assembly:

000000000000843f <main>:
    843f:	f3 0f 1e fa          	endbr64
    8443:	55                   	push   %rbp
    8444:	48 89 e5             	mov    %rsp,%rbp
    8447:	48 c7 45 f0 00 80 0b 	movq   $0xb8000,-0x10(%rbp)
    844e:	00 
    844f:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
    8456:	eb 17                	jmp    846f <main+0x30>
    8458:	8b 45 fc             	mov    -0x4(%rbp),%eax
    845b:	48 8d 14 00          	lea    (%rax,%rax,1),%rdx
    845f:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
    8463:	48 01 d0             	add    %rdx,%rax
    8466:	66 c7 00 41 07       	movw   $0x741,(%rax)
    846b:	83 45 fc 01          	addl   $0x1,-0x4(%rbp)
    846f:	81 7d fc cf 07 00 00 	cmpl   $0x7cf,-0x4(%rbp)
    8476:	76 e0                	jbe    8458 <main+0x19>
    8478:	90                   	nop
    8479:	eb fd                	jmp    8478 <main+0x39>

Weirdly enough, I can mask the issue by botching the pointer to 0xb8003, but this is obviously incorrect. I cannot figure out what is going on here. Does anyone have an idea what could be happening?

Answer 1

Score: 3

The asm is fairly painful to follow in a debug build, but it looks normal. Are you 100% sure you are in 64-bit mode, i.e. that you did a far jmp (or similar) that loaded CS with a descriptor that has the L bit set? Because there's strong evidence you aren't.
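
To make the "which mode am I actually in" question concrete, here is a minimal sketch of how the mode follows from EFER.LMA and the L and D bits of the descriptor loaded into CS. The function, enum, and macro names are made up for illustration; the bit positions are the architectural ones (EFER.LME is bit 8, EFER.LMA is bit 10, the descriptor's L bit is bit 53 and its D bit is bit 54). It only classifies values you would read out in a debugger; it does not read the registers itself.

```c
#include <stdint.h>

#define EFER_LME (1ull << 8)   /* IA32_EFER.LME: long mode enable (set by software) */
#define EFER_LMA (1ull << 10)  /* IA32_EFER.LMA: long mode active (set by the CPU
                                  once paging is enabled while LME = 1) */
#define DESC_L   (1ull << 53)  /* code-segment descriptor: 64-bit code segment */
#define DESC_D   (1ull << 54)  /* code-segment descriptor: default operand size */

/* Hypothetical names, for illustration only. */
enum cpu_mode {
    MODE_LEGACY_16,  /* long mode not active, 16-bit code segment */
    MODE_LEGACY_32,  /* long mode not active, 32-bit code segment */
    MODE_COMPAT_16,  /* long mode active, L = 0, D = 0 */
    MODE_COMPAT_32,  /* long mode active, L = 0, D = 1 */
    MODE_LONG_64,    /* long mode active, L = 1, D = 0 */
    MODE_RESERVED    /* long mode active, L = 1, D = 1 is not a valid combination */
};

/* Classify the execution mode from an EFER value and the 8-byte CS descriptor. */
enum cpu_mode classify_mode(uint64_t efer, uint64_t cs_desc)
{
    int lma = (efer & EFER_LMA) != 0;
    int l = (cs_desc & DESC_L) != 0;
    int d = (cs_desc & DESC_D) != 0;

    if (!lma)
        return d ? MODE_LEGACY_32 : MODE_LEGACY_16;
    if (l)
        return d ? MODE_RESERVED : MODE_LONG_64;
    return d ? MODE_COMPAT_32 : MODE_COMPAT_16;
}
```

For example, with LMA set but a flat 32-bit code descriptor such as 0x00CF9A000000FFFF still in CS, this returns MODE_COMPAT_32, and in that mode 0x48 still decodes as dec eax. A descriptor such as 0x00AF9A000000FFFF (L = 1, D = 0) gives MODE_LONG_64. Both descriptor values are common examples, not values taken from the question.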

0x48 in 32-bit mode is the opcode for dec eax (instead of a REX.W prefix), which would explain an offset of -3 bytes. add %rdx, %rax becomes dec %eax ; add %edx, %eax. And earlier there's a dec %eax before the LEA that doubles the index (so the doubled value is off by 2), and another before the mov -0x10(%ebp),%eax load, which is harmless because that load overwrites EAX.
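
The arithmetic can be checked with a small model of the loop body. This is only a sketch: the function names are invented, and the "registers" are plain C variables that follow the byte sequence from the non-working listing as a 32-bit decoder would see it, with every 0x48 executed as dec %eax.

```c
#include <stdint.h>

/* Address the loop body stores to when the bytes run as real 64-bit code. */
uint32_t store_addr_64bit(uint32_t base, uint32_t i)
{
    return base + 2 * i;
}

/* Address the same bytes store to when decoded as 32-bit code.
 * Unsigned wraparound in C matches the CPU's 32-bit register arithmetic. */
uint32_t store_addr_32bit(uint32_t base, uint32_t i)
{
    uint32_t eax, edx;

    eax = i;          /* 8b 45 fc        mov  -0x4(%ebp),%eax            */
    eax -= 1;         /* 48              dec  %eax                        */
    edx = eax + eax;  /* 8d 14 00        lea  (%eax,%eax,1),%edx  = 2i-2  */
    eax -= 1;         /* 48              dec  %eax  (dead, overwritten)   */
    eax = base;       /* 8b 45 f0        mov  -0x10(%ebp),%eax            */
    eax -= 1;         /* 48              dec  %eax                        */
    eax += edx;       /* 01 d0           add  %edx,%eax  = base + 2i - 3  */
    return eax;       /* 66 c7 00 41 07  movw $0x741,(%eax) stores here   */
}
```

With base 0xb8000 the first store lands at 0xb7ffd, an odd address, so the low byte 0x41 falls into an attribute slot and 0x07 into a character slot, and the last store ends before the final cell. With base 0xb8003 the -3 cancels out, which matches the workaround described in the question.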

Your version that works avoids any REX prefixes by casting uint32_t to a pointer. Note that none of the instructions use 64-bit operand-size, R8-R15, or BPL-DIL, so none of them start with a 40 to 4F byte in machine code. (Except the initial mov %rsp, %rbp, but EAX isn't live at that point; the next access to EAX is write-only.)
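
One way to check that claim is to look at the first byte of every instruction in the two listings. The sketch below does that; the helper and array names are made up, and the byte values are copied by hand from the objdump output in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* 0x40..0x4F: REX prefixes in 64-bit mode, one-byte inc/dec in 32-bit code. */
static int in_rex_range(uint8_t b)
{
    return (b & 0xf0) == 0x40;
}

/* Count instructions whose first byte falls in the 0x40..0x4F range. */
size_t count_rex_starts(const uint8_t *first_bytes, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (in_rex_range(first_bytes[i]))
            count++;
    return count;
}

/* First byte of each instruction in the working listing. */
static const uint8_t working_first_bytes[] = {
    0xf3, 0x55, 0x48, 0xc7, 0xeb, 0x8b, 0x66, 0x83, 0x81, 0x76, 0x90, 0xeb
};

/* First byte of each instruction in the non-working listing. */
static const uint8_t broken_first_bytes[] = {
    0xf3, 0x55, 0x48, 0x48, 0xc7, 0xeb, 0x8b, 0x48,
    0x48, 0x48, 0x66, 0x83, 0x81, 0x76, 0x90, 0xeb
};
```

The working listing has a single hit (the mov %rsp,%rbp in the prologue), while the non-working one has five, three of them inside the loop body where EAX is live.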

So that's pretty strong evidence the CPU's not in full 64-bit mode. Use Bochs to single-step your switch to long mode and check what mode you're actually in. And single-step by instructions in the not-working code; you'll see the 0x48 bytes decode as separate instructions. (You can do that in QEMU + GDB as well; GDB might not be sure what mode the CPU is in, but single-stepping via the trap flag TF will reflect what the CPU is actually doing.)

BTW, GCC debug builds prefer using EAX/RAX first for evaluating any expression, perhaps because it's the return value register. If GCC had happened to pick different registers, decrementing EAX wouldn't have mattered. But you'd definitely run into problems at some point, e.g. when GCC used RAX because it's the return value register, or when it tried to use the DIL byte register with a 0x40 REX prefix (inc eax in 32-bit mode.)
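
For reference, this is what each byte in the REX range means to a 32-bit decoder: 0x40 to 0x47 are inc and 0x48 to 0x4F are dec, with the low three bits selecting the register. A minimal lookup sketch follows (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Meaning of a byte in 0x40..0x4F when decoded as 32-bit code (AT&T syntax).
 * Returns NULL for bytes outside that range. */
const char *legacy_meaning(uint8_t byte)
{
    static const char *const table[16] = {
        "inc %eax", "inc %ecx", "inc %edx", "inc %ebx",
        "inc %esp", "inc %ebp", "inc %esi", "inc %edi",
        "dec %eax", "dec %ecx", "dec %edx", "dec %ebx",
        "dec %esp", "dec %ebp", "dec %esi", "dec %edi"
    };

    if ((byte & 0xf0) != 0x40)
        return NULL;
    return table[byte & 0x0f];
}
```

So legacy_meaning(0x48) is "dec %eax" (the REX.W case seen here) and legacy_meaning(0x40) is "inc %eax" (the plain REX prefix needed for byte registers like DIL). Prefixes with other bits set would clobber other registers, including ESP and EBP.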
