May 17, 2023, 22:52:51

Why do MWAIT Power Management hints cause premature wakeups?

Question

For university I'm currently experimenting with the MONITOR/MWAIT instruction pair.
Specifically, I want to measure how much energy the CPU uses in different scenarios, and I have already built a test setup that works reasonably well.
As part of the setup, I have all the cores enter MWAIT and then use an NMI to wake them up again after a specified time.
So far everything was working fine, but now I wanted to test how the Power Management hints affect the power consumption.

Unfortunately, every hint apart from 0 seems to cause MWAIT to not wait for the NMI, but to wake up on its own after 3-4 ms.
As far as I understand the documentation, the Power Management hints should not have any impact on when execution is continued after the MWAIT, so this is quite strange.
And since I still haven't made any progress even after spending a few hours on this problem, I thought maybe someone here has some idea what is going on!

Here is how I use MONITOR/MWAIT in my code:

volatile int dummy;

void do_mwait() {
    asm volatile("monitor;" ::"a"(&dummy), "c"(0), "d"(0));
    asm volatile("mwait;" ::"a"(0x10), "c"(0));
}

This is obviously just a small excerpt of the Linux kernel module I've written, but should contain all the important points.
dummy is a variable that is never used outside of what you can see here. It only exists so that I have a valid address to pass to monitor.
do_mwait() is the function that gets executed on every core available while I do my measurements.
As I said, just exchanging the 0x10 in the second line of do_mwait() with 0 makes it work the way I expect.

Because the behaviour and supported features of MONITOR/MWAIT depend on the specific CPU model, here are all the parts of the cpuid output on my test machine that I think are relevant. As far as I can see, all necessary features should be supported:

CPU 0:
   vendor_id = "GenuineIntel"
   version information (1/eax):
      processor type  = primary processor (0)
      family          = 0x6 (6)
      model           = 0xc (12)
      stepping id     = 0x3 (3)
      extended family = 0x0 (0)
      extended model  = 0x3 (3)
      (family synth)  = 0x6 (6)
      (model synth)   = 0x3c (60)
      (simple synth)  = Intel Core (unknown type) (Haswell C0) {Haswell}, 22nm
   ...
   feature information (1/ecx):
      ...
      MONITOR/MWAIT                           = true
      ...
   ...
   MONITOR/MWAIT (5):
      smallest monitor-line size (bytes)       = 0x40 (64)
      largest monitor-line size (bytes)        = 0x40 (64)
      enum of Monitor-MWAIT exts supported     = true
      supports intrs as break-event for MWAIT  = true
      number of C0 sub C-states using MWAIT    = 0x0 (0)
      number of C1 sub C-states using MWAIT    = 0x2 (2)
      number of C2 sub C-states using MWAIT    = 0x1 (1)
      number of C3 sub C-states using MWAIT    = 0x2 (2)
      number of C4 sub C-states using MWAIT    = 0x4 (4)
      number of C5 sub C-states using MWAIT    = 0x0 (0)
      number of C6 sub C-states using MWAIT    = 0x0 (0)
      number of C7 sub C-states using MWAIT    = 0x0 (0)
   ...
   brand = "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"
   ...
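
To relate the hint value to the listing above, here is a minimal sketch of how an MWAIT hint in EAX can be decoded and checked against CPUID leaf 5. It assumes the field layout described in the Intel SDM (bits 7:4 hold the target C-state minus 1, bits 3:0 the sub C-state) and that nibble n of CPUID.05H:EDX holds the number of sub C-states for C-state n. The function names are made up for illustration and are not part of the original module.

```c
#include <stdint.h>

/* Assumed field layout of the MWAIT hint in EAX (Intel SDM):
 * bits 3:0 = sub C-state, bits 7:4 = target C-state minus 1. */
#define HINT_SUBSTATE_MASK 0xFu
#define HINT_CSTATE_SHIFT  4

/* C-state index requested by a hint; the field value 0xF wraps to C0. */
static inline unsigned hint_cstate(uint32_t hint)
{
    return (((hint >> HINT_CSTATE_SHIFT) & 0xFu) + 1u) & 0xFu;
}

/* Sub C-state requested by a hint. */
static inline unsigned hint_substate(uint32_t hint)
{
    return hint & HINT_SUBSTATE_MASK;
}

/* Number of sub C-states that CPUID.05H:EDX reports for C-state
 * `cstate` (valid for 0..7, one nibble per C-state). */
static inline unsigned cpuid5_num_substates(uint32_t edx, unsigned cstate)
{
    return (edx >> (4u * cstate)) & 0xFu;
}

/* Nonzero if CPUID leaf 5 enumerates the state that `hint` asks for. */
static inline int hint_is_enumerated(uint32_t edx, uint32_t hint)
{
    unsigned cstate = hint_cstate(hint);
    return cstate < 8 && hint_substate(hint) < cpuid5_num_substates(edx, cstate);
}
```

Packing the sub C-state counts from the listing (C0 = 0, C1 = 2, C2 = 1, C3 = 2, C4 = 4) into one value gives EDX = 0x00042120; that value is reconstructed here, not read from the machine. Under these assumptions, hint_cstate(0x10) is 2, hint_substate(0x10) is 0, and hint_is_enumerated(0x00042120, 0x10) is nonzero, so the hint 0x10 names a state that this CPU advertises. Note that the C-state numbers in this CPUID leaf are MWAIT indices and need not match the C-state names used elsewhere.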

I hope this is enough context. Please tell me if I need to share additional information.
And thanks in advance for any input, even if it's just an (educated) guess!

Answer 1

Score: 2

From Intel's manual (https://www.felixcloutier.com/x86/mwait):

> The following cause the processor to exit the implementation-dependent-optimized state: a store to the address range armed by the MONITOR instruction, an NMI or SMI, a debug exception, a machine check exception, the BINIT# signal, the INIT# signal, and the RESET# signal. Other implementation-dependent events may also cause the processor to exit the implementation-dependent-optimized state.
>
> ...
>
> Implementation-specific conditions may result in an interrupt causing the processor to exit the implementation-dependent-optimized state even if interrupts are masked and ECX[0] = 0.

This might be an example of either of those.

Obviously that's not very satisfying, and it would be nice if I knew where to look to find any microarchitectural explanation of why that might only happen with deeper sleep states.

Like perhaps in C1 state (EAX=0), enough is still powered on to check the interrupt mask, but in deeper sleep states even some of the internal interrupt controller stuff is powered down?

You'd hope that they could check without powering up the whole core and resuming execution, but maybe that's something a microcode update enabled as a workaround for some design problem that was discovered later. Intel's Haswell errata list (https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf) does mention some C-state related problems where "the BIOS can contain a workaround", which actually means the BIOS can include a microcode update that changes CPU behaviour. Intel often includes ways for microcode to disable certain optimizations or features in case problems are found, and perhaps the only way to fix some lockup in an odd corner case was to always have the core wake from C2 or deeper on interrupts, even when they're supposed to be masked.

That's pure guesswork as to the cause, but Intel does clearly document that what you're seeing is possible.

Other possibilities include SMI (system-management-mode interrupts), but hopefully your system doesn't fire those regularly.

See also:

https://stackoverflow.com/questions/50790715/is-there-a-way-to-determine-that-smm-interrupt-has-occured (turbostat has a column for SMI counts in its output). You could also check /proc/interrupts to see whether any other interrupt handlers are running on that core.
https://stackoverflow.com/questions/25399405/evaluating-smi-system-management-interrupt-latency-on-linux-centos-intel-machi
https://stackoverflow.com/questions/40583848/differences-among-various-interrupts-sci-smi-nmi-and-normal-interrupt

I also mentioned in comments that with hyperthreading, this logical core might wake up whenever the other logical core has to wake, since the physical core is already powering up and most of the wakeup cost has been paid. That could go either way, though: waking both siblings at once means the core can't run in single-thread mode to more quickly service the one that definitely needed to wake up.

This is pure guesswork, and it could go either way. The documentation does clearly say that spurious wakeups are possible in general.

(Your code puts both logical cores of each physical core to sleep at the same time, which should avoid this problem in your case. But for anyone else whose case involves only a single logical core, it may be worth trying with hyperthreading disabled.)
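
Because the manual allows MWAIT to return early, one common way to cope is to treat it as a hint and re-enter the wait until the intended wake event has actually been observed. The sketch below illustrates that pattern and is not something taken from the question's module. It assumes the NMI handler sets a flag, and it takes the wait primitive as a hypothetical callback (`wait_once_fn`) so the loop can be tried in user space, where MONITOR/MWAIT cannot normally be executed. In a kernel module the callback would arm MONITOR on the flag, re-check the flag, and only then execute MWAIT.

```c
#include <stdint.h>

/* Hypothetical hook: one MONITOR/MWAIT round on `flag`. In ring 0 this
 * would arm MONITOR on `flag`, re-check it, then execute MWAIT. */
typedef void (*wait_once_fn)(volatile int *flag);

/* Wait until *flag is nonzero, re-entering the wait after every wakeup
 * that arrived without the flag being set. Returns how many times the
 * wait primitive was entered (0 if the flag was already set). */
static inline unsigned wait_for_flag(volatile int *flag, wait_once_fn wait_once)
{
    unsigned rounds = 0;
    while (!*flag) {
        wait_once(flag);
        rounds++;
    }
    return rounds;
}

/* User-space stand-in for the hook: pretends the first two wakeups
 * were premature and the third one was the real wake event. */
static int fake_wakeups;
static inline void fake_wait_once(volatile int *flag)
{
    if (++fake_wakeups == 3)
        *flag = 1;
}
```

With the stand-in, wait_for_flag() returns 3: two premature wakeups plus the real one. In an energy measurement the returned count is worth logging, since every premature wakeup and re-entry costs power that a single uninterrupted MWAIT would not.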
