Get L3 cache associativity using cpuid
Question
I'm very new to assembly and I don't fully understand what is wrong. I need to write has_cpuid(), has_l3_cache() and get_l3_cache_associativity() functions that work on both Intel and AMD using __asm. has_cpuid() works fine, but the other two don't. What is the problem?
/* Function to check CPUID support */
bool has_cpuid() {
    int id_bit;
    __asm {
        pushfd              // Push EFLAGS on the stack
        pop eax             // Load EFLAGS flags into EAX
        mov ecx, eax        // Save a copy
        xor eax, 200000h    // Toggle the ID bit in EAX
        push eax            // Push new EFLAGS on the stack
        popfd               // Restore EFLAGS from the stack
        pushfd              // Save EFLAGS back on the stack
        pop eax             // Load EFLAGS flags into EAX again
        xor eax, ecx        // If the ID bit has changed, then the CPUID is supported
        shr eax, 21         // Move result to least significant bit
        mov id_bit, eax
    }
    return id_bit;
}

/* Function to check for L3 cache */
bool has_l3_cache() {
    int cacheLevel;
    __asm {
        mov eax, 0x80000006
        cpuid
        mov cacheLevel, ecx
    }
    return cacheLevel & (1 << 16);
}

/* Function to get L3 cache associativity */
int get_l3_cache_associativity() {
    int level = 3;          // L3 cache
    int result;
    __asm {
        mov eax, 4          // CPUID function for cache information
        mov ecx, level      // Cache level to check
        cpuid               // Call CPUID
        mov eax, ebx        // Associativity data is in EBX[31:22]
        shr eax, 22         // Move Associativity Data To LSB
        inc eax             // CPUID returns n-1, so increase by 1
        mov result, eax
    }
    return result;
}

int main() {
    if (!has_cpuid()) {
        printf("CPUID is not supported.\n");
        return 1;
    }
    if (has_l3_cache()) {
        printf("L3 Cache is present.\n");
        printf("L3 Cache associativity: %d\n", get_l3_cache_associativity());
    }
    else {
        printf("L3 Cache is not present.\n");
    }
    return 0;
}
has_l3_cache() returns false even though my CPU has an L3 cache, and get_l3_cache_associativity() also returns the wrong associativity.
Answer 1
Score: 1
In general:
- Extremely old CPUs didn't support CPUID.
- Very old CPUs didn't have L3 caches.
- Older Intel CPUs used "CPUID leaf 0x00000002", which provides a bunch of 8-bit descriptors that you have to decode via a lookup table (e.g. the byte 0xD2 means "unified L3 cache, 2048 KiB, 4-way associativity, 64-byte cache lines"). Some descriptors are ambiguous and depend on the CPU model (e.g. the byte 0x49 means "4 MiB L3 cache" if it's a Pentium 4 or "4 MiB L2 cache" if it's a Core 2).
- Newer Intel CPUs use "CPUID leaf 0x00000004", where the input value in ECX selects the entry to return information for. Note: "ECX = 3" does not necessarily mean "get information for the L3 cache"; it only means "get whatever entry #3 is, which could be any kind of cache". You have to check the returned EAX bits 0 to 4 to determine the cache type (0 means there are no more entries) and EAX bits 5 to 7 to determine the cache level, so to find a specific cache you need a loop that tries each value of ECX until you find what you wanted. This leaf also gives information about the number of CPUs sharing each cache. If you also care about extra cache characteristics (quality of service, class of service) you might also want info from "CPUID leaf 0x00000010" (if available), and if you also want TLB info you might also need "CPUID leaf 0x00000018" (if available). Also, on Intel CPUs this CPUID leaf (and all higher CPUID leaves) can be disabled by a bit in Intel's "MISC_ENABLE" MSR to work around an ancient bug in Windows NT, so it's potentially a good idea to check for and correct that (on newer CPUs, but not on ancient CPUs that don't support that MSR).
- Older AMD CPUs used "CPUID leaf 0x80000005" (for L1 caches and TLBs) and maybe also "CPUID leaf 0x80000006" (for L2 and L3 caches and TLBs).
- Newer AMD CPUs use "CPUID leaf 0x8000001D", which is mostly similar to Intel's "CPUID leaf 0x00000004".
- Other vendors (Cyrix, VIA) mostly used Intel's old methods (and mostly didn't have L3 caches).
- For all of the above you should worry about CPU errata and quirks. Just because CPUID says something doesn't mean that CPUID is correct. Handling this mostly takes the form of code that uses the CPU's signature ("vendor:family:model:stepping") to determine whether any corrections apply and then (if necessary) calls CPU-specific code to correct the information after it was gathered. There are also potentially cases where a CPU supports something but CPUID doesn't report it, and the same errata/quirk handling code can fill in that missing information.
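The per-entry decoding and the "loop until you find what you wanted" idea from the leaf 0x00000004 item can be sketched roughly as follows. This is a minimal sketch, not a complete detector: the `cpuid_regs` and `cache_info` types and the function names are made up for illustration, and the code assumes the caller has already executed CPUID for sub-leaves 0, 1, 2, ... and stored the returned registers in an array. The bit positions follow the leaf 0x00000004 layout; AMD's leaf 0x8000001D is described above as mostly similar, but that should be checked against AMD's documentation before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Register values returned by one CPUID sub-leaf (hypothetical container). */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} cpuid_regs;

/* Decoded view of one cache entry (hypothetical type for this sketch). */
typedef struct {
    unsigned type;        /* 0 = no more entries, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* 1 = L1, 2 = L2, 3 = L3 */
    unsigned ways;        /* associativity */
    unsigned partitions;  /* physical line partitions */
    unsigned line_size;   /* bytes per cache line */
    unsigned sets;        /* number of sets */
    uint64_t size_bytes;  /* ways * partitions * line_size * sets */
} cache_info;

/* Decode one entry; the EBX and ECX fields are stored as "value minus 1". */
static cache_info decode_cache_entry(cpuid_regs r)
{
    cache_info c;
    c.type       = r.eax & 0x1F;                 /* EAX[4:0]   */
    c.level      = (r.eax >> 5) & 0x7;           /* EAX[7:5]   */
    c.line_size  = (r.ebx & 0xFFF) + 1;          /* EBX[11:0]  */
    c.partitions = ((r.ebx >> 12) & 0x3FF) + 1;  /* EBX[21:12] */
    c.ways       = ((r.ebx >> 22) & 0x3FF) + 1;  /* EBX[31:22] */
    c.sets       = r.ecx + 1;                    /* ECX        */
    c.size_bytes = (uint64_t)c.ways * c.partitions * c.line_size * c.sets;
    return c;
}

/* Scan the entries in order; return 1 and fill *out if a level-3 cache is found. */
static int find_l3(const cpuid_regs *entries, size_t count, cache_info *out)
{
    for (size_t i = 0; i < count; i++) {
        cache_info c = decode_cache_entry(entries[i]);
        if (c.type == 0)        /* type 0 terminates the list */
            break;
        if (c.level == 3) {
            *out = c;
            return 1;
        }
    }
    return 0;
}
```

To use it, fill the array by executing CPUID with EAX = 4 and ECX = 0, 1, 2, ... (with the `__asm` approach from the question, or with the `__cpuidex` intrinsic from `<intrin.h>` on MSVC) and stop once an entry with type 0 comes back. Keeping the decoding separate from the CPUID instruction itself also makes it possible to test the logic with hand-written register values.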
In short, if you want code that detects cache information reliably (on all 80x86 CPUs), you can expect to spend months writing code, checking errata lists, and testing different code paths on different CPUs.
For your specific code: has_l3_cache() uses an "AMD only" method (CPUID leaf 0x80000006) to determine whether L3 information exists, and even on AMD that leaf reports the L3 cache in EDX, while ECX describes the L2 cache. get_l3_cache_associativity() then uses a "newer Intel only" method (CPUID leaf 0x00000004) to obtain information for something that might not be an L3 cache. You can't expect this to work, because it's impossible for a CPU to be an AMD CPU and an Intel CPU at the same time. There are also other problems (e.g. assuming that CPUID leaves exist just because the CPUID instruction exists, without checking the "max. supported level" from "CPUID leaf 0x00000000" or the "max. supported extended level" from "CPUID leaf 0x80000000").
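The checks that are missing from the question's code could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names are invented, and the functions take register values that the caller has already read with CPUID (EAX, EBX, ECX, EDX from leaf 0x00000000, EAX from leaf 0x80000000, and EDX from leaf 0x80000006). The L3 size field at EDX bits 18 to 31, in 512 KiB units, follows AMD's documented layout for leaf 0x80000006 as I understand it; on other vendors those bits should not be trusted.

```c
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of a register, lowest byte first, into dst. */
static void put4(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xFF);
}

/* Build the 12-character vendor string from leaf 0: EBX, then EDX, then ECX. */
static void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    put4(out + 0, ebx);
    put4(out + 4, edx);
    put4(out + 8, ecx);
    out[12] = '\0';
}

/*
 * max_basic    = EAX returned by leaf 0x00000000
 * max_extended = EAX returned by leaf 0x80000000 (may be garbage on CPUs
 *                without extended leaves, hence the range check)
 */
static int leaf_supported(uint32_t leaf, uint32_t max_basic, uint32_t max_extended)
{
    if (leaf >= 0x80000000u)
        return max_extended >= 0x80000000u && leaf <= max_extended;
    return leaf <= max_basic;
}

/* AMD leaf 0x80000006: EDX[31:18] is the L3 size in 512 KiB units (0 = no L3 reported). */
static uint32_t amd_l3_size_kib(uint32_t edx)
{
    return ((edx >> 18) & 0x3FFFu) * 512u;
}
```

With these, a caller would first compare the vendor string against "GenuineIntel" or "AuthenticAMD", then confirm with leaf_supported() that the leaf it wants exists, and only then query and decode it.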