March 8, 2023 18:29:23


How to write an operand that is a 512-bit vector loaded from an N-bit memory location in x86 Assembly

Question

The source Intel manual is here: https://cdrdv2.intel.com/v1/dl/getContent/671110

In the manual, these memory operands are specified as m32bcst or m64bcst.

Example of an instruction with a variant that uses this operand (VMINPS):

I am interested in writing the variant of the instruction that uses this operand in actual Assembly.

If instead of an m32bcst operand there were a variant with an m32 operand, in MASM one could write, for instance: VMINPS YMM1{k1}{z}, YMM2, DWORD PTR [EAX]

However, I am not sure what to write for an m32bcst operand.


Answer 1

Score: 3

It varies by assembler. Some support the {1to16} / {1to8} / {1to4} syntax used in the slides of a 2014 talk introducing AVX-512 at a GCC conference, given by Kirill Yukhin of Intel. (Despite it being a GCC talk, the slides use Intel syntax.) Others support that and/or something else.

MASM: vminps zmm1, zmm2, DWORD bcst [rax]
NASM: vminps zmm1, zmm2, [rax]{1to16} (an optional dword or qword size specifier goes in the usual place, as in dword [rax]{1to16}). NASM does not support the bcst keyword.
€ASM aka Euro Assembler: vminps ymm1,ymm2,[rax],Bcst=on
GAS/clang .intel_syntax: MASM-like in general, and supports dword bcst [rax], but also accepts [rax]{1to16}. (objdump -drwC -Mintel disassembles it as dword bcst [rax].)
AT&T syntax: vminps (%rax){1to16},%zmm2,%zmm1
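All of these spellings describe the same operation. As a rough illustration, here is a small Python reference model of what vminps zmm1, zmm2, dword bcst [rax] computes. It is a sketch, not any assembler's or compiler's API, and the helper names load_f32 and vminps_bcst are hypothetical. One 32-bit float is read from memory and replicated into all 16 lanes ({1to16}), and each lane then applies MIN as written in the Intel SDM pseudocode (src1 < src2 ? src1 : src2), which is why a NaN input or a pair of zeros yields the memory value.

```python
import struct

ZMM_BITS = 512
FLOAT_BITS = 32


def load_f32(buf: bytes, offset: int = 0) -> float:
    """Read one little-endian IEEE-754 single (the m32 part of m32bcst)."""
    return struct.unpack_from("<f", buf, offset)[0]


def vminps_bcst(src1: list[float], mem: bytes, offset: int = 0) -> list[float]:
    """Hypothetical model of: vminps zmm1, zmm2, dword bcst [mem+offset]."""
    lanes = ZMM_BITS // FLOAT_BITS          # 16 lanes -> {1to16}
    assert len(src1) == lanes
    b = load_f32(mem, offset)               # a single 32-bit load...
    src2 = [b] * lanes                      # ...broadcast to every lane
    # SDM MIN(): the second operand is returned unless src1 < src2, so a NaN
    # in either input (or two zeros of any sign) yields the broadcast value.
    return [a if a < c else c for a, c in zip(src1, src2)]


# Usage: zmm2 holds 0.0 .. 15.0, and the dword at [rax] holds 7.5
memory = struct.pack("<f", 7.5)
zmm2 = [float(i) for i in range(16)]
zmm1 = vminps_bcst(zmm2, memory)
print(zmm1)
print(vminps_bcst([float("nan")] * 16, memory)[0])  # 7.5: NaN in src1 returns the memory operand
```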

The machine code has only 1 bit (EVEX.b) to encode broadcast vs. a regular memory operand, so there's no way to broadcast 64-bit pairs of floats for vminps; the broadcast element size has to match the SIMD element size. So €ASM's minimal syntax is sufficient; the others merely give the assembler a way to check for a mismatch between what the human thinks the instruction will do and what it actually encodes.
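Because the encoding carries only that one bit, the {1toN} decoration adds no information of its own: N follows from the vector width and the instruction's element size. The Python sketch below models the kind of consistency check an assembler could perform; it is not taken from any particular assembler, and the tables and the names broadcast_count and check_broadcast are hypothetical, covering only a small illustrative subset of mnemonics.

```python
VECTOR_BITS = {"xmm": 128, "ymm": 256, "zmm": 512}
# Element size implied by the instruction itself (illustrative subset only).
ELEMENT_BITS = {"vminps": 32, "vminpd": 64}


def broadcast_count(mnemonic: str, reg: str) -> int:
    """N in {1toN}: how many copies of the m32/m64 element fill the vector."""
    return VECTOR_BITS[reg] // ELEMENT_BITS[mnemonic]


def check_broadcast(mnemonic: str, reg: str, n: int) -> None:
    """Reject a {1toN} that disagrees with what the 1-bit encoding will do."""
    expected = broadcast_count(mnemonic, reg)
    if n != expected:
        raise ValueError(f"{mnemonic} {reg}: {{1to{n}}} given, "
                         f"encoding implies {{1to{expected}}}")


for mnem in ELEMENT_BITS:
    for reg in VECTOR_BITS:
        print(mnem, reg, f"{{1to{broadcast_count(mnem, reg)}}}")

check_broadcast("vminps", "zmm", 16)        # consistent, no error
try:
    check_broadcast("vminps", "zmm", 8)     # e.g. trying to broadcast 64-bit pairs
except ValueError as e:
    print("error:", e)
```

The check only catches a human mistake; with or without it, the machine code for vminps zmm with a broadcast operand is the same.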

Unlike embedded rounding + suppress-all-exceptions, which only work with scalar (like vmulss) or 512-bit instructions (footnote 1), broadcast memory operands also work with 256-bit and 128-bit vectors (AVX512VL).
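For the 256-bit form from the question, the broadcast combines with masking as usual. Below is a minimal Python sketch, assuming the MASM-style spelling vminps ymm1{k1}{z}, ymm2, dword bcst [eax] (the {1to8} case) and modelling k1 as an 8-bit integer; the function name is hypothetical. Lanes whose mask bit is 0 are zeroed because of {z}, and the others get the per-lane minimum against the broadcast value.

```python
import struct


def vminps_ymm_k_z_bcst(k1: int, src1: list[float], mem: bytes) -> list[float]:
    """Hypothetical model of: vminps ymm1{k1}{z}, ymm2, dword bcst [eax]."""
    lanes = 256 // 32                       # 8 lanes -> {1to8}
    assert len(src1) == lanes
    b = struct.unpack_from("<f", mem)[0]    # one dword, broadcast to all lanes
    out = []
    for i in range(lanes):
        if (k1 >> i) & 1:                   # mask bit set: compute SDM MIN
            out.append(src1[i] if src1[i] < b else b)
        else:                               # {z}: zero-masking clears the lane
            out.append(0.0)
    return out


ymm2 = [1.0, 9.0, -3.0, 4.0, 10.0, 2.0, 8.0, 5.0]
k1 = 0b10110101                             # lanes 0, 2, 4, 5, 7 active
print(vminps_ymm_k_z_bcst(k1, ymm2, struct.pack("<f", 3.0)))
# [1.0, 0.0, -3.0, 0.0, 3.0, 2.0, 0.0, 3.0]
```

Without {z}, masked-off lanes would instead keep the old contents of ymm1 (merge-masking); the model above covers only the zeroing case.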

Broadcast element sizes of 32 and 64 bits are supported; not coincidentally, those are the element sizes that load ports on Intel CPUs can broadcast for free as part of a load uop. (Note that vpbroadcastb/w vec, [mem] needs an ALU uop, while vpbroadcastd/q only needs the load uop.)
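The 64-bit case works the same way with m64bcst. Assuming the EVEX form vminpd zmm1, zmm2, zmm3/m512/m64bcst (written qword bcst [rax] in MASM-style syntax, [rax]{1to8} in NASM), the Python sketch below contrasts the two memory forms; the helper names are hypothetical. The broadcast form reads a single 8-byte qword and replicates it into 8 lanes, while the plain m512 form reads all 64 bytes. Only the load differs; the per-lane MIN is identical.

```python
import struct

ZMM_DOUBLES = 512 // 64                     # 8 lanes of float64


def load_zmm_operand(mem: bytes, bcst: bool) -> list[float]:
    """Hypothetical model of the zmm3/m512/m64bcst source of vminpd."""
    if bcst:
        (x,) = struct.unpack_from("<d", mem)    # 8-byte qword load...
        return [x] * ZMM_DOUBLES                # ...replicated: {1to8}
    return list(struct.unpack_from(f"<{ZMM_DOUBLES}d", mem))  # full 64-byte load


def vminpd(src1: list[float], src2: list[float]) -> list[float]:
    """Per-lane SDM MIN: src1 if src1 < src2, otherwise src2."""
    return [a if a < b else b for a, b in zip(src1, src2)]


mem = struct.pack("<8d", *[2.5 * i for i in range(8)])   # 64 bytes at [rax]
zmm2 = [4.0] * ZMM_DOUBLES
print(vminpd(zmm2, load_zmm_operand(mem, bcst=False)))   # zmmword ptr [rax]
print(vminpd(zmm2, load_zmm_operand(mem, bcst=True)))    # qword bcst [rax]
```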

Footnote 1: e.g. vmulps zmm0,zmm1,zmm2{rz-sae} (GAS .intel_syntax / MASM)
or vmulps zmm0, zmm1, zmm2, {rz-sae} (NASM, with an extra comma before the {})


Answer 2

Score: 1


The specified instruction can be written as
VMINPS ZMM1, ZMM2, DWORD bcst [EAX]

An example can be seen here.
