How to write an operand that is a 512-bit vector loaded from an N-bit memory location in x86 assembly
Question
The source Intel manual is here: https://cdrdv2.intel.com/v1/dl/getContent/671110
The operands are specified as m32bcst or m64bcst.
Example of an instruction that has a variant that uses this operand
I am interested in writing the variant of the instruction that uses this operand in actual Assembly.
If, instead of the m32bcst operand, we had a variant with an m32 operand, then in MASM, for instance, one could write:
    VMINPS YMM1{k1}{z}, YMM2, DWORD PTR [EAX]
I am not sure what to write for an m32bcst operand, however.
Answer 1
Score: 3
It varies by assembler. Some support the {1to16} / {1to8} / {1to4} syntax used in slides from a 2014 talk introducing AVX-512 at a GCC conference, by Kirill Yukhin of Intel. (Despite it being a GCC talk, the slides use Intel syntax.) Others support that and/or something else.
- MASM:
    vminps zmm1, zmm2, DWORD bcst [rax]
- NASM:
    vminps zmm1, zmm2, [rax] {1to16}
  (An optional dword or qword specifier goes in the usual place, like dword [rax]{1to16}. NASM does not support the bcst keyword.)
- €ASM aka Euro Assembler:
    vminps ymm1,ymm2,[rax],Bcst=on
- GAS/clang .intel_syntax is MASM-like in general and supports dword bcst [rax], but also [rax]{1to16}. (objdump -drwC -Mintel uses dword bcst [rax].)
- AT&T syntax:
    vminps (%rax){1to16},%zmm2,%zmm1
The machine code has only 1 bit to encode broadcast vs. regular, so there's no way to broadcast 64-bit pairs of floats for vminps; the broadcast element size has to match the SIMD element size. So €ASM's minimal syntax is sufficient; the others merely provide a way for the assembler to check for a mismatch in what the human thinks the instruction will do.
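Since the element size is fixed by the instruction, the N in {1toN} is simply the vector width divided by the element width. A minimal sketch of that arithmetic follows; the helper names broadcast_decorator and nasm_operand are hypothetical (not part of any assembler), and it assumes only the 32- and 64-bit element sizes discussed in this answer:

```python
# Hypothetical helpers for illustration only; no assembler exposes these.
# Assumption: only 32- and 64-bit broadcast elements, as discussed in the answer.

def broadcast_decorator(vector_bits: int, element_bits: int) -> str:
    """Return the {1toN} decorator for a given vector and element width."""
    if vector_bits not in (128, 256, 512):
        raise ValueError("vector width must be 128, 256 or 512 bits")
    if element_bits not in (32, 64):
        raise ValueError("element width must be 32 or 64 bits")
    # One memory element is replicated into every lane of the vector.
    return "{1to%d}" % (vector_bits // element_bits)


def nasm_operand(base_reg: str, vector_bits: int, element_bits: int) -> str:
    """Format a NASM-style broadcast memory operand with a size specifier."""
    size = {32: "dword", 64: "qword"}[element_bits]
    return f"{size} [{base_reg}]{broadcast_decorator(vector_bits, element_bits)}"


if __name__ == "__main__":
    # vminps on zmm registers works on 32-bit elements in a 512-bit vector.
    print("vminps zmm1, zmm2, " + nasm_operand("rax", 512, 32))
```

The same division explains why a ymm form of vminps takes {1to8} and an xmm form takes {1to4}.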
Unlike embedded rounding + suppress-all-exceptions, which only work with scalar (like vmulss) or 512-bit instructions (see footnote 1), broadcast memory operands do work with 256- and 128-bit vectors as well (AVX512VL).
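To make the semantics of an m32bcst operand concrete, here is a rough Python model of vminps with a broadcast source. It is a sketch under stated assumptions: the function name vminps_bcst is made up, masking ({k1}{z}) is ignored, and src1 values are plain Python floats rather than values rounded to float32. The lane count (4, 8 or 16) is taken from the length of src1, mirroring the 128-, 256- and 512-bit forms.

```python
import struct


def vminps_bcst(src1, memory, offset=0):
    """Model vminps dst, src1, m32bcst; hypothetical name, no masking."""
    if len(src1) not in (4, 8, 16):
        raise ValueError("expected 4, 8 or 16 float32 lanes")
    # A single 4-byte little-endian float is loaded from memory...
    (scalar,) = struct.unpack_from("<f", memory, offset)
    # ...and replicated into every lane of the second source.
    src2 = [scalar] * len(src1)
    # MINPS rule: the second source is returned unless src1 < src2.
    return [a if a < b else b for a, b in zip(src1, src2)]


if __name__ == "__main__":
    mem = struct.pack("<f", 2.0)
    print(vminps_bcst([1.0, 3.0, 2.5, -4.0], mem))
```

Only 4 bytes of memory are read regardless of the vector width, which is the difference from a full-width m128/m256/m512 source.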
Broadcast element sizes of 32 and 64 bits are supported; not coincidentally, those are the element sizes that load ports on Intel CPUs can broadcast for free as part of a load uop. (Note that vpbroadcastb/w vec, [mem] need an ALU uop; vpbroadcastd/q only need the load uop.)
Footnote 1: e.g. vmulps zmm0,zmm1,zmm2{rz-sae} (GAS .intel_syntax / MASM) or vmulps zmm0, zmm1, zmm2, {rz-sae} (NASM, with an extra comma before the {}).
Answer 2
Score: 1
The specified instruction can be written as
    VMINPS ZMM1, ZMM2, DWORD bcst [EAX]
An example can be seen here.