How to write an operand that is a 512-bit vector loaded from an N-bit memory location in x86 assembly

Question

The source Intel manual is here: https://cdrdv2.intel.com/v1/dl/getContent/671110

The operands are specified as m32bcst or m64bcst.

Example of an instruction that has a variant that uses this operand

I am interested in writing the variant of the instruction that uses this operand in actual Assembly.

If the operand were m32 instead of m32bcst, then in MASM, for instance, one could write: VMINPS YMM1{k1}{z}, YMM2, DWORD PTR [EAX]

I am not sure what to write in the case of an m32bcst operand, however.

Answer 1

Score: 3

It varies by assembler. Some support the {1to16} / {1to8} / {1to4} syntax shown in slides from a 2014 talk introducing AVX-512 at a GCC conference, by Kirill Yukhin of Intel. (Despite it being a GCC talk, the slides use Intel syntax.) Others support that and/or something else.

  • MASM: vminps zmm1, zmm2, DWORD bcst [rax]

  • NASM: vminps zmm1, zmm2, [rax] {1to16} (with an optional dword or qword specifier in the usual place, like dword [rax]{1to16}). NASM does not support the bcst keyword.

  • €ASM aka Euro Assembler: vminps ymm1,ymm2,[rax],Bcst=on

  • GAS/clang .intel_syntax is MASM-like in general and supports dword bcst [rax], but it also accepts [rax]{1to16}. (objdump -drwC -Mintel uses dword bcst [rax])

  • AT&T syntax: vminps (%rax){1to16},%zmm2,%zmm1
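Whatever the spelling, all of these forms describe the same operation. As a rough illustration, here is a scalar C model of what vminps zmm1, zmm2, dword [rax]{1to16} computes. The function name vminps_bcst_model and the LANES macro are made up for this sketch, and it ignores masking and MXCSR exception behavior.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16  /* a 512-bit vector holds 16 single-precision floats */

/* Scalar model (hypothetical helper, not a real API) of:
 *     vminps zmm1, zmm2, dword [rax]{1to16}
 * dst plays the role of zmm1, src of zmm2, and mem of the dword at [rax]. */
static void vminps_bcst_model(float dst[LANES], const float src[LANES],
                              const float *mem)
{
    assert(dst != NULL && src != NULL && mem != NULL);

    float b = *mem;                     /* one 32-bit load ...           */
    for (int i = 0; i < LANES; i++) {   /* ... reused for all 16 lanes   */
        /* MINPS rule: take the first source only if it compares less;
         * otherwise (including NaN cases) take the second source.      */
        dst[i] = (src[i] < b) ? src[i] : b;
    }
}
```

Called with src holding 0.0 to 15.0 and *mem equal to 4.0, lanes 0 to 3 keep their values and every other lane becomes 4.0. Only 4 bytes of memory are read, rather than the 64 bytes a full zmm memory operand would touch.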

The machine code only has 1 bit to encode broadcast vs. regular, so there's no way to broadcast 64-bit pairs of floats for vminps; the broadcast element size has to match the SIMD element size. So €ASM's minimal syntax is sufficient; the others merely provide a way for the assembler to check for a mismatch in what the human thinks the instruction will do.
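To make the "1 bit" point concrete, the sketch below tests that bit (EVEX.b, bit 4 of the fourth prefix byte) in a raw instruction. The helper name evex_b_bit is invented for this example, and the two byte sequences were worked out by hand from the EVEX layout in the Intel manual, so treat them as assumptions and compare them against your assembler's listing output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if EVEX.b is set, 0 if it is clear,
 * and -1 if the bytes do not start with the 0x62 EVEX escape byte.
 * With a memory operand, EVEX.b = 1 means "broadcast one element". */
static int evex_b_bit(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len < 4 || insn[0] != 0x62)
        return -1;
    /* Fourth prefix byte layout, high to low: z, L'L, b, V', aaa */
    return (insn[3] >> 4) & 1;
}

/* Hand-derived encodings for 64-bit mode (assumption: verify them). */
static const uint8_t vminps_full[] =   /* vminps zmm1, zmm2, [rax]             */
    {0x62, 0xF1, 0x6C, 0x48, 0x5D, 0x08};
static const uint8_t vminps_bcst[] =   /* vminps zmm1, zmm2, dword [rax]{1to16} */
    {0x62, 0xF1, 0x6C, 0x58, 0x5D, 0x08};
```

The two encodings differ only in that one bit (0x48 versus 0x58), which is why nothing in the machine code can ask for a broadcast element size other than the instruction's own element size.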

Unlike embedded rounding + suppress-all-exceptions, which only work with scalar (like vmulss) or 512-bit instructions (see footnote 1), broadcast memory operands do work with 256 and 128-bit vectors as well (AVX512VL).

Broadcast element sizes of 32 and 64-bit are supported; not coincidentally, those are the element sizes that load ports on Intel CPUs can do for free as part of a load uop. (Note that vpbroadcastb/w vec, [mem] need an ALU uop, vpbroadcastd/q only need the load uop.)
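Putting the last two paragraphs together, the N in a {1toN} decoration is simply the vector width divided by the element size. A minimal sketch, with bcst_lanes being a name made up for this illustration:

```c
#include <assert.h>

/* Hypothetical helper: the N to write in {1toN} for a given vector
 * width and broadcast element size, both in bits. */
static int bcst_lanes(int vector_bits, int element_bits)
{
    /* Only the combinations described above are accepted. */
    assert(vector_bits == 128 || vector_bits == 256 || vector_bits == 512);
    assert(element_bits == 32 || element_bits == 64);
    return vector_bits / element_bits;
}
```

So an m32bcst operand is written {1to16}, {1to8} or {1to4} for zmm, ymm and xmm destinations respectively, and an m64bcst operand is written {1to8}, {1to4} or {1to2}.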


Footnote 1: e.g. vmulps zmm0,zmm1,zmm2{rz-sae} (GAS .intel_syntax / MASM)
or vmulps zmm0, zmm1, zmm2, {rz-sae} (NASM, with an extra comma before the {})

Answer 2

Score: 1

The specified instruction can be written as
VMINPS ZMM1, ZMM2, DWORD bcst [EAX]

An example can be seen here.

  • Posted by huangapple on March 8, 2023, 18:29:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671861.html