Why is (V)SHUFPS not in Intel's constant time instruction list?

Question

Earlier this year Intel published a list of instructions that are guaranteed to have no timing dependency on their data operands. (Initially it was suggested that these are constant-time only when DOITM, Data Operand Independent Timing Mode, is enabled, but it was later clarified that they are always constant-time, regardless of DOITM.) Out of curiosity I am looking at how closely real-world crypto implementations conform to this list (i.e. use only instructions from it).

It turns out this list has a number of oddities. It has MOVDQU, but not MOVUPS, even though the two should be functionally identical. This is not a serious issue: I can simply take the assembly output of the compiler and run sed 's/movups/movdqu/g' on it before assembling.

A more difficult obstacle is that it does not have (V)SHUFPS, even though it clearly has lots of other floating-point shuffle instructions such as VPERMILPS/VPERMILPD. SHUFPS is used in BLAKE3.

Is there a known reason this instruction is not included on the constant-time list? What would be a good way to simulate its functionality, using only instructions from this list?

Answer 1

Score: 4

I cannot find an answer to the first question (why it is not in the list), but I have a solution to the second question, namely how to work around this instruction. For the BLAKE3 implementation, the problematic line is

#define _mm_shuffle_ps2(a, b, c)                                               \
  (_mm_castps_si128(                                                           \
      _mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), (c))))

A drop-in replacement is

/* Lanes 0-1 come from the shuffle of a, lanes 2-3 from the shuffle of b. */
#define _mm_shuffle_ps2(a, b, c)                                               \
  (_mm_blend_epi32(_mm_shuffle_epi32((a), (c)), _mm_shuffle_epi32((b), (c)),   \
                   0b1100))

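This causes GCC to generate VPSHUFD and VPBLENDD, both of which should be constant-time according to Intel. It works because SHUFPS selects its two low result lanes from the first operand using the low four bits of the immediate, and its two high result lanes from the second operand using the high four bits. Shuffling each operand separately with the same immediate and then blending with mask 0b1100 (lanes 2 and 3 from the second shuffle) gives the same result. Note that _mm_blend_epi32 requires AVX2.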
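
To sanity-check the replacement without depending on AVX2 hardware, here is a minimal sketch that models the three instructions on plain arrays and compares them for every 8-bit immediate. The function names (model_shufps, model_pshufd, model_pblendd, count_mismatches) and the test values are my own, hypothetical and not part of BLAKE3 or any Intel header. The models follow the documented lane selection of the instructions, but they are only a scalar approximation for checking the logic, not a timing test.

```c
#include <stdint.h>

/* Scalar model of SHUFPS: lanes 0-1 pick from a, lanes 2-3 pick from b. */
static void model_shufps(const uint32_t a[4], const uint32_t b[4],
                         unsigned imm, uint32_t out[4]) {
    out[0] = a[(imm >> 0) & 3];
    out[1] = a[(imm >> 2) & 3];
    out[2] = b[(imm >> 4) & 3];
    out[3] = b[(imm >> 6) & 3];
}

/* Scalar model of PSHUFD: every lane picks from the same source. */
static void model_pshufd(const uint32_t src[4], unsigned imm,
                         uint32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = src[(imm >> (2 * i)) & 3];
}

/* Scalar model of VPBLENDD: mask bit i set -> lane i from y, else from x. */
static void model_pblendd(const uint32_t x[4], const uint32_t y[4],
                          unsigned mask, uint32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = ((mask >> i) & 1) ? y[i] : x[i];
}

/* Returns how many immediates (0..255) make the two approaches differ. */
int count_mismatches(void) {
    /* Arbitrary distinct values so that every lane is identifiable. */
    const uint32_t a[4] = {0xA0, 0xA1, 0xA2, 0xA3};
    const uint32_t b[4] = {0xB0, 0xB1, 0xB2, 0xB3};
    int mismatches = 0;

    for (unsigned imm = 0; imm < 256; imm++) {
        uint32_t want[4], sa[4], sb[4], got[4];

        model_shufps(a, b, imm, want);      /* original macro */
        model_pshufd(a, imm, sa);           /* replacement, step 1 */
        model_pshufd(b, imm, sb);           /* replacement, step 2 */
        model_pblendd(sa, sb, 0xC, got);    /* 0xC == 0b1100 */

        for (int i = 0; i < 4; i++) {
            if (want[i] != got[i]) {
                mismatches++;
                break;
            }
        }
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking that it returns 0 confirms that the two formulations agree lane by lane for all immediates, under the assumption that the scalar models match the real instructions.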

huangapple
  • Posted on July 27, 2023, 14:45:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76777113.html