Why is (V)SHUFPS not in Intel's constant time instruction list?
Question
Earlier this year Intel published a list of instructions that are guaranteed not to have timing dependencies on their data operands. (Initially it was suggested that these are constant-time only when DOITM is enabled, but later it was clarified that these are always constant-time, regardless of DOITM.) Out of curiosity I am looking at how closely real-world crypto implementations conform to this list (i.e. only using instructions from this list).
It turns out this list has a number of oddities. It has MOVDQU, but not MOVUPS, even though the two should be functionally identical. This is not a serious issue: I can simply take the assembly output of the compiler and run sed 's/movups/movdqu/g' on it before assembling.
A more difficult obstacle is that it does not have (V)SHUFPS, even though it clearly has lots of other floating-point shuffle instructions such as VPERMILPS/D. SHUFPS is used in BLAKE3.
Is there a known reason this instruction is not included on the constant-time list? What would be a good way to simulate its functionality, using only instructions from this list?
Answer 1
Score: 4
I cannot find an answer to the first question (why it is not in the list), but I have a solution to the second question, namely how to work around this instruction. In the BLAKE3 implementation, the problematic macro is
#define _mm_shuffle_ps2(a, b, c) \
(_mm_castps_si128( \
_mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), (c))))
A drop-in replacement is
/* Blend mask 0b1100: lanes 0-1 from the shuffled a, lanes 2-3 from the shuffled b. */
#define _mm_shuffle_ps2(a, b, c) \
  (_mm_blend_epi32(_mm_shuffle_epi32((a), (c)), \
                   _mm_shuffle_epi32((b), (c)), 0b1100))
This causes GCC to generate VPSHUFD and VPBLENDD, both of which should be constant-time according to Intel. Note that _mm_blend_epi32 is an AVX2 intrinsic, so this requires compiling with AVX2 enabled.
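To see why the replacement picks the same lanes as SHUFPS, here is a minimal scalar sketch. It models the lane-selection rules of SHUFPS, PSHUFD and VPBLENDD in plain C and compares them for all 256 immediates. The vec4 type and the function names are hypothetical, introduced only for this illustration. The code does not execute the actual instructions and says nothing about their timing.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes of an XMM register; lane 0 is the lowest.
   (Hypothetical helper type, not part of BLAKE3 or any intrinsics header.) */
typedef struct { uint32_t lane[4]; } vec4;

/* Scalar model of SHUFPS: low two lanes come from a, high two lanes from b. */
static vec4 shufps_model(vec4 a, vec4 b, unsigned imm) {
    vec4 r;
    r.lane[0] = a.lane[(imm >> 0) & 3];
    r.lane[1] = a.lane[(imm >> 2) & 3];
    r.lane[2] = b.lane[(imm >> 4) & 3];
    r.lane[3] = b.lane[(imm >> 6) & 3];
    return r;
}

/* Scalar model of PSHUFD: every lane is selected from the single source. */
static vec4 pshufd_model(vec4 a, unsigned imm) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[(imm >> (2 * i)) & 3];
    return r;
}

/* Scalar model of VPBLENDD: mask bit i set takes lane i from b, else from a. */
static vec4 pblendd_model(vec4 a, vec4 b, unsigned mask) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = ((mask >> i) & 1) ? b.lane[i] : a.lane[i];
    return r;
}

/* Compares both models for every 8-bit immediate and returns how many
   immediates were checked; an assert fires on the first mismatch. */
int check_shufps_emulation(void) {
    const vec4 a = {{0xA0, 0xA1, 0xA2, 0xA3}};
    const vec4 b = {{0xB0, 0xB1, 0xB2, 0xB3}};
    int checked = 0;
    for (unsigned imm = 0; imm < 256; imm++) {
        vec4 want = shufps_model(a, b, imm);
        vec4 got = pblendd_model(pshufd_model(a, imm),
                                 pshufd_model(b, imm), 0xC);
        for (int i = 0; i < 4; i++)
            assert(want.lane[i] == got.lane[i]);
        checked++;
    }
    return checked;
}
```

Calling check_shufps_emulation() from a small main is enough to exercise it. The blend mask 0xC is the same value as the 0b1100 used in the macro: the two low lanes keep the shuffle of a, and the two high lanes take the shuffle of b, which is exactly how SHUFPS splits its two sources.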