forked from NRZCode/ia32-64
282 lines
13 KiB
HTML
282 lines
13 KiB
HTML
<!DOCTYPE html>
|
||
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VFMADDSUB132PH/VFMADDSUB213PH/VFMADDSUB231PH
|
||
— Fused Multiply-AlternatingAdd/Subtract of Packed FP16 Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VFMADDSUB132PH/VFMADDSUB213PH/VFMADDSUB231PH
|
||
— Fused Multiply-AlternatingAdd/Subtract of Packed FP16 Values</h1>
|
||
|
||
|
||
|
||
<table>
|
||
<tr>
|
||
<th> Instruction En Bit Mode Flag
|
||
Support Instruction En Bit Mode Flag
|
||
Support 64/32 CPUID Feature Instruction En Bit Mode Flag CPUID Feature Instruction En Bit Mode Flag Op/ 64/32 CPUID Feature Instruction En Bit Mode Flag 64/32 CPUID Feature Instruction En Bit Mode Flag CPUID Feature Instruction En Bit Mode Flag Op/ 64/32 CPUID Feature </th>
|
||
<th></th>
|
||
<th>Support</th>
|
||
<th></th>
|
||
<th>Description</th></tr>
|
||
<tr>
|
||
<td>EVEX.128.66.MAP6.W0 96 /r VFMADDSUB132PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from xmm1 and xmm3/m128/m16bcst, add/subtract elements in xmm2, and store the result in xmm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.256.66.MAP6.W0 96 /r VFMADDSUB132PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from ymm1 and ymm3/m256/m16bcst, add/subtract elements in ymm2, and store the result in ymm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.512.66.MAP6.W0 96 /r VFMADDSUB132PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16</td>
|
||
<td>Multiply packed FP16 values from zmm1 and zmm3/m512/m16bcst, add/subtract elements in zmm2, and store the result in zmm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.128.66.MAP6.W0 A6 /r VFMADDSUB213PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from xmm1 and xmm2, add/subtract elements in xmm3/m128/m16bcst, and store the result in xmm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.256.66.MAP6.W0 A6 /r VFMADDSUB213PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from ymm1 and ymm2, add/subtract elements in ymm3/m256/m16bcst, and store the result in ymm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.512.66.MAP6.W0 A6 /r VFMADDSUB213PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16</td>
|
||
<td>Multiply packed FP16 values from zmm1 and zmm2, add/subtract elements in zmm3/m512/m16bcst, and store the result in zmm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.128.66.MAP6.W0 B6 /r VFMADDSUB231PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from xmm2 and xmm3/m128/m16bcst, add/subtract elements in xmm1, and store the result in xmm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.256.66.MAP6.W0 B6 /r VFMADDSUB231PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16 AVX512VL</td>
|
||
<td>Multiply packed FP16 values from ymm2 and ymm3/m256/m16bcst, add/subtract elements in ymm1, and store the result in ymm1 subject to writemask k1.</td></tr>
|
||
<tr>
|
||
<td>EVEX.512.66.MAP6.W0 B6 /r VFMADDSUB231PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}</td>
|
||
<td>A</td>
|
||
<td>V/V</td>
|
||
<td>AVX512-FP16</td>
|
||
<td>Multiply packed FP16 values from zmm2 and zmm3/m512/m16bcst, add/subtract elements in zmm1, and store the result in zmm1 subject to writemask k1.</td></tr></table>
|
||
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
|
||
¶
|
||
</a></h2>
|
||
<table>
|
||
<tr>
|
||
<th>Op/En</th>
|
||
<th>Tuple</th>
|
||
<th>Operand 1</th>
|
||
<th>Operand 2</th>
|
||
<th>Operand 3</th>
|
||
<th>Operand 4</th></tr>
|
||
<tr>
|
||
<td>A</td>
|
||
<td>Full</td>
|
||
<td>ModRM:reg (r, w)</td>
|
||
<td>VEX.vvvv (r)</td>
|
||
<td>ModRM:r/m (r)</td>
|
||
<td>N/A</td></tr></table>
|
||
<h3 id="description">Description<a class="anchor" href="#description">
|
||
¶
|
||
</a></h3>
|
||
<p>This instruction performs a packed multiply-add (odd elements) or multiply-subtract (even elements) computation on FP16 values using three source operands and writes the results in the destination operand. The destination operand is also the first source operand. The notation’ “132”, “213” and “231” indicate the use of the operands in A * B ± C, where each digit corresponds to the operand number, with the destination being operand 1; see <a href='vfmsubadd132ph.vfmsubadd213ph.vfmsubadd231ph.html#tbl-5-8'>Table 5-8</a>.</p>
|
||
<p>The destination elements are updated according to the writemask.</p>
|
||
<figure id="tbl-5-5">
|
||
<table>
|
||
<tr>
|
||
<th>Notation</th>
|
||
<th>Odd Elements</th>
|
||
<th>Even Elements</th></tr>
|
||
<tr>
|
||
<td>132</td>
|
||
<td>dest = dest*src3+src2</td>
|
||
<td>dest = dest*src3-src2</td></tr>
|
||
<tr>
|
||
<td>231</td>
|
||
<td>dest = src2*src3+dest</td>
|
||
<td>dest = src2*src3-dest</td></tr>
|
||
<tr>
|
||
<td>213</td>
|
||
<td>dest = src2*dest+src3</td>
|
||
<td>dest = src2*dest-src3</td></tr></table>
|
||
<figcaption><a href='vfmaddsub132ph.vfmaddsub213ph.vfmaddsub231ph.html#tbl-5-5'>Table 5-5</a>. VFMADDSUB[132,213,231]PH Notation for Odd and Even Elements</figcaption></figure>
|
||
<h3 id="operation">Operation<a class="anchor" href="#operation">
|
||
¶
|
||
</a></h3>
|
||
<h4 id="vfmaddsub132ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">VFMADDSUB132PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register<a class="anchor" href="#vfmaddsub132ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
IF (VL = 512) AND (EVEX.b = 1):
|
||
SET_RM(EVEX.RC)
|
||
ELSE
|
||
SET_RM(MXCSR.RC)
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF *j is even*:
|
||
DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * SRC3.fp16[j] - SRC2.fp16[j])
|
||
ELSE:
|
||
DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * SRC3.fp16[j] + SRC2.fp16[j])
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h4 id="vfmaddsub132ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">VFMADDSUB132PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source<a class="anchor" href="#vfmaddsub132ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF EVEX.b = 1:
|
||
t3 := SRC3.fp16[0]
|
||
ELSE:
|
||
t3 := SRC3.fp16[j]
|
||
IF *j is even*:
|
||
DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * t3 - SRC2.fp16[j])
|
||
ELSE:
|
||
DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * t3 + SRC2.fp16[j])
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h4 id="vfmaddsub213ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">VFMADDSUB213PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register<a class="anchor" href="#vfmaddsub213ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
IF (VL = 512) AND (EVEX.b = 1):
|
||
SET_RM(EVEX.RC)
|
||
ELSE
|
||
SET_RM(MXCSR.RC)
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF *j is even*:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*DEST.fp16[j] - SRC3.fp16[j])
|
||
ELSE
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*DEST.fp16[j] + SRC3.fp16[j])
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h4 id="vfmaddsub213ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">VFMADDSUB213PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source<a class="anchor" href="#vfmaddsub213ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF EVEX.b = 1:
|
||
t3 := SRC3.fp16[0]
|
||
ELSE:
|
||
t3 := SRC3.fp16[j]
|
||
IF *j is even*:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * DEST.fp16[j] - t3)
|
||
ELSE:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * DEST.fp16[j] + t3)
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h4 id="vfmaddsub231ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">VFMADDSUB231PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register<a class="anchor" href="#vfmaddsub231ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-register">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
IF (VL = 512) AND (EVEX.b = 1):
|
||
SET_RM(EVEX.RC)
|
||
ELSE
|
||
SET_RM(MXCSR.RC)
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF *j is even:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * SRC3.fp16[j] - DEST.fp16[j])
|
||
ELSE:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * SRC3.fp16[j] + DEST.fp16[j])
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h4 id="vfmaddsub231ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">VFMADDSUB231PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source<a class="anchor" href="#vfmaddsub231ph-dest--src2--src3--evex-encoded-versions--when-src3-operand-is-a-memory-source">
|
||
¶
|
||
</a></h4>
|
||
<pre>VL = 128, 256 or 512
|
||
KL := VL/16
|
||
FOR j := 0 TO KL-1:
|
||
IF k1[j] OR *no writemask*:
|
||
IF EVEX.b = 1:
|
||
t3 := SRC3.fp16[0]
|
||
ELSE:
|
||
t3 := SRC3.fp16[j]
|
||
IF *j is even*:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * t3 - DEST.fp16[j])
|
||
ELSE:
|
||
DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * t3 + DEST.fp16[j])
|
||
ELSE IF *zeroing*:
|
||
DEST.fp16[j] := 0
|
||
// else dest.fp16[j] remains unchanged
|
||
DEST[MAXVL-1:VL] := 0
|
||
</pre>
|
||
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
|
||
¶
|
||
</a></h3>
|
||
<pre>VFMADDSUB132PH, VFMADDSUB213PH, and VFMADDSUB231PH: __m128h _mm_fmaddsub_ph (__m128h a, __m128h b, __m128h c);
|
||
</pre>
|
||
<pre>__m128h _mm_mask_fmaddsub_ph (__m128h a, __mmask8 k, __m128h b, __m128h c);
|
||
</pre>
|
||
<pre>__m128h _mm_mask3_fmaddsub_ph (__m128h a, __m128h b, __m128h c, __mmask8 k);
|
||
</pre>
|
||
<pre>__m128h _mm_maskz_fmaddsub_ph (__mmask8 k, __m128h a, __m128h b, __m128h c);
|
||
</pre>
|
||
<pre>__m256h _mm256_fmaddsub_ph (__m256h a, __m256h b, __m256h c);
|
||
</pre>
|
||
<pre>__m256h _mm256_mask_fmaddsub_ph (__m256h a, __mmask16 k, __m256h b, __m256h c);
|
||
</pre>
|
||
<pre>__m256h _mm256_mask3_fmaddsub_ph (__m256h a, __m256h b, __m256h c, __mmask16 k);
|
||
</pre>
|
||
<pre>__m256h _mm256_maskz_fmaddsub_ph (__mmask16 k, __m256h a, __m256h b, __m256h c);
|
||
</pre>
|
||
<pre>__m512h _mm512_fmaddsub_ph (__m512h a, __m512h b, __m512h c);
|
||
</pre>
|
||
<pre>__m512h _mm512_mask_fmaddsub_ph (__m512h a, __mmask32 k, __m512h b, __m512h c);
|
||
</pre>
|
||
<pre>__m512h _mm512_mask3_fmaddsub_ph (__m512h a, __m512h b, __m512h c, __mmask32 k);
|
||
</pre>
|
||
<pre>__m512h _mm512_maskz_fmaddsub_ph (__mmask32 k, __m512h a, __m512h b, __m512h c);
|
||
</pre>
|
||
<pre>__m512h _mm512_fmaddsub_round_ph (__m512h a, __m512h b, __m512h c, const int rounding);
|
||
</pre>
|
||
<pre>__m512h _mm512_mask_fmaddsub_round_ph (__m512h a, __mmask32 k, __m512h b, __m512h c, const int rounding);
|
||
</pre>
|
||
<pre>__m512h _mm512_mask3_fmaddsub_round_ph (__m512h a, __m512h b, __m512h c, __mmask32 k, const int rounding);
|
||
</pre>
|
||
<pre>__m512h _mm512_maskz_fmaddsub_round_ph (__mmask32 k, __m512h a, __m512h b, __m512h c, const int rounding);
|
||
</pre>
|
||
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
|
||
¶
|
||
</a></h3>
|
||
<p>Invalid, Underflow, Overflow, Precision, Denormal.</p>
|
||
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
|
||
¶
|
||
</a></h3>
|
||
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-46</span>, “Type E2 Class Exception Conditions.”</p><footer><p>
|
||
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
|
||
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
|
||
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer’s Manual</a> for anything serious.
|
||
</p></footer></body></html>
|