ia32-64/x86/vfmadd132sh.vfnmadd132sh.vfmadd213sh.vfnmadd213sh.vfmadd231sh.vfnmadd231sh.html
2025-07-08 02:23:29 -03:00

197 lines
9.1 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VFMADD132SH/VFNMADD132SH/VFMADD213SH/VFNMADD213SH/VFMADD231SH/VFNMADD231SH
— Fused Multiply-Add of Scalar FP16 Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VFMADD132SH/VFNMADD132SH/VFMADD213SH/VFNMADD213SH/VFMADD231SH/VFNMADD231SH
— Fused Multiply-Add of Scalar FP16 Values</h1>
<table>
<tr>
<th> Instruction En Bit Mode Flag
Support Instruction En Bit Mode Flag
Support 64/32 CPUID Feature Instruction En Bit Mode Flag CPUID Feature Instruction En Bit Mode Flag Op/ 64/32 CPUID Feature Instruction En Bit Mode Flag 64/32 CPUID Feature Instruction En Bit Mode Flag CPUID Feature Instruction En Bit Mode Flag Op/ 64/32 CPUID Feature </th>
<th></th>
<th>Support</th>
<th></th>
<th>Description</th></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 99 /r VFMADD132SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm1 and xmm3/m16, add to xmm2, and store the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 A9 /r VFMADD213SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm1 and xmm2, add to xmm3/m16, and store the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 B9 /r VFMADD231SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm2 and xmm3/m16, add to xmm1, and store the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 9D /r VFNMADD132SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm1 and xmm3/m16, and negate the value. Add this value to xmm2, and store the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 AD /r VFNMADD213SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm1 and xmm2, and negate the value. Add this value to xmm3/m16, and store the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.66.MAP6.W0 BD /r VFNMADD231SH xmm1{k1}{z}, xmm2, xmm3/m16 {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Multiply FP16 values from xmm2 and xmm3/m16, and negate the value. Add this value to xmm1, and store the result in xmm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>Scalar</td>
<td>ModRM:reg (r, w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>Performs a scalar multiply-add or negated multiply-add computation on the low FP16 values using three source operands and writes the result in the destination operand. The destination operand is also the first source operand. The “N” (negated) forms of this instruction add the negated infinite precision intermediate product to the corresponding remaining operand. The notation “132”, “213” and “231” indicate the use of the operands in ±A * B + C, where each digit corresponds to the operand number, with the destination being operand 1; see <a href='vfmadd132sh.vfnmadd132sh.vfmadd213sh.vfnmadd213sh.vfmadd231sh.vfnmadd231sh.html#tbl-5-4'>Table 5-4</a>.</p>
<p>Bits 127:16 of the destination operand are preserved. Bits MAXVL-1:128 of the destination operand are zeroed. The low FP16 element of the destination is updated according to the writemask.</p>
<figure id="tbl-5-4">
<table>
<tr>
<th>Notation</th>
<th>Operands</th></tr>
<tr>
<td>132</td>
<td>dest = ± dest*src3+src2</td></tr>
<tr>
<td>231</td>
<td>dest = ± src2*src3+dest</td></tr>
<tr>
<td>213</td>
<td>dest = ± src2*dest+src3</td></tr></table>
<figcaption><a href='vfmadd132sh.vfnmadd132sh.vfmadd213sh.vfnmadd213sh.vfmadd231sh.vfnmadd231sh.html#tbl-5-4'>Table 5-4</a>. VF[,N]MADD[132,213,231]SH Notation for Operands</figcaption></figure>
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<h4 id="vf--n-madd132sh-dest--src2--src3--evex-encoded-versions-">VF[,N]MADD132SH DEST, SRC2, SRC3 (EVEX encoded versions)<a class="anchor" href="#vf--n-madd132sh-dest--src2--src3--evex-encoded-versions-">
</a></h4>
<pre>IF EVEX.b = 1 and SRC3 is a register:
SET_RM(EVEX.RC)
ELSE
SET_RM(MXCSR.RC)
IF k1[0] OR *no writemask*:
IF *negative form*:
DEST.fp16[0] := RoundFPControl(-DEST.fp16[0]*SRC3.fp16[0] + SRC2.fp16[0])
ELSE:
DEST.fp16[0] := RoundFPControl(DEST.fp16[0]*SRC3.fp16[0] + SRC2.fp16[0])
ELSE IF *zeroing*:
DEST.fp16[0] := 0
// else DEST.fp16[0] remains unchanged
//DEST[127:16] remains unchanged
DEST[MAXVL-1:128] := 0
</pre>
<h4 id="vf--n-madd213sh-dest--src2--src3--evex-encoded-versions-">VF[,N]MADD213SH DEST, SRC2, SRC3 (EVEX encoded versions)<a class="anchor" href="#vf--n-madd213sh-dest--src2--src3--evex-encoded-versions-">
</a></h4>
<pre>IF EVEX.b = 1 and SRC3 is a register:
SET_RM(EVEX.RC)
ELSE
SET_RM(MXCSR.RC)
IF k1[0] OR *no writemask*:
IF *negative form:
DEST.fp16[0] := RoundFPControl(-SRC2.fp16[0]*DEST.fp16[0] + SRC3.fp16[0])
ELSE:
DEST.fp16[0] := RoundFPControl(SRC2.fp16[0]*DEST.fp16[0] + SRC3.fp16[0])
ELSE IF *zeroing*:
DEST.fp16[0] := 0
// else DEST.fp16[0] remains unchanged
//DEST[127:16] remains unchanged
DEST[MAXVL-1:128] := 0
</pre>
<h4 id="vf--n-madd231sh-dest--src2--src3--evex-encoded-versions-">VF[,N]MADD231SH DEST, SRC2, SRC3 (EVEX encoded versions)<a class="anchor" href="#vf--n-madd231sh-dest--src2--src3--evex-encoded-versions-">
</a></h4>
<pre>IF EVEX.b = 1 and SRC3 is a register:
SET_RM(EVEX.RC)
ELSE
SET_RM(MXCSR.RC)
IF k1[0] OR *no writemask*:
IF *negative form*:
DEST.fp16[0] := RoundFPControl(-SRC2.fp16[0]*SRC3.fp16[0] + DEST.fp16[0])
ELSE:
DEST.fp16[0] := RoundFPControl(SRC2.fp16[0]*SRC3.fp16[0] + DEST.fp16[0])
ELSE IF *zeroing*:
DEST.fp16[0] := 0
// else DEST.fp16[0] remains unchanged
//DEST[127:16] remains unchanged
DEST[MAXVL-1:128] := 0
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VFMADD132SH, VFMADD213SH, and VFMADD231SH: __m128h _mm_fmadd_round_sh (__m128h a, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_mask_fmadd_round_sh (__m128h a, __mmask8 k, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_mask3_fmadd_round_sh (__m128h a, __m128h b, __m128h c, __mmask8 k, const int rounding);
</pre>
<pre>__m128h _mm_maskz_fmadd_round_sh (__mmask8 k, __m128h a, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_fmadd_sh (__m128h a, __m128h b, __m128h c);
</pre>
<pre>__m128h _mm_mask_fmadd_sh (__m128h a, __mmask8 k, __m128h b, __m128h c);
</pre>
<pre>__m128h _mm_mask3_fmadd_sh (__m128h a, __m128h b, __m128h c, __mmask8 k);
</pre>
<pre>__m128h _mm_maskz_fmadd_sh (__mmask8 k, __m128h a, __m128h b, __m128h c);
</pre>
<pre>VFNMADD132SH, VFNMADD213SH, and VFNMADD231SH: __m128h _mm_fnmadd_round_sh (__m128h a, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_mask_fnmadd_round_sh (__m128h a, __mmask8 k, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_mask3_fnmadd_round_sh (__m128h a, __m128h b, __m128h c, __mmask8 k, const int rounding);
</pre>
<pre>__m128h _mm_maskz_fnmadd_round_sh (__mmask8 k, __m128h a, __m128h b, __m128h c, const int rounding);
</pre>
<pre>__m128h _mm_fnmadd_sh (__m128h a, __m128h b, __m128h c);
</pre>
<pre>__m128h _mm_mask_fnmadd_sh (__m128h a, __mmask8 k, __m128h b, __m128h c);
</pre>
<pre>__m128h _mm_mask3_fnmadd_sh (__m128h a, __m128h b, __m128h c, __mmask8 k);
</pre>
<pre>__m128h _mm_maskz_fnmadd_sh (__mmask8 k, __m128h a, __m128h b, __m128h c);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>Invalid, Underflow, Overflow, Precision, Denormal</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-47</span>, “Type E3 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>