ia32-64/x86/vfmadd132pd.vfmadd213pd.vfmadd231pd.html
2025-07-08 02:23:29 -03:00

400 lines
18 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VFMADD132PD/VFMADD213PD/VFMADD231PD
— Fused Multiply-Add of Packed DoublePrecision Floating-Point Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VFMADD132PD/VFMADD213PD/VFMADD231PD
— Fused Multiply-Add of Packed DoublePrecision Floating-Point Values</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 Bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>VEX.128.66.0F38.W1 98 /r VFMADD132PD xmm1, xmm2, xmm3/m128</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1.</td></tr>
<tr>
<td>VEX.128.66.0F38.W1 A8 /r VFMADD213PD xmm1, xmm2, xmm3/m128</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from xmm1 and xmm2, add to xmm3/mem and put result in xmm1.</td></tr>
<tr>
<td>VEX.128.66.0F38.W1 B8 /r VFMADD231PD xmm1, xmm2, xmm3/m128</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from xmm2 and xmm3/mem, add to xmm1 and put result in xmm1.</td></tr>
<tr>
<td>VEX.256.66.0F38.W1 98 /r VFMADD132PD ymm1, ymm2, ymm3/m256</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from ymm1 and ymm3/mem, add to ymm2 and put result in ymm1.</td></tr>
<tr>
<td>VEX.256.66.0F38.W1 A8 /r VFMADD213PD ymm1, ymm2, ymm3/m256</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from ymm1 and ymm2, add to ymm3/mem and put result in ymm1.</td></tr>
<tr>
<td>VEX.256.66.0F38.W1 B8 /r VFMADD231PD ymm1, ymm2, ymm3/m256</td>
<td>A</td>
<td>V/V</td>
<td>FMA</td>
<td>Multiply packed double precision floating-point values from ymm2 and ymm3/mem, add to ymm1 and put result in ymm1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W1 98 /r VFMADD132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from xmm1 and xmm3/m128/m64bcst, add to xmm2 and put result in xmm1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W1 A8 /r VFMADD213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from xmm1 and xmm2, add to xmm3/m128/m64bcst and put result in xmm1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W1 B8 /r VFMADD231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from xmm2 and xmm3/m128/m64bcst, add to xmm1 and put result in xmm1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W1 98 /r VFMADD132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from ymm1 and ymm3/m256/m64bcst, add to ymm2 and put result in ymm1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W1 A8 /r VFMADD213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from ymm1 and ymm2, add to ymm3/m256/m64bcst and put result in ymm1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W1 B8 /r VFMADD231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Multiply packed double precision floating-point values from ymm2 and ymm3/m256/m64bcst, add to ymm1 and put result in ymm1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W1 98 /r VFMADD132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}</td>
<td>B</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Multiply packed double precision floating-point values from zmm1 and zmm3/m512/m64bcst, add to zmm2 and put result in zmm1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W1 A8 /r VFMADD213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}</td>
<td>B</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Multiply packed double precision floating-point values from zmm1 and zmm2, add to zmm3/m512/m64bcst and put result in zmm1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W1 B8 /r VFMADD231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}</td>
<td>B</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Multiply packed double precision floating-point values from zmm2 and zmm3/m512/m64bcst, add to zmm1 and put result in zmm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple Type</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>N/A</td>
<td>ModRM:reg (r, w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr>
<tr>
<td>B</td>
<td>Full</td>
<td>ModRM:reg (r, w)</td>
<td>EVEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>Performs a set of SIMD multiply-add computation on packed double precision floating-point values using three source operands and writes the multiply-add results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.</p>
<p>VFMADD132PD: Multiplies the two, four or eight packed double precision floating-point values from the first source operand to the two, four or eight packed double precision floating-point values in the third source operand, adds the infinite precision intermediate result to the two, four or eight packed double precision floating-point values in the second source operand, performs rounding and stores the resulting two, four or eight packed double precision floating-point values to the destination operand (first source operand).</p>
<p>VFMADD213PD: Multiplies the two, four or eight packed double precision floating-point values from the second source operand to the two, four or eight packed double precision floating-point values in the first source operand, adds the infinite precision intermediate result to the two, four or eight packed double precision floating-point values in the third source operand, performs rounding and stores the resulting two, four or eight packed double precision floating-point values to the destination operand (first source operand).</p>
<p>VFMADD231PD: Multiplies the two, four or eight packed double precision floating-point values from the second source to the two, four or eight packed double precision floating-point values in the third source operand, adds the infinite precision intermediate result to the two, four or eight packed double precision floating-point values in the first source operand, performs rounding and stores the resulting two, four or eight packed double precision floating-point values to the destination operand (first source operand).</p>
<p>EVEX encoded versions: The destination operand (also first source operand) is a ZMM register and encoded in reg_field. The second source operand is a ZMM register and encoded in EVEX.vvvv. The third source operand is a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 64-bit memory location. The destination operand is conditionally updated with write mask k1.</p>
<p>VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.</p>
<p>VEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.</p>
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<pre>In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
</pre>
<h4 id="vfmadd132pd-dest--src2--src3--vex-encoded-version-">VFMADD132PD DEST, SRC2, SRC3 (VEX encoded version)<a class="anchor" href="#vfmadd132pd-dest--src2--src3--vex-encoded-version-">
</a></h4>
<pre>IF (VEX.128) THEN
MAXNUM := 2
ELSEIF (VEX.256)
MAXNUM := 4
FI
For i = 0 to MAXNUM-1 {
n := 64*i;
DEST[n+63:n] := RoundFPControl_MXCSR(DEST[n+63:n]*SRC3[n+63:n] + SRC2[n+63:n])
}
IF (VEX.128) THEN
DEST[MAXVL-1:128] := 0
ELSEIF (VEX.256)
DEST[MAXVL-1:256] := 0
FI
</pre>
<h4 id="vfmadd213pd-dest--src2--src3--vex-encoded-version-">VFMADD213PD DEST, SRC2, SRC3 (VEX encoded version)<a class="anchor" href="#vfmadd213pd-dest--src2--src3--vex-encoded-version-">
</a></h4>
<pre>IF (VEX.128) THEN
MAXNUM := 2
ELSEIF (VEX.256)
MAXNUM := 4
FI
For i = 0 to MAXNUM-1 {
n := 64*i;
DEST[n+63:n] := RoundFPControl_MXCSR(SRC2[n+63:n]*DEST[n+63:n] + SRC3[n+63:n])
}
IF (VEX.128) THEN
DEST[MAXVL-1:128] := 0
ELSEIF (VEX.256)
DEST[MAXVL-1:256] := 0
FI
</pre>
<h4 id="vfmadd231pd-dest--src2--src3--vex-encoded-version-">VFMADD231PD DEST, SRC2, SRC3 (VEX encoded version)<a class="anchor" href="#vfmadd231pd-dest--src2--src3--vex-encoded-version-">
</a></h4>
<pre>IF (VEX.128) THEN
MAXNUM := 2
ELSEIF (VEX.256)
MAXNUM := 4
FI
For i = 0 to MAXNUM-1 {
n := 64*i;
DEST[n+63:n] := RoundFPControl_MXCSR(SRC2[n+63:n]*SRC3[n+63:n] + DEST[n+63:n])
}
IF (VEX.128) THEN
DEST[MAXVL-1:128] := 0
ELSEIF (VEX.256)
DEST[MAXVL-1:256] := 0
FI
</pre>
<h4 id="vfmadd132pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-register-">VFMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)<a class="anchor" href="#vfmadd132pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-register-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(EVEX.RC);
ELSE
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(MXCSR.RC);
FI;
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] :=
RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmadd132pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">VFMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)<a class="anchor" href="#vfmadd132pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1)
THEN
DEST[i+63:i] :=
RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] + SRC2[i+63:i])
ELSE
DEST[i+63:i] :=
RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmadd213pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-is-a-register-">VFMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a is a register)<a class="anchor" href="#vfmadd213pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-is-a-register-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(EVEX.RC);
ELSE
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(MXCSR.RC);
FI;
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] :=
RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmadd213pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">VFMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)<a class="anchor" href="#vfmadd213pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1)
THEN
DEST[i+63:i] :=
RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[63:0])
ELSE
DEST[i+63:i] :=
RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmadd231pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-register-">VFMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)<a class="anchor" href="#vfmadd231pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-register-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(EVEX.RC);
ELSE
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(MXCSR.RC);
FI;
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] :=
RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmadd231pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">VFMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)<a class="anchor" href="#vfmadd231pd-dest--src2--src3--evex-encoded-version--when-src3-operand-is-a-memory-source-">
</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j := 0 TO KL-1
i := j * 64
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1)
THEN
DEST[i+63:i] :=
RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] + DEST[i+63:i])
ELSE
DEST[i+63:i] :=
RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] := 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VFMADDxxxPD __m512d _mm512_fmadd_pd(__m512d a, __m512d b, __m512d c);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_fmadd_round_pd(__m512d a, __m512d b, __m512d c, int r);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_mask_fmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_maskz_fmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_mask3_fmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_mask_fmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_maskz_fmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
</pre>
<pre>VFMADDxxxPD __m512d _mm512_mask3_fmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
</pre>
<pre>VFMADDxxxPD __m256d _mm256_mask_fmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
</pre>
<pre>VFMADDxxxPD __m256d _mm256_maskz_fmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
</pre>
<pre>VFMADDxxxPD __m256d _mm256_mask3_fmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
</pre>
<pre>VFMADDxxxPD __m128d _mm_mask_fmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
</pre>
<pre>VFMADDxxxPD __m128d _mm_maskz_fmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
</pre>
<pre>VFMADDxxxPD __m128d _mm_mask3_fmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
</pre>
<pre>VFMADDxxxPD __m128d _mm_fmadd_pd (__m128d a, __m128d b, __m128d c);
</pre>
<pre>VFMADDxxxPD __m256d _mm256_fmadd_pd (__m256d a, __m256d b, __m256d c);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>Overflow, Underflow, Invalid, Precision, Denormal.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>VEX-encoded instructions, see <span class="not-imported">Table 2-19</span>, “Type 2 Class Exception Conditions.”</p>
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-46</span>, “Type E2 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>