ia32-64/x86/dppd.html

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>DPPD
		— Dot Product of Packed Double Precision Floating-Point Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>DPPD
		— Dot Product of Packed Double Precision Floating-Point Values</h1>

<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32-bit Mode</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>66 0F 3A 41 /r ib DPPD xmm1, xmm2/m128, imm8</td>
<td>RMI</td>
<td>V/V</td>
<td>SSE4_1</td>
<td>Selectively multiply packed double precision floating-point values from xmm1 with packed double precision floating-point values from xmm2, add and selectively store the packed double precision floating-point values to xmm1.</td></tr>
<tr>
<td>VEX.128.66.0F3A.WIG 41 /r ib VDPPD xmm1,xmm2, xmm3/m128, imm8</td>
<td>RVMI</td>
<td>V/V</td>
<td>AVX</td>
<td>Selectively multiply packed double precision floating-point values from xmm2 with packed double precision floating-point values from xmm3, add and selectively store the packed double precision floating-point values to xmm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
			¶
		</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>RMI</td>
<td>ModRM:reg (r, w)</td>
<td>ModRM:r/m (r)</td>
<td>imm8</td>
<td>N/A</td></tr>
<tr>
<td>RVMI</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>imm8</td></tr></table>
<h2 id="description">Description<a class="anchor" href="#description">
			¶
		</a></h2>
<p>Conditionally multiplies the packed double precision floating-point values in the destination operand (first operand) with the packed double precision floating-point values in the source (second operand) depending on a mask extracted from bits [5:4] of the immediate operand (third operand). If a condition mask bit is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.</p>
<p>The two resulting double precision values are summed into an intermediate result. The intermediate result is conditionally broadcasted to the destination using a broadcast mask specified by bits [1:0] of the immediate byte.</p>
<p>If a broadcast mask bit is “1”, the intermediate result is copied to the corresponding qword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.</p>
<p>DPPD follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, <span class="not-imported">table 4-7</span>. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination.</p>
<p>128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding YMM register destination are unmodified.</p>
<p>VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding YMM register destination are zeroed.</p>
<p>If VDPPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.</p>
<h2 id="operation">Operation<a class="anchor" href="#operation">
			¶
		</a></h2>
<h3 id="dp_primitive--src1--src2-">DP_primitive (SRC1, SRC2)<a class="anchor" href="#dp_primitive--src1--src2-">
			¶
		</a></h3>
<pre>IF (imm8[4] = 1)
    THEN Temp1[63:0] := DEST[63:0] * SRC[63:0]; // update SIMD exception flags
    ELSE Temp1[63:0] := +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[127:64] := DEST[127:64] * SRC[127:64]; // update SIMD exception flags
    ELSE Temp1[127:64] := +0.0; FI;
/* if unmasked exception reported, execute exception handler*/
Temp2[63:0] := Temp1[63:0] + Temp1[127:64]; // update SIMD exception flags
/* if unmasked exception reported, execute exception handler*/
IF (imm8[0] = 1)
    THEN DEST[63:0] := Temp2[63:0];
    ELSE DEST[63:0] := +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[127:64] := Temp2[63:0];
    ELSE DEST[127:64] := +0.0; FI;
</pre>
<h3 id="dppd--128-bit-legacy-sse-version-">DPPD (128-bit Legacy SSE Version)<a class="anchor" href="#dppd--128-bit-legacy-sse-version-">
			¶
		</a></h3>
<pre>DEST[127:0] := DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[MAXVL-1:128] (Unmodified)
</pre>
<h3 id="vdppd--vex-128-encoded-version-">VDPPD (VEX.128 Encoded Version)<a class="anchor" href="#vdppd--vex-128-encoded-version-">
			¶
		</a></h3>
<pre>DEST[127:0] := DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[MAXVL-1:128] := 0
</pre>
<h2 id="flags-affected">Flags Affected<a class="anchor" href="#flags-affected">
			¶
		</a></h2>
<p>None.</p>
<h2 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
			¶
		</a></h2>
<pre>DPPD __m128d _mm_dp_pd ( __m128d a, __m128d b, const int mask);
</pre>
<h2 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
			¶
		</a></h2>
<p>Overflow, Underflow, Invalid, Precision, Denormal.</p>
<p>Exceptions are determined separately for each add and multiply operation. Unmasked exceptions will leave the destination untouched.</p>
<h2 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
			¶
		</a></h2>
<p>See <span class="not-imported">Table 2-19</span>, “Type 2 Class Exception Conditions,” additionally:</p>
<table>
<tr>
<td>#UD</td>
<td>If VEX.L= 1.</td></tr></table><footer><p>
		This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
		inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
		ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer’s Manual</a> for anything serious.
	</p></footer></body></html>