ia32-64/x86/vp4dpwssds.html
2025-07-08 02:23:29 -03:00

96 lines
4.6 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VP4DPWSSDS
— Dot Product of Signed Words With Dword Accumulation and Saturation(4-Iterations)</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VP4DPWSSDS
— Dot Product of Signed Words With Dword Accumulation and Saturation(4-Iterations)</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>EVEX.512.F2.0F38.W0 53 /r VP4DPWSSDS zmm1{k1}{z}, zmm2+3, m128</td>
<td>A</td>
<td>V/V</td>
<td>AVX512_4VNNIW</td>
<td>Multiply signed words from source register block indicated by zmm2 by signed words from m128 and accumulate the resulting dword results with signed saturation in zmm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En Tuple Operand 1 Operand 2 Operand 3 Operand 4</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th></tr>
<tr>
<td>A Tuple1_4X ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) N/A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>This instruction computes 4 sequential register source-block dot-products of two signed word operands with doubleword accumulation and signed saturation. The memory operand is sequentially selected in each of the four steps.</p>
<p>In the above box, the notation of “+3” is used to denote that the instruction accesses 4 source registers based on that operand; sources are consecutive, start in a multiple of 4 boundary, and contain the encoded register operand.</p>
<p>This instruction supports memory fault suppression. The entire memory operand is loaded if any bit of the lowest 16-bits of the mask is set to 1 or if a “no masking” encoding is used.</p>
<p>The tuple type Tuple1_4X implies that four 32-bit elements (16 bytes) are referenced by the memory operation portion of this instruction.</p>
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<pre>src_reg_id is the 5 bit index of the vector register specified in the instruction as the src1 register.
</pre>
<h4 id="vp4dpwssds-dest--src1--src2">VP4DPWSSDS dest, src1, src2<a class="anchor" href="#vp4dpwssds-dest--src1--src2">
</a></h4>
<pre>(KL,VL) = (16,512)
N := 4
ORIGDEST := DEST
src_base := src_reg_id &amp; ~ (N-1) // for src1 operand
FOR i := 0 to KL-1:
IF k1[i] or *no writemask*:
FOR m := 0 to N-1:
t := SRC2.dword[m]
p1dword := reg[src_base+m].word[2*i] * t.word[0]
p2dword := reg[src_base+m].word[2*i+1] * t.word[1]
DEST.dword[i] := SIGNED_DWORD_SATURATE(DEST.dword[i] + p1dword + p2dword)
ELSE IF *zeroing*:
DEST.dword[i] := 0
ELSE
DEST.dword[i] := ORIGDEST.dword[i]
DEST[MAX_VL-1:VL] := 0
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VP4DPWSSDS __m512i _mm512_4dpwssds_epi32(__m512i, __m512ix4, __m128i *);
</pre>
<pre>VP4DPWSSDS __m512i _mm512_mask_4dpwssds_epi32(__m512i, __mmask16, __m512ix4, __m128i *);
</pre>
<pre>VP4DPWSSDS __m512i _mm512_maskz_4dpwssds_epi32(__mmask16, __m512i, __m512ix4, __m128i *);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>None.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>See Type E4; additionally:</p>
<table>
<tr>
<td>#UD</td>
<td>If the EVEX broadcast bit is set to 1.</td></tr>
<tr>
<td>#UD</td>
<td>If the MODRM.mod = 0b11.</td></tr></table><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>