ia32-64/x86/phaddsw.html
2025-07-08 02:23:29 -03:00

150 lines
8 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>PHADDSW
— Packed Horizontal Add and Saturate</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>PHADDSW
— Packed Horizontal Add and Saturate</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>NP 0F 38 03 /r<sup>1</sup> PHADDSW mm1, mm2/m64</td>
<td>RM</td>
<td>V/V</td>
<td>SSSE3</td>
<td>Add 16-bit signed integers horizontally, pack saturated integers to mm1.</td></tr>
<tr>
<td>66 0F 38 03 /r PHADDSW xmm1, xmm2/m128</td>
<td>RM</td>
<td>V/V</td>
<td>SSSE3</td>
<td>Add 16-bit signed integers horizontally, pack saturated integers to xmm1.</td></tr>
<tr>
<td>VEX.128.66.0F38.WIG 03 /r VPHADDSW xmm1, xmm2, xmm3/m128</td>
<td>RVM</td>
<td>V/V</td>
<td>AVX</td>
<td>Add 16-bit signed integers horizontally, pack saturated integers to xmm1.</td></tr>
<tr>
<td>VEX.256.66.0F38.WIG 03 /r VPHADDSW ymm1, ymm2, ymm3/m256</td>
<td>RVM</td>
<td>V/V</td>
<td>AVX2</td>
<td>Add 16-bit signed integers horizontally, pack saturated integers to ymm1.</td></tr></table>
<blockquote>
<p>1. See note in Section 2.5, “Intel® AVX and Intel® SSE Instruction Exception Classification,” in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developers Manual, Volume 2A, and Section 23.25.3, “Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers,” in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developers Manual, Volume 3B.</p></blockquote>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>RM</td>
<td>ModRM:reg (r, w)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td>
<td>N/A</td></tr>
<tr>
<td>RVM</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h2 id="description">Description<a class="anchor" href="#description">
</a></h2>
<p>(V)PHADDSW adds two adjacent signed 16-bit integers horizontally from the source and destination operands and saturates the signed results; packs the signed, saturated 16-bit results to the destination operand (first operand) When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.</p>
<p>Legacy SSE version: Both operands can be MMX registers. The second source operand can be an MMX register or a 64-bit memory location.</p>
<p>128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding YMM destination register remain unchanged.</p>
<p>In 64-bit mode, use the REX prefix to access additional registers.</p>
<p>VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.</p>
<p>VEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.</p>
<h2 id="operation">Operation<a class="anchor" href="#operation">
</a></h2>
<h3 id="phaddsw--with-64-bit-operands-">PHADDSW (With 64-bit Operands)<a class="anchor" href="#phaddsw--with-64-bit-operands-">
</a></h3>
<pre>mm1[15-0] = SaturateToSignedWord((mm1[31-16] + mm1[15-0]);
mm1[31-16] = SaturateToSignedWord(mm1[63-48] + mm1[47-32]);
mm1[47-32] = SaturateToSignedWord(mm2/m64[31-16] + mm2/m64[15-0]);
mm1[63-48] = SaturateToSignedWord(mm2/m64[63-48] + mm2/m64[47-32]);
</pre>
<h3 id="phaddsw--with-128-bit-operands-">PHADDSW (With 128-bit Operands)<a class="anchor" href="#phaddsw--with-128-bit-operands-">
</a></h3>
<pre>xmm1[15-0]= SaturateToSignedWord(xmm1[31-16] + xmm1[15-0]);
xmm1[31-16] = SaturateToSignedWord(xmm1[63-48] + xmm1[47-32]);
xmm1[47-32] = SaturateToSignedWord(xmm1[95-80] + xmm1[79-64]);
xmm1[63-48] = SaturateToSignedWord(xmm1[127-112] + xmm1[111-96]);
xmm1[79-64] = SaturateToSignedWord(xmm2/m128[31-16] + xmm2/m128[15-0]);
xmm1[95-80] = SaturateToSignedWord(xmm2/m128[63-48] + xmm2/m128[47-32]);
xmm1[111-96] = SaturateToSignedWord(xmm2/m128[95-80] + xmm2/m128[79-64]);
xmm1[127-112] = SaturateToSignedWord(xmm2/m128[127-112] + xmm2/m128[111-96]);
</pre>
<h3 id="vphaddsw--vex-128-encoded-version-">VPHADDSW (VEX.128 Encoded Version)<a class="anchor" href="#vphaddsw--vex-128-encoded-version-">
</a></h3>
<pre>DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0])
DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32])
DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64])
DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96])
DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0])
DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32])
DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64])
DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96])
DEST[MAXVL-1:128] := 0
</pre>
<h3 id="vphaddsw--vex-256-encoded-version-">VPHADDSW (VEX.256 Encoded Version)<a class="anchor" href="#vphaddsw--vex-256-encoded-version-">
</a></h3>
<pre>DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0])
DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32])
DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64])
DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96])
DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0])
DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32])
DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64])
DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96])
DEST[143:128]= SaturateToSignedWord(SRC1[159:144] + SRC1[143:128])
DEST[159:144] = SaturateToSignedWord(SRC1[191:176] + SRC1[175:160])
DEST[175:160] = SaturateToSignedWord( SRC1[223:208] + SRC1[207:192])
DEST[191:176] = SaturateToSignedWord(SRC1[255:240] + SRC1[239:224])
DEST[207:192] = SaturateToSignedWord(SRC2[127:112] + SRC2[143:128])
DEST[223:208] = SaturateToSignedWord(SRC2[159:144] + SRC2[175:160])
DEST[239:224] = SaturateToSignedWord(SRC2[191-160] + SRC2[159-128])
DEST[255:240] = SaturateToSignedWord(SRC2[255:240] + SRC2[239:224])
</pre>
<h2 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h2>
<pre>PHADDSW __m64 _mm_hadds_pi16 (__m64 a, __m64 b)
</pre>
<pre>(V)PHADDSW __m128i _mm_hadds_epi16 (__m128i a, __m128i b)
</pre>
<pre>VPHADDSW __m256i _mm256_hadds_epi16 (__m256i a, __m256i b)
</pre>
<h2 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h2>
<p>None.</p>
<h2 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h2>
<p>See <span class="not-imported">Table 2-21</span>, “Type 4 Class Exception Conditions,” additionally:</p>
<table>
<tr>
<td>#UD</td>
<td>If VEX.L = 1.</td></tr></table><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>