<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>PMULHRSW
— Packed Multiply High With Round and Scale</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>PMULHRSW
— Packed Multiply High With Round and Scale</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>NP 0F 38 0B /r<sup>1</sup> PMULHRSW mm1, mm2/m64</td>
<td>A</td>
<td>V/V</td>
<td>SSSE3</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to mm1.</td></tr>
<tr>
<td>66 0F 38 0B /r PMULHRSW xmm1, xmm2/m128</td>
<td>A</td>
<td>V/V</td>
<td>SSSE3</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1.</td></tr>
<tr>
<td>VEX.128.66.0F38.WIG 0B /r VPMULHRSW xmm1, xmm2, xmm3/m128</td>
<td>B</td>
<td>V/V</td>
<td>AVX</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1.</td></tr>
<tr>
<td>VEX.256.66.0F38.WIG 0B /r VPMULHRSW ymm1, ymm2, ymm3/m256</td>
<td>B</td>
<td>V/V</td>
<td>AVX2</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to ymm1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.WIG 0B /r VPMULHRSW xmm1 {k1}{z}, xmm2, xmm3/m128</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512BW</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1 under writemask k1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.WIG 0B /r VPMULHRSW ymm1 {k1}{z}, ymm2, ymm3/m256</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512BW</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to ymm1 under writemask k1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.WIG 0B /r VPMULHRSW zmm1 {k1}{z}, zmm2, zmm3/m512</td>
<td>C</td>
<td>V/V</td>
<td>AVX512BW</td>
<td>Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to zmm1 under writemask k1.</td></tr></table>
<blockquote>
<p>1. See note in Section 2.5, “Intel® AVX and Intel® SSE Instruction Exception Classification,” in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, and Section 23.25.3, “Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers,” in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.</p></blockquote>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple Type</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>N/A</td>
<td>ModRM:reg (r, w)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td>
<td>N/A</td></tr>
<tr>
<td>B</td>
<td>N/A</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr>
<tr>
<td>C</td>
<td>Full Mem</td>
<td>ModRM:reg (w)</td>
<td>EVEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h2 id="description">Description<a class="anchor" href="#description">
</a></h2>
<p>PMULHRSW multiplies vertically each signed 16-bit integer in the destination operand (first operand) by the corresponding signed 16-bit integer in the source operand (second operand), producing intermediate signed 32-bit integers. Each intermediate 32-bit integer is truncated to its 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result; these 16-bit values are packed into the destination operand.</p>
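<p>The per-lane arithmetic described above can be sketched in a few lines of Python. This is an illustrative model only, not Intel's pseudocode: the function name pmulhrsw_lane is hypothetical, and the inputs are assumed to be Python integers already in the signed 16-bit range.</p>

```python
def pmulhrsw_lane(a, b):
    """Model one 16-bit lane; a and b are signed 16-bit values (assumed in range)."""
    product = a * b                 # exact signed 32-bit intermediate
    temp = (product >> 14) + 1      # keep the top 18 bits, then add the rounding bit
    word = (temp >> 1) % 0x10000    # select temp[16:1] as an unsigned 16-bit pattern
    # Reinterpret the 16-bit pattern as a signed word.
    return word - 0x10000 if word >= 0x8000 else word


print(pmulhrsw_lane(16384, 16384))    # 8192
print(pmulhrsw_lane(-32768, -32768))  # -32768 (the result wraps, it does not saturate)
```

<p>In this model the result equals (a * b + 0x4000) >> 15 wrapped to 16 bits, which is why the operation is often read as a rounded multiply of two values scaled by 2^15.</p>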
<p>When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.</p>
<p>In 64-bit mode, when the instruction is not encoded with VEX/EVEX, use the REX prefix to access the XMM8-XMM15 registers.</p>
<p>Legacy SSE version 64-bit operand: Both operands can be MMX registers. The second source operand is an MMX register or a 64-bit memory location.</p>
<p>128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding YMM destination register remain unchanged.</p>
<p>VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.</p>
<p>VEX.256 encoded version: The second source operand can be a YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.</p>
<p>EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.</p>
<h2 id="operation">Operation<a class="anchor" href="#operation">
</a></h2>
<h3 id="pmulhrsw--with-64-bit-operands-">PMULHRSW (With 64-bit Operands)<a class="anchor" href="#pmulhrsw--with-64-bit-operands-">
</a></h3>
<pre>temp0[31:0] = INT32 ((DEST[15:0] * SRC[15:0]) &gt;&gt;14) + 1;
temp1[31:0] = INT32 ((DEST[31:16] * SRC[31:16]) &gt;&gt;14) + 1;
temp2[31:0] = INT32 ((DEST[47:32] * SRC[47:32]) &gt;&gt; 14) + 1;
temp3[31:0] = INT32 ((DEST[63:48] * SRC[63:48]) &gt;&gt; 14) + 1;
DEST[15:0] = temp0[16:1];
DEST[31:16] = temp1[16:1];
DEST[47:32] = temp2[16:1];
DEST[63:48] = temp3[16:1];
</pre>
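<p>The 64-bit form applies the same step to four independent lanes. The sketch below models each operand as a list of four signed words, with index 0 standing for bits 15:0. The list representation and the names pmulhrsw_lane and pmulhrsw_packed are assumptions made for illustration.</p>

```python
def pmulhrsw_lane(a, b):
    """One lane: multiply, shift right by 14, add 1, take bits 16:1 as a signed word."""
    word = ((((a * b) >> 14) + 1) >> 1) % 0x10000
    return word - 0x10000 if word >= 0x8000 else word


def pmulhrsw_packed(dest, src):
    """Apply the lane operation to every pair of words; returns the new DEST words."""
    if len(dest) != len(src):
        raise ValueError("operands must have the same number of lanes")
    return [pmulhrsw_lane(a, b) for a, b in zip(dest, src)]


mm1 = [16384, -16384, 32767, -32768]   # DEST words, lowest lane first
mm2 = [16384, 16384, 32767, -32768]    # SRC words, lowest lane first
print(pmulhrsw_packed(mm1, mm2))       # [8192, -8192, 32766, -32768]
```

<p>The same function covers the 128-bit form when both lists hold eight words.</p>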
<h3 id="pmulhrsw--with-128-bit-operands-">PMULHRSW (With 128-bit Operands)<a class="anchor" href="#pmulhrsw--with-128-bit-operands-">
</a></h3>
<pre>temp0[31:0] = INT32 ((DEST[15:0] * SRC[15:0]) &gt;&gt;14) + 1;
temp1[31:0] = INT32 ((DEST[31:16] * SRC[31:16]) &gt;&gt;14) + 1;
temp2[31:0] = INT32 ((DEST[47:32] * SRC[47:32]) &gt;&gt;14) + 1;
temp3[31:0] = INT32 ((DEST[63:48] * SRC[63:48]) &gt;&gt;14) + 1;
temp4[31:0] = INT32 ((DEST[79:64] * SRC[79:64]) &gt;&gt;14) + 1;
temp5[31:0] = INT32 ((DEST[95:80] * SRC[95:80]) &gt;&gt;14) + 1;
temp6[31:0] = INT32 ((DEST[111:96] * SRC[111:96]) &gt;&gt;14) + 1;
temp7[31:0] = INT32 ((DEST[127:112] * SRC[127:112]) &gt;&gt;14) + 1;
DEST[15:0] = temp0[16:1];
DEST[31:16] = temp1[16:1];
DEST[47:32] = temp2[16:1];
DEST[63:48] = temp3[16:1];
DEST[79:64] = temp4[16:1];
DEST[95:80] = temp5[16:1];
DEST[111:96] = temp6[16:1];
DEST[127:112] = temp7[16:1];
</pre>
<h3 id="vpmulhrsw--vex-128-encoded-version-">VPMULHRSW (VEX.128 Encoded Version)<a class="anchor" href="#vpmulhrsw--vex-128-encoded-version-">
</a></h3>
<pre>temp0[31:0] := INT32 ((SRC1[15:0] * SRC2[15:0]) &gt;&gt;14) + 1
temp1[31:0] := INT32 ((SRC1[31:16] * SRC2[31:16]) &gt;&gt;14) + 1
temp2[31:0] := INT32 ((SRC1[47:32] * SRC2[47:32]) &gt;&gt;14) + 1
temp3[31:0] := INT32 ((SRC1[63:48] * SRC2[63:48]) &gt;&gt;14) + 1
temp4[31:0] := INT32 ((SRC1[79:64] * SRC2[79:64]) &gt;&gt;14) + 1
temp5[31:0] := INT32 ((SRC1[95:80] * SRC2[95:80]) &gt;&gt;14) + 1
temp6[31:0] := INT32 ((SRC1[111:96] * SRC2[111:96]) &gt;&gt;14) + 1
temp7[31:0] := INT32 ((SRC1[127:112] * SRC2[127:112]) &gt;&gt;14) + 1
DEST[15:0] := temp0[16:1]
DEST[31:16] := temp1[16:1]
DEST[47:32] := temp2[16:1]
DEST[63:48] := temp3[16:1]
DEST[79:64] := temp4[16:1]
DEST[95:80] := temp5[16:1]
DEST[111:96] := temp6[16:1]
DEST[127:112] := temp7[16:1]
DEST[MAXVL-1:128] := 0
</pre>
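<p>The legacy SSE and VEX.128 forms compute the same eight results but treat the upper destination bits differently: the legacy form leaves bits MAXVL-1:128 unchanged, while the VEX form zeroes them. The following sketch contrasts the two, assuming MAXVL is 512 bits (32 words) and modelling registers as lists of signed words. All function and constant names are hypothetical.</p>

```python
MAXVL_WORDS = 32   # assumption for this sketch: MAXVL = 512 bits = 32 words


def pmulhrsw_lane(a, b):
    """One lane: multiply, shift right by 14, add 1, take bits 16:1 as a signed word."""
    word = ((((a * b) >> 14) + 1) >> 1) % 0x10000
    return word - 0x10000 if word >= 0x8000 else word


def legacy_sse_128(dest_reg, src):
    """PMULHRSW xmm1, xmm2/m128: low 8 words updated, upper words kept as they were."""
    low = [pmulhrsw_lane(a, b) for a, b in zip(dest_reg[:8], src[:8])]
    return low + dest_reg[8:]


def vex_128(src1, src2):
    """VPMULHRSW xmm1, xmm2, xmm3/m128: low 8 words written, upper words zeroed."""
    low = [pmulhrsw_lane(a, b) for a, b in zip(src1[:8], src2[:8])]
    return low + [0] * (MAXVL_WORDS - 8)


reg = [16384] * 8 + [7] * 24            # full register: 8 low words plus stale upper words
src = [16384] * 8
print(legacy_sse_128(reg, src)[6:10])   # [8192, 8192, 7, 7]
print(vex_128(reg, src)[6:10])          # [8192, 8192, 0, 0]
```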
<h3 id="vpmulhrsw--vex-256-encoded-version-">VPMULHRSW (VEX.256 Encoded Version)<a class="anchor" href="#vpmulhrsw--vex-256-encoded-version-">
</a></h3>
<pre>temp0[31:0] := INT32 ((SRC1[15:0] * SRC2[15:0]) &gt;&gt;14) + 1
temp1[31:0] := INT32 ((SRC1[31:16] * SRC2[31:16]) &gt;&gt;14) + 1
temp2[31:0] := INT32 ((SRC1[47:32] * SRC2[47:32]) &gt;&gt;14) + 1
temp3[31:0] := INT32 ((SRC1[63:48] * SRC2[63:48]) &gt;&gt;14) + 1
temp4[31:0] := INT32 ((SRC1[79:64] * SRC2[79:64]) &gt;&gt;14) + 1
temp5[31:0] := INT32 ((SRC1[95:80] * SRC2[95:80]) &gt;&gt;14) + 1
temp6[31:0] := INT32 ((SRC1[111:96] * SRC2[111:96]) &gt;&gt;14) + 1
temp7[31:0] := INT32 ((SRC1[127:112] * SRC2[127:112]) &gt;&gt;14) + 1
temp8[31:0] := INT32 ((SRC1[143:128] * SRC2[143:128]) &gt;&gt;14) + 1
temp9[31:0] := INT32 ((SRC1[159:144] * SRC2[159:144]) &gt;&gt;14) + 1
temp10[31:0] := INT32 ((SRC1[175:160] * SRC2[175:160]) &gt;&gt;14) + 1
temp11[31:0] := INT32 ((SRC1[191:176] * SRC2[191:176]) &gt;&gt;14) + 1
temp12[31:0] := INT32 ((SRC1[207:192] * SRC2[207:192]) &gt;&gt;14) + 1
temp13[31:0] := INT32 ((SRC1[223:208] * SRC2[223:208]) &gt;&gt;14) + 1
temp14[31:0] := INT32 ((SRC1[239:224] * SRC2[239:224]) &gt;&gt;14) + 1
temp15[31:0] := INT32 ((SRC1[255:240] * SRC2[255:240]) &gt;&gt;14) + 1
DEST[15:0] := temp0[16:1]
DEST[31:16] := temp1[16:1]
DEST[47:32] := temp2[16:1]
DEST[63:48] := temp3[16:1]
DEST[79:64] := temp4[16:1]
DEST[95:80] := temp5[16:1]
DEST[111:96] := temp6[16:1]
DEST[127:112] := temp7[16:1]
DEST[143:128] := temp8[16:1]
DEST[159:144] := temp9[16:1]
DEST[175:160] := temp10[16:1]
DEST[191:176] := temp11[16:1]
DEST[207:192] := temp12[16:1]
DEST[223:208] := temp13[16:1]
DEST[239:224] := temp14[16:1]
DEST[255:240] := temp15[16:1]
DEST[MAXVL-1:256] := 0
</pre>
<h3 id="vpmulhrsw--evex-encoded-version-">VPMULHRSW (EVEX Encoded Version)<a class="anchor" href="#vpmulhrsw--evex-encoded-version-">
</a></h3>
<pre>(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j := 0 TO KL-1
    i := j * 16
    IF k1[j] OR *no writemask*
        THEN
            temp[31:0] := ((SRC1[i+15:i] * SRC2[i+15:i]) &gt;&gt;14) + 1
            DEST[i+15:i] := temp[16:1]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] := 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] := 0
</pre>
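<p>The writemask behaviour of the EVEX form can be modelled as follows. This is a minimal sketch under stated assumptions: registers are lists of KL signed words, k1 is a Python integer whose bit j controls lane j (or None for no writemask), and the names vpmulhrsw_evex and zeroing are hypothetical. The final zeroing of bits MAXVL-1:VL is not modelled.</p>

```python
def pmulhrsw_lane(a, b):
    """One lane: multiply, shift right by 14, add 1, take bits 16:1 as a signed word."""
    word = ((((a * b) >> 14) + 1) >> 1) % 0x10000
    return word - 0x10000 if word >= 0x8000 else word


def vpmulhrsw_evex(dest, src1, src2, k1=None, zeroing=False):
    """Return the new DEST words for one masked VPMULHRSW (KL = len(dest))."""
    result = []
    for j, (old, a, b) in enumerate(zip(dest, src1, src2)):
        if k1 is None or (k1 >> j) % 2:
            result.append(pmulhrsw_lane(a, b))   # lane selected by the writemask
        elif zeroing:
            result.append(0)                     # zeroing-masking
        else:
            result.append(old)                   # merging-masking keeps the old word
    return result


dest = [111] * 8
a = [16384] * 8
b = [16384] * 8
print(vpmulhrsw_evex(dest, a, b, k1=0b00001111))
# [8192, 8192, 8192, 8192, 111, 111, 111, 111]
print(vpmulhrsw_evex(dest, a, b, k1=0b00001111, zeroing=True))
# [8192, 8192, 8192, 8192, 0, 0, 0, 0]
```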
<h2 id="intel-c-c++-compiler-intrinsic-equivalents">Intel C/C++ Compiler Intrinsic Equivalents<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalents">
</a></h2>
<pre>VPMULHRSW __m512i _mm512_mulhrs_epi16(__m512i a, __m512i b);
</pre>
<pre>VPMULHRSW __m512i _mm512_mask_mulhrs_epi16(__m512i s, __mmask32 k, __m512i a, __m512i b);
</pre>
<pre>VPMULHRSW __m512i _mm512_maskz_mulhrs_epi16( __mmask32 k, __m512i a, __m512i b);
</pre>
<pre>VPMULHRSW __m256i _mm256_mask_mulhrs_epi16(__m256i s, __mmask16 k, __m256i a, __m256i b);
</pre>
<pre>VPMULHRSW __m256i _mm256_maskz_mulhrs_epi16( __mmask16 k, __m256i a, __m256i b);
</pre>
<pre>VPMULHRSW __m128i _mm_mask_mulhrs_epi16(__m128i s, __mmask8 k, __m128i a, __m128i b);
</pre>
<pre>VPMULHRSW __m128i _mm_maskz_mulhrs_epi16( __mmask8 k, __m128i a, __m128i b);
</pre>
<pre>PMULHRSW __m64 _mm_mulhrs_pi16 (__m64 a, __m64 b)
</pre>
<pre>(V)PMULHRSW __m128i _mm_mulhrs_epi16 (__m128i a, __m128i b)
</pre>
<pre>VPMULHRSW __m256i _mm256_mulhrs_epi16 (__m256i a, __m256i b)
</pre>
<h2 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h2>
<p>None.</p>
<h2 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h2>
<p>Non-EVEX-encoded instruction, see <span class="not-imported">Table 2-21</span>, “Type 4 Class Exception Conditions.”</p>
<p>EVEX-encoded instruction, see Exceptions Type E4.nb in <span class="not-imported">Table 2-49</span>, “Type E4 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer's Manual</a> for anything serious.
</p></footer></body></html>