<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VPSLLVW/VPSLLVD/VPSLLVQ
— Variable Bit Shift Left Logical</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VPSLLVW/VPSLLVD/VPSLLVQ
— Variable Bit Shift Left Logical</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32-bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>VEX.128.66.0F38.W0 47 /r VPSLLVD xmm1, xmm2, xmm3/m128</td>
<td>A</td>
<td>V/V</td>
<td>AVX2</td>
<td>Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.</td></tr>
<tr>
<td>VEX.128.66.0F38.W1 47 /r VPSLLVQ xmm1, xmm2, xmm3/m128</td>
<td>A</td>
<td>V/V</td>
<td>AVX2</td>
<td>Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.</td></tr>
<tr>
<td>VEX.256.66.0F38.W0 47 /r VPSLLVD ymm1, ymm2, ymm3/m256</td>
<td>A</td>
<td>V/V</td>
<td>AVX2</td>
<td>Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.</td></tr>
<tr>
<td>VEX.256.66.0F38.W1 47 /r VPSLLVQ ymm1, ymm2, ymm3/m256</td>
<td>A</td>
<td>V/V</td>
<td>AVX2</td>
<td>Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W1 12 /r VPSLLVW xmm1 {k1}{z}, xmm2, xmm3/m128</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512BW</td>
<td>Shift words in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W1 12 /r VPSLLVW ymm1 {k1}{z}, ymm2, ymm3/m256</td>
<td>B</td>
<td>V/V</td>
<td>AVX512VL AVX512BW</td>
<td>Shift words in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W1 12 /r VPSLLVW zmm1 {k1}{z}, zmm2, zmm3/m512</td>
<td>B</td>
<td>V/V</td>
<td>AVX512BW</td>
<td>Shift words in zmm2 left by amount specified in the corresponding element of zmm3/m512 while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W0 47 /r VPSLLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W0 47 /r VPSLLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W0 47 /r VPSLLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Shift doublewords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.128.66.0F38.W1 47 /r VPSLLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W1 47 /r VPSLLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in 0s using writemask k1.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W1 47 /r VPSLLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Shift quadwords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in 0s using writemask k1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">¶</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple Type</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>N/A</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr>
<tr>
<td>B</td>
<td>Full Mem</td>
<td>ModRM:reg (w)</td>
<td>EVEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr>
<tr>
<td>C</td>
<td>Full</td>
<td>ModRM:reg (w)</td>
<td>EVEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">¶</a></h3>
<p>Shifts the bits in the individual data elements (words, doublewords, or quadwords) in the first source operand to the left by the count value of the respective data elements in the second source operand. As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to 0).</p>
<p>The count values are specified individually in each data element of the second source operand. If the unsigned integer value specified in the respective data element of the second source operand is greater than 15 (for words), 31 (for doublewords), or 63 (for quadwords), then the destination data element is written with 0.</p>
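<p>The per-element rule above can be sketched as a scalar model in C. This is only an illustration under stated assumptions: the helper name <code>sllv_lane</code> and its <code>width</code> parameter are hypothetical and not part of any Intel API. The explicit range check matters in C because shifting by a count greater than or equal to the operand width is undefined behavior, whereas the instruction defines the result as 0.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one VPSLLVW/VPSLLVD/VPSLLVQ lane.
 * value and count are the lane contents, zero-extended to 64 bits;
 * width is the lane size in bits (16, 32, or 64). */
static inline uint64_t sllv_lane(uint64_t value, uint64_t count, unsigned width)
{
    assert(width == 16 || width == 32 || width == 64);
    uint64_t lane_mask = (width == 64) ? UINT64_MAX
                                       : ((UINT64_C(1) << width) - 1);
    /* Counts above width-1 (15, 31, or 63) produce an all-zero lane. */
    if (count > width - 1)
        return 0;
    /* Otherwise shift left, fill with zeros, and keep only the lane bits. */
    return ((value & lane_mask) << count) & lane_mask;
}
```

<p>For instance, under this model a doubleword lane holding 1 shifted by 31 yields 0x80000000, while the same lane shifted by 32 yields 0.</p>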
<p>VEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be either an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding destination register are zeroed.</p>
<p>VEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either a YMM register or a 256-bit memory location. Bits (MAXVL-1:256) of the corresponding ZMM register are zeroed.</p>
<p>EVEX encoded VPSLLVD/Q: The destination and first source operands are ZMM/YMM/XMM registers. The count operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcast from a 32/64-bit memory location. The destination is conditionally updated with writemask k1.</p>
<p>EVEX encoded VPSLLVW: The destination and first source operands are ZMM/YMM/XMM registers. The count operand can be either a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination is conditionally updated with writemask k1.</p>
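<p>The conditional update under writemask k1 can be illustrated with a scalar C model of the word form. This is a sketch, not Intel code: the function name <code>sllvw_masked</code> and the <code>zeroing</code> flag (standing in for the <code>{z}</code> modifier) are assumptions made for the example.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of EVEX VPSLLVW over n 16-bit lanes.
 * dest holds the previous destination contents on entry.
 * Bit j of mask plays the role of k1[j]; zeroing selects zeroing-masking
 * instead of merging-masking. */
static inline void sllvw_masked(uint16_t *dest, const uint16_t *src,
                                const uint16_t *count, size_t n,
                                uint32_t mask, bool zeroing)
{
    assert(n == 8 || n == 16 || n == 32);
    for (size_t j = 0; j < n; j++) {
        if ((mask >> j) & 1u) {
            /* Selected lane: shift, or write 0 when the count exceeds 15. */
            dest[j] = (count[j] > 15)
                          ? 0
                          : (uint16_t)((uint32_t)src[j] << count[j]);
        } else if (zeroing) {
            /* Zeroing-masking: unselected lanes are cleared. */
            dest[j] = 0;
        }
        /* Merging-masking: unselected lanes keep their old value. */
    }
}
```

<p>With merging-masking, lanes whose mask bit is clear keep whatever the destination held before; with zeroing-masking they become 0.</p>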
<h3 id="operation">Operation<a class="anchor" href="#operation">¶</a></h3>
<h4 id="vpsllvw--evex-encoded-version-">VPSLLVW (EVEX encoded version)<a class="anchor" href="#vpsllvw--evex-encoded-version-">¶</a></h4>
<pre>(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j := 0 TO KL-1
    i := j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] := ZeroExtend(SRC1[i+15:i] &lt;&lt; SRC2[i+15:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] := 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0;
</pre>
<h4 id="vpsllvd--vex-128-version-">VPSLLVD (VEX.128 version)<a class="anchor" href="#vpsllvd--vex-128-version-">¶</a></h4>
<pre>COUNT_0 := SRC2[31 : 0];
    (* Repeat Each COUNT_i for the 2nd through 4th dwords of SRC2*)
COUNT_3 := SRC2[127 : 96];
IF COUNT_0 &lt; 32 THEN
    DEST[31:0] := ZeroExtend(SRC1[31:0] &lt;&lt; COUNT_0);
ELSE
    DEST[31:0] := 0;
FI;
    (* Repeat shift operation for 2nd through 4th dwords *)
IF COUNT_3 &lt; 32 THEN
    DEST[127:96] := ZeroExtend(SRC1[127:96] &lt;&lt; COUNT_3);
ELSE
    DEST[127:96] := 0;
FI;
DEST[MAXVL-1:128] := 0;
</pre>
<h4 id="vpsllvd--vex-256-version-">VPSLLVD (VEX.256 version)<a class="anchor" href="#vpsllvd--vex-256-version-">¶</a></h4>
<pre>COUNT_0 := SRC2[31 : 0];
    (* Repeat Each COUNT_i for the 2nd through 7th dwords of SRC2*)
COUNT_7 := SRC2[255 : 224];
IF COUNT_0 &lt; 32 THEN
    DEST[31:0] := ZeroExtend(SRC1[31:0] &lt;&lt; COUNT_0);
ELSE
    DEST[31:0] := 0;
FI;
    (* Repeat shift operation for 2nd through 7th dwords *)
IF COUNT_7 &lt; 32 THEN
    DEST[255:224] := ZeroExtend(SRC1[255:224] &lt;&lt; COUNT_7);
ELSE
    DEST[255:224] := 0;
FI;
DEST[MAXVL-1:256] := 0;
</pre>
<h4 id="vpsllvd--evex-encoded-version-">VPSLLVD (EVEX encoded version)<a class="anchor" href="#vpsllvd--evex-encoded-version-">¶</a></h4>
<pre>(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j := 0 TO KL-1
    i := j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] := ZeroExtend(SRC1[i+31:i] &lt;&lt; SRC2[31:0])
            ELSE DEST[i+31:i] := ZeroExtend(SRC1[i+31:i] &lt;&lt; SRC2[i+31:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] := 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0;
</pre>
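<p>The EVEX doubleword pseudocode above, including the embedded-broadcast case (EVEX.b = 1 with a memory count operand), might be modeled in scalar C roughly as follows. The function name <code>sllvd_evex_model</code> and the <code>broadcast</code> and <code>zeroing</code> flags are hypothetical names introduced for this sketch. The explicit range check follows the Description section, which states that counts greater than 31 write 0.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of EVEX VPSLLVD over kl 32-bit lanes.
 * When broadcast is true, src2[0] is used as the count for every lane,
 * mirroring the m32bcst form. Bit j of k1 selects lane j. */
static inline void sllvd_evex_model(uint32_t *dest, const uint32_t *src1,
                                    const uint32_t *src2, size_t kl,
                                    uint16_t k1, bool zeroing, bool broadcast)
{
    assert(kl == 4 || kl == 8 || kl == 16);
    for (size_t j = 0; j < kl; j++) {
        if ((k1 >> j) & 1u) {
            uint32_t count = broadcast ? src2[0] : src2[j];
            /* Counts above 31 clear the lane instead of shifting. */
            dest[j] = (count > 31) ? 0 : (src1[j] << count);
        } else if (zeroing) {
            dest[j] = 0; /* zeroing-masking */
        }
        /* merging-masking: dest[j] remains unchanged */
    }
}
```

<p>In the broadcast case only the first count element is read, so a single 32-bit value in memory supplies the shift amount for all lanes.</p>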
<h4 id="vpsllvq--vex-128-version-">VPSLLVQ (VEX.128 version)<a class="anchor" href="#vpsllvq--vex-128-version-">¶</a></h4>
<pre>COUNT_0 := SRC2[63 : 0];
COUNT_1 := SRC2[127 : 64];
IF COUNT_0 &lt; 64 THEN
    DEST[63:0] := ZeroExtend(SRC1[63:0] &lt;&lt; COUNT_0);
ELSE
    DEST[63:0] := 0;
FI;
IF COUNT_1 &lt; 64 THEN
    DEST[127:64] := ZeroExtend(SRC1[127:64] &lt;&lt; COUNT_1);
ELSE
    DEST[127:64] := 0;
FI;
DEST[MAXVL-1:128] := 0;
</pre>
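<p>As a rough illustration of the VEX.128 quadword pseudocode above, including the final step that zeroes bits MAXVL-1:128, the following C sketch models the full destination register as an array of eight quadwords. That layout assumes MAXVL = 512; the function name <code>vpsllvq_vex128_model</code> and the macro <code>MAXVL_QWORDS</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

#define MAXVL_QWORDS 8 /* assumption: MAXVL = 512 bits = 8 quadwords */

/* Hypothetical scalar model of VEX.128 VPSLLVQ.
 * dest models the whole destination register; src1 and src2 model the
 * low 128 bits of the two source operands. */
static inline void vpsllvq_vex128_model(uint64_t dest[MAXVL_QWORDS],
                                        const uint64_t src1[2],
                                        const uint64_t src2[2])
{
    assert(dest && src1 && src2);
    /* Compute both lanes first so dest may alias a source. */
    uint64_t lo = (src2[0] < 64) ? (src1[0] << src2[0]) : 0;
    uint64_t hi = (src2[1] < 64) ? (src1[1] << src2[1]) : 0;
    dest[0] = lo;
    dest[1] = hi;
    /* DEST[MAXVL-1:128] := 0 */
    for (int q = 2; q < MAXVL_QWORDS; q++)
        dest[q] = 0;
}
```

<p>The upper quadwords are cleared regardless of their previous contents, which is the behavior that distinguishes VEX-encoded forms from legacy SSE encodings.</p>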
<h4 id="vpsllvq--vex-256-version-">VPSLLVQ (VEX.256 version)<a class="anchor" href="#vpsllvq--vex-256-version-">¶</a></h4>
<pre>COUNT_0 := SRC2[63 : 0];
    (* Repeat Each COUNT_i for the 2nd through 4th qwords of SRC2*)
COUNT_3 := SRC2[255 : 192];
IF COUNT_0 &lt; 64 THEN
    DEST[63:0] := ZeroExtend(SRC1[63:0] &lt;&lt; COUNT_0);
ELSE
    DEST[63:0] := 0;
FI;
    (* Repeat shift operation for 2nd through 4th qwords *)
IF COUNT_3 &lt; 64 THEN
    DEST[255:192] := ZeroExtend(SRC1[255:192] &lt;&lt; COUNT_3);
ELSE
    DEST[255:192] := 0;
FI;
DEST[MAXVL-1:256] := 0;
</pre>
<h4 id="vpsllvq--evex-encoded-version-">VPSLLVQ (EVEX encoded version)<a class="anchor" href="#vpsllvq--evex-encoded-version-">¶</a></h4>
<pre>(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j := 0 TO KL-1
    i := j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] := ZeroExtend(SRC1[i+63:i] &lt;&lt; SRC2[63:0])
            ELSE DEST[i+63:i] := ZeroExtend(SRC1[i+63:i] &lt;&lt; SRC2[i+63:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] := 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0;
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">¶</a></h3>
<pre>VPSLLVW __m512i _mm512_sllv_epi16(__m512i a, __m512i cnt);
</pre>
<pre>VPSLLVW __m512i _mm512_mask_sllv_epi16(__m512i s, __mmask32 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVW __m512i _mm512_maskz_sllv_epi16(__mmask32 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVW __m256i _mm256_mask_sllv_epi16(__m256i s, __mmask16 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVW __m256i _mm256_maskz_sllv_epi16(__mmask16 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVW __m128i _mm_mask_sllv_epi16(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVW __m128i _mm_maskz_sllv_epi16(__mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVD __m512i _mm512_sllv_epi32(__m512i a, __m512i cnt);
</pre>
<pre>VPSLLVD __m512i _mm512_mask_sllv_epi32(__m512i s, __mmask16 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVD __m512i _mm512_maskz_sllv_epi32(__mmask16 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVD __m256i _mm256_mask_sllv_epi32(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVD __m256i _mm256_maskz_sllv_epi32(__mmask8 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVD __m128i _mm_mask_sllv_epi32(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVD __m128i _mm_maskz_sllv_epi32(__mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVQ __m512i _mm512_sllv_epi64(__m512i a, __m512i cnt);
</pre>
<pre>VPSLLVQ __m512i _mm512_mask_sllv_epi64(__m512i s, __mmask8 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVQ __m512i _mm512_maskz_sllv_epi64(__mmask8 k, __m512i a, __m512i cnt);
</pre>
<pre>VPSLLVQ __m256i _mm256_mask_sllv_epi64(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVQ __m256i _mm256_maskz_sllv_epi64(__mmask8 k, __m256i a, __m256i cnt);
</pre>
<pre>VPSLLVQ __m128i _mm_mask_sllv_epi64(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVQ __m128i _mm_maskz_sllv_epi64(__mmask8 k, __m128i a, __m128i cnt);
</pre>
<pre>VPSLLVD __m256i _mm256_sllv_epi32(__m256i m, __m256i count);
</pre>
<pre>VPSLLVQ __m256i _mm256_sllv_epi64(__m256i m, __m256i count);
</pre>
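<p>A minimal usage sketch for one of the intrinsics listed above, <code>_mm256_sllv_epi32</code>, is shown below. The wrapper name <code>shift_left_x8</code> is hypothetical. The sketch assumes the compiler defines <code>__AVX2__</code> when AVX2 code generation is enabled (for example with <code>-mavx2</code> on GCC or Clang); otherwise it falls back to a scalar loop with the same per-element semantics.</p>

```c
#include <assert.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical wrapper: shift eight 32-bit values left by per-element
 * counts, writing the results to out. */
static inline void shift_left_x8(const uint32_t a[8], const uint32_t cnt[8],
                                 uint32_t out[8])
{
    assert(a && cnt && out);
#ifdef __AVX2__
    /* Unaligned loads and store, so the arrays need no special alignment. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vc = _mm256_loadu_si256((const __m256i *)cnt);
    _mm256_storeu_si256((__m256i *)out, _mm256_sllv_epi32(va, vc));
#else
    /* Scalar fallback: counts above 31 yield 0, matching the instruction. */
    for (int i = 0; i < 8; i++)
        out[i] = (cnt[i] > 31) ? 0 : (a[i] << cnt[i]);
#endif
}
```

<p>Both paths are intended to give identical results, so the scalar branch can also serve as a reference when testing the vector path.</p>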
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">¶</a></h3>
<p>None.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">¶</a></h3>
<p>VEX-encoded instructions, see <span class="not-imported">Table 2-21</span>, “Type 4 Class Exception Conditions.”</p>
<p>EVEX-encoded VPSLLVD/VPSLLVQ, see <span class="not-imported">Table 2-49</span>, “Type E4 Class Exception Conditions.”</p>
<p>EVEX-encoded VPSLLVW, see Exceptions Type E4.nb in <span class="not-imported">Table 2-49</span>, “Type E4 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer’s Manual</a> for anything serious.
</p></footer></body></html>