ia32-64/x86/vreducesh.html

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VREDUCESH
		— Perform Reduction Transformation on Scalar FP16 Value</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VREDUCESH
		— Perform Reduction Transformation on Scalar FP16 Value</h1>

<table>
<tr>
<th> Instruction En bit Mode Flag
Support  Instruction En bit Mode Flag
Support  64/32 CPUID Feature Instruction En bit Mode Flag  CPUID Feature Instruction En bit Mode Flag  Op/ 64/32 CPUID Feature Instruction En bit Mode Flag  64/32 CPUID Feature Instruction En bit Mode Flag  CPUID Feature Instruction En bit Mode Flag  Op/ 64/32 CPUID Feature </th>
<th></th>
<th>Support</th>
<th></th>
<th>Description</th></tr>
<tr>
<td>EVEX.LLIG.NP.0F3A.W0 57 /r /ib VREDUCESH xmm1{k1}{z}, xmm2, xmm3/m16 {sae}, imm8</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Perform a reduction transformation on the low binary encoded FP16 value in xmm3/m16 by subtracting a number of fraction bits specified by the imm8 field. Store the result in xmm1 subject to writemask k1. Bits 127:16 from xmm2 are copied to xmm1[127:16].</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
			¶
		</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>Scalar</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>imm8 (r)</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
			¶
		</a></h3>
<p>This instruction performs a reduction transformation of the low binary encoded FP16 value in the source operand (the second operand) and store the reduced result in binary FP format to the low element of the destination operand (the first operand) under the writemask k1. For further details see the description of VREDUCEPH.</p>
<p>Bits 127:16 of the destination operand are copied from the corresponding bits of the first source operand. Bits MAXVL-1:128 of the destination operand are zeroed. The low FP16 element of the destination is updated according to the writemask.</p>
<p>This instruction might end up with a precision exception set. However, in case of SPE set (i.e., Suppress Precision Exception, which is imm8[3]=1), no precision exception is reported.</p>
<p>This instruction may generate tiny non-zero result. If it does so, it does not report underflow exception, even if underflow exceptions are unmasked (UM flag in MXCSR register is 0).</p>
<p>For special cases, see <a href='vreduceph.html#tbl-5-30'>Table 5-30</a>.</p>
<h3 id="operation">Operation<a class="anchor" href="#operation">
			¶
		</a></h3>
<h4 id="vreducesh-dest-k1---src--imm8">VREDUCESH dest{k1}, src, imm8<a class="anchor" href="#vreducesh-dest-k1---src--imm8">
			¶
		</a></h4>
<pre>IF k1[0] or *no writemask*:
    dest.fp16[0] := reduce_fp16(src2.fp16[0], imm8)
        // see VREDUCEPH
ELSE IF *zeroing*:
    dest.fp16[0] := 0
//else dest.fp16[0] remains unchanged
DEST[127:16] := src1[127:16]
DEST[MAXVL-1:128] := 0
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
			¶
		</a></h3>
<pre>VREDUCESH __m128h _mm_mask_reduce_round_sh (__m128h src, __mmask8 k, __m128h a, __m128h b, int imm8, const int sae);
</pre>
<pre>VREDUCESH __m128h _mm_maskz_reduce_round_sh (__mmask8 k, __m128h a, __m128h b, int imm8, const int sae);
</pre>
<pre>VREDUCESH __m128h _mm_reduce_round_sh (__m128h a, __m128h b, int imm8, const int sae);
</pre>
<pre>VREDUCESH __m128h _mm_mask_reduce_sh (__m128h src, __mmask8 k, __m128h a, __m128h b, int imm8);
</pre>
<pre>VREDUCESH __m128h _mm_maskz_reduce_sh (__mmask8 k, __m128h a, __m128h b, int imm8);
</pre>
<pre>VREDUCESH __m128h _mm_reduce_sh (__m128h a, __m128h b, int imm8);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
			¶
		</a></h3>
<p>Invalid, Precision.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
			¶
		</a></h3>
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-47</span>, “Type E3 Class Exception Conditions.”</p><footer><p>
		This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
		inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
		ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer’s Manual</a> for anything serious.
	</p></footer></body></html>