ia32-64/x86/vreduceph.html
2025-07-08 02:23:29 -03:00

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VREDUCEPH
— Perform Reduction Transformation on Packed FP16 Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VREDUCEPH
— Perform Reduction Transformation on Packed FP16 Values</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>EVEX.128.NP.0F3A.W0 56 /r /ib VREDUCEPH xmm1{k1}{z}, xmm2/m128/m16bcst, imm8</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Perform reduction transformation on packed FP16 values in xmm2/m128/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in xmm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.256.NP.0F3A.W0 56 /r /ib VREDUCEPH ymm1{k1}{z}, ymm2/m256/m16bcst, imm8</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Perform reduction transformation on packed FP16 values in ymm2/m256/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in ymm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.512.NP.0F3A.W0 56 /r /ib VREDUCEPH zmm1{k1}{z}, zmm2/m512/m16bcst {sae}, imm8</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Perform reduction transformation on packed FP16 values in zmm2/m512/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in zmm1 subject to writemask k1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>Full</td>
<td>ModRM:reg (w)</td>
<td>ModRM:r/m (r)</td>
<td>imm8 (r)</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>This instruction performs a reduction transformation of the packed binary encoded FP16 values in the source operand (the second operand) and stores the reduced results in binary FP format to the destination operand (the first operand) under the writemask k1.</p>
<p>The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source value, where M is an unsigned integer specified by imm8[7:4]. Specifically, the reduction transformation can be expressed as:</p>
<p>dest = src − (ROUND(2<sup>M</sup> * src)) * 2<sup>−M</sup></p>
<p>where ROUND() treats src, 2<sup>M</sup>, and their product as binary FP numbers with normalized significand and biased exponents.</p>
<p>The magnitude of the reduced result can be expressed by considering src = 2<sup>p</sup> * man2, where man2 is the normalized significand and p is the unbiased exponent.</p>
<p>Then if RC=RNE: 0 ≤ |ReducedResult| ≤ 2<sup>−M−1</sup>.</p>
<p>Then if RC ≠ RNE: 0 ≤ |ReducedResult| &lt; 2<sup>−M</sup>.</p>
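<p>The formula and the two magnitude bounds can be checked with a small scalar model. The sketch below is illustrative only: the names <code>round_to_integer</code> and <code>reduce_model</code> and the mode strings are assumptions of this example, it computes dest = src − ROUND(2<sup>M</sup> * src) * 2<sup>−M</sup> in exact rational arithmetic, and it does not model the final rounding to FP16, exception flags, or the special cases for zero, infinity, and NaN.</p>

```python
from fractions import Fraction
import math


def round_to_integer(x, mode):
    """Round an exact Fraction to an integer under the named rounding mode."""
    if mode == "RNE":   # round to nearest, ties to even
        return round(x)
    if mode == "RD":    # round down, toward negative infinity
        return math.floor(x)
    if mode == "RU":    # round up, toward positive infinity
        return math.ceil(x)
    if mode == "RZ":    # round toward zero
        return math.trunc(x)
    raise ValueError(f"unknown rounding mode: {mode}")


def reduce_model(src, m, mode="RNE"):
    """Return src - ROUND(2^m * src) * 2^-m as an exact Fraction."""
    src = Fraction(src)
    scale = Fraction(2) ** m
    return src - round_to_integer(src * scale, mode) / scale


if __name__ == "__main__":
    # 1.8125 with M = 2: 4 * 1.8125 = 7.25, which rounds to 7 (RNE) or 8 (RU).
    print(reduce_model(1.8125, 2, "RNE"))  # 1/16
    print(reduce_model(1.8125, 2, "RU"))   # -3/16
```

<p>With M = 2 the RNE result 1/16 stays within 2<sup>−3</sup>, and the RU result −3/16 stays strictly below 2<sup>−2</sup> in magnitude, matching the bounds above.</p>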
<p>This instruction might end up with a precision exception set. However, if SPE (Suppress Precision Exception, imm8[3]) is set to 1, no precision exception is reported.</p>
<p>This instruction may generate a tiny non-zero result. If it does so, it does not report an underflow exception, even if underflow exceptions are unmasked (UM flag in MXCSR register is 0).</p>
<p>For special cases, see <a href='vreduceph.html#tbl-5-30'>Table 5-30</a>.</p>
<figure id="tbl-5-30">
<table>
<tr>
<th>Input value</th>
<th>Round Mode</th>
<th>Returned Value</th></tr>
<tr>
<td>|Src1| &lt; 2<sup>−M−1</sup></td>
<td>RNE</td>
<td>Src1</td></tr>
<tr>
<td rowspan="4">|Src1| &lt; 2<sup>−M</sup></td>
<td>RU, Src1 &gt; 0</td>
<td>Round(Src1 − 2<sup>−M</sup>)<sup>1</sup></td></tr>
<tr>
<td>RU, Src1 ≤ 0</td>
<td>Src1</td></tr>
<tr>
<td>RD, Src1 ≥ 0</td>
<td>Src1</td></tr>
<tr>
<td>RD, Src1 &lt; 0</td>
<td>Round(Src1 + 2<sup>−M</sup>)</td></tr>
<tr>
<td rowspan="2">Src1 = ±0 or Dest = ±0 (Src1 ≠ ∞)</td>
<td>NOT RD</td>
<td>+0.0</td></tr>
<tr>
<td>RD</td>
<td>−0.0</td></tr>
<tr>
<td>Src1 = ±∞</td>
<td>Any</td>
<td>+0.0</td></tr>
<tr>
<td>Src1 = ±NAN</td>
<td>Any</td>
<td>QNaN (Src1)</td></tr></table>
<figcaption><a href='vreduceph.html#tbl-5-30'>Table 5-30</a>. VREDUCEPH/VREDUCESH Special Cases</figcaption></figure>
<blockquote>
<p>1. The Round(.) function uses rounding controls specified by (imm8[2]? MXCSR.RC: imm8[1:0]).</p></blockquote>
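<p>The imm8 fields used on this page (M in imm8[7:4], SPE in imm8[3], and the rounding-control selection in the footnote above) can be unpacked as follows. This is a minimal sketch: <code>ReduceControl</code>, <code>decode_imm8</code>, and <code>effective_rc</code> are hypothetical names chosen for this example, and <code>mxcsr_rc</code> stands for whatever 2-bit value MXCSR.RC currently holds.</p>

```python
from typing import NamedTuple


class ReduceControl(NamedTuple):
    m: int              # imm8[7:4]: number of fraction bits to keep in ROUND
    spe: bool           # imm8[3]: suppress precision exception
    use_mxcsr_rc: bool  # imm8[2]: take rounding control from MXCSR.RC
    rc: int             # imm8[1:0]: rounding control used when imm8[2] = 0


def decode_imm8(imm8):
    """Split an 8-bit immediate into the fields described on this page."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    return ReduceControl(
        m=(imm8 >> 4) & 0xF,
        spe=bool((imm8 >> 3) & 1),
        use_mxcsr_rc=bool((imm8 >> 2) & 1),
        rc=imm8 & 0x3,
    )


def effective_rc(imm8, mxcsr_rc):
    """Apply the footnote's selection: imm8[2] ? MXCSR.RC : imm8[1:0]."""
    ctl = decode_imm8(imm8)
    return mxcsr_rc if ctl.use_mxcsr_rc else ctl.rc


if __name__ == "__main__":
    print(decode_imm8(0x2C))
    print(effective_rc(0x21, mxcsr_rc=3))
```

<p>For example, 0x2C decodes to M = 2 with SPE set and the rounding control taken from MXCSR.RC, while 0x21 uses the immediate's own rounding control field regardless of MXCSR.</p>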
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<pre>def reduce_fp16(src, imm8):
    nan := (src.exp = 0x1F) and (src.fraction != 0)
    if nan:
        return QNAN(src)
    m := imm8[7:4]
    rc := imm8[1:0]
    rc_source := imm8[2]
    spe := imm8[3] // suppress precision exception
    tmp := 2^(-m) * ROUND(2^m * src, spe, rc_source, rc)
    tmp := src - tmp // using same RC, SPE controls
    return tmp
</pre>
<h4 id="vreduceph-dest-k1---src--imm8">VREDUCEPH dest{k1}, src, imm8<a class="anchor" href="#vreduceph-dest-k1---src--imm8">
</a></h4>
<pre>VL = 128, 256 or 512
KL := VL/16
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        IF SRC is memory and (EVEX.b = 1):
            tsrc := src.fp16[0]
        ELSE:
            tsrc := src.fp16[i]
        DEST.fp16[i] := reduce_fp16(tsrc, imm8)
    ELSE IF *zeroing*:
        DEST.fp16[i] := 0
    //else DEST.fp16[i] remains unchanged
DEST[MAXVL-1:VL] := 0
</pre>
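<p>The per-element loop above (writemask, zeroing versus merging, and embedded broadcast) can be sketched in scalar form. This is an illustrative model only: <code>apply_masked</code> and its parameters are hypothetical names, vectors are plain Python lists, the element operation is passed in as <code>op</code> rather than being a faithful <code>reduce_fp16</code>, and the zeroing of DEST[MAXVL-1:VL] is not modeled.</p>

```python
def apply_masked(dest, src, op, k1=None, zeroing=False, broadcast=False):
    """Model the masked element loop; dest and src are equal-length lists."""
    out = list(dest)
    for i in range(len(out)):
        if k1 is None or (k1 >> i) & 1:
            # With embedded broadcast, every lane reads element 0.
            tsrc = src[0] if broadcast else src[i]
            out[i] = op(tsrc)
        elif zeroing:
            out[i] = 0.0
        # Otherwise (merging-masking) out[i] keeps the old dest value.
    return out


if __name__ == "__main__":
    old = [9.0] * 8
    src = [1.25, 2.5, -0.75, 3.0, 4.125, 5.0, 6.5, 7.875]

    def frac(x):
        # Stand-in element operation: M = 0 reduction, nearest-even rounding.
        return x - round(x)

    print(apply_masked(old, src, frac, k1=0b00000101))
    print(apply_masked(old, src, frac, k1=0b00000101, zeroing=True))
    print(apply_masked(old, src, frac, broadcast=True))
```

<p>Lanes whose mask bit is clear keep the old destination value under merging-masking and become zero under zeroing-masking; with broadcast, every lane is computed from element 0 of the source.</p>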
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VREDUCEPH __m128h _mm_mask_reduce_ph (__m128h src, __mmask8 k, __m128h a, int imm8);
</pre>
<pre>VREDUCEPH __m128h _mm_maskz_reduce_ph (__mmask8 k, __m128h a, int imm8);
</pre>
<pre>VREDUCEPH __m128h _mm_reduce_ph (__m128h a, int imm8);
</pre>
<pre>VREDUCEPH __m256h _mm256_mask_reduce_ph (__m256h src, __mmask16 k, __m256h a, int imm8);
</pre>
<pre>VREDUCEPH __m256h _mm256_maskz_reduce_ph (__mmask16 k, __m256h a, int imm8);
</pre>
<pre>VREDUCEPH __m256h _mm256_reduce_ph (__m256h a, int imm8);
</pre>
<pre>VREDUCEPH __m512h _mm512_mask_reduce_ph (__m512h src, __mmask32 k, __m512h a, int imm8);
</pre>
<pre>VREDUCEPH __m512h _mm512_maskz_reduce_ph (__mmask32 k, __m512h a, int imm8);
</pre>
<pre>VREDUCEPH __m512h _mm512_reduce_ph (__m512h a, int imm8);
</pre>
<pre>VREDUCEPH __m512h _mm512_mask_reduce_round_ph (__m512h src, __mmask32 k, __m512h a, int imm8, const int sae);
</pre>
<pre>VREDUCEPH __m512h _mm512_maskz_reduce_round_ph (__mmask32 k, __m512h a, int imm8, const int sae);
</pre>
<pre>VREDUCEPH __m512h _mm512_reduce_round_ph (__m512h a, int imm8, const int sae);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>Invalid, Precision.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>EVEX-encoded instruction, see <span class="not-imported">Table 2-46</span>, “Type E2 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer's Manual</a> for anything serious.
</p></footer></body></html>