ia32-64/x86/vrsqrt14ps.html
2025-07-08 02:23:29 -03:00

153 lines
7.9 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VRSQRT14PS
— Compute Approximate Reciprocals of Square Roots of Packed Float32 Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VRSQRT14PS
— Compute Approximate Reciprocals of Square Roots of Packed Float32 Values</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>EVEX.128.66.0F38.W0 4E /r VRSQRT14PS xmm1 {k1}{z}, xmm2/m128/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Computes the approximate reciprocal square roots of the packed single-precision floating-point values in xmm2/m128/m32bcst and stores the results in xmm1. Under writemask.</td></tr>
<tr>
<td>EVEX.256.66.0F38.W0 4E /r VRSQRT14PS ymm1 {k1}{z}, ymm2/m256/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Computes the approximate reciprocal square roots of the packed single-precision floating-point values in ymm2/m256/m32bcst and stores the results in ymm1. Under writemask.</td></tr>
<tr>
<td>EVEX.512.66.0F38.W0 4E /r VRSQRT14PS zmm1 {k1}{z}, zmm2/m512/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Computes the approximate reciprocal square roots of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the results in zmm1. Under writemask.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple Type</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>Full</td>
<td>ModRM:reg (w)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>This instruction performs a SIMD computation of the approximate reciprocals of the square roots of 16 packed single-precision floating-point values in the source operand (the second operand) and stores the packed single-precision floating-point results in the destination operand (the first operand) according to the writemask. The maximum relative error for this approximation is less than 2<sup>-14</sup>.</p>
<p>EVEX.512 encoded version: The source operand can be a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM register, conditionally updated using writemask k1.</p>
<p>EVEX.256 encoded version: The source operand is a YMM register, a 256-bit memory location, or a 256-bit vector broadcasted from a 32-bit memory location. The destination operand is a YMM register, conditionally updated using writemask k1.</p>
<p>EVEX.128 encoded version: The source operand is a XMM register, a 128-bit memory location, or a 128-bit vector broadcasted from a 32-bit memory location. The destination operand is a XMM register, conditionally updated using writemask k1.</p>
<p>The VRSQRT14PS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ with the sign of the source value is returned. When the source operand is an +∞ then +ZERO value is returned. A denormal source value is treated as zero only if DAZ bit is set in MXCSR. Otherwise it is treated correctly and performs the approximation with the specified masked response. When a source value is a negative value (other than 0.0) a floating-point QNaN_indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.</p>
<p>MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.</p>
<p>Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.</p>
<h3 id="a-numerically-exact-implementation-of-vrsqrt14xx-can-be-found-at-https---software-intel-com-en-us-arti-">A numerically exact implementation of VRSQRT14xx can be found at https://software.intel.com/en-us/arti-<a class="anchor" href="#a-numerically-exact-implementation-of-vrsqrt14xx-can-be-found-at-https---software-intel-com-en-us-arti-">
</a></h3>
<h3 id="cles-reference-implementations-for-ia-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2-">cles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.<a class="anchor" href="#cles-reference-implementations-for-ia-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2-">
</a></h3>
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<h4 id="vrsqrt14ps--evex-encoded-versions-">VRSQRT14PS (EVEX encoded versions)<a class="anchor" href="#vrsqrt14ps--evex-encoded-versions-">
</a></h4>
<pre>(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j := 0 TO KL-1
i := j * 32
IF k1[j] OR *no writemask* THEN
IF (EVEX.b = 1) AND (SRC *is memory*)
THEN DEST[i+31:i] := APPROXIMATE(1.0/ SQRT(SRC[31:0]));
ELSE DEST[i+31:i] := APPROXIMATE(1.0/ SQRT(SRC[i+31:i]));
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE
; zeroing-masking
DEST[i+31:i] := 0
FI;
FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0
</pre>
<figure id="tbl-5-36">
<table>
<tr>
<th>Input value</th>
<th>Result value</th>
<th>Comments</th></tr>
<tr>
<td>Any denormal</td>
<td>Normal</td>
<td>Cannot generate overflow</td></tr>
<tr>
<td>X = 2<sup>-2n</sup></td>
<td><sub>2</sub><sup>n</sup></td>
<td></td></tr>
<tr>
<td>X&lt;0</td>
<td>QNaN_Indefinite</td>
<td>Including -INF</td></tr>
<tr>
<td>X = -0</td>
<td>-INF</td>
<td></td></tr>
<tr>
<td>X = +0</td>
<td>+INF</td>
<td></td></tr>
<tr>
<td>X = +INF</td>
<td>+0</td>
<td></td></tr></table>
<figcaption><a href='vrsqrt14ps.html#tbl-5-36'>Table 5-36</a>. VRSQRT14PS Special Cases</figcaption></figure>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VRSQRT14PS __m512 _mm512_rsqrt14_ps( __m512 a);
</pre>
<pre>VRSQRT14PS __m512 _mm512_mask_rsqrt14_ps(__m512 s, __mmask16 k, __m512 a);
</pre>
<pre>VRSQRT14PS __m512 _mm512_maskz_rsqrt14_ps( __mmask16 k, __m512 a);
</pre>
<pre>VRSQRT14PS __m256 _mm256_rsqrt14_ps( __m256 a);
</pre>
<pre>VRSQRT14PS __m256 _mm256_mask_rsqrt14_ps(__m256 s, __mmask8 k, __m256 a);
</pre>
<pre>VRSQRT14PS __m256 _mm256_maskz_rsqrt14_ps( __mmask8 k, __m256 a);
</pre>
<pre>VRSQRT14PS __m128 _mm_rsqrt14_ps( __m128 a);
</pre>
<pre>VRSQRT14PS __m128 _mm_mask_rsqrt14_ps(__m128 s, __mmask8 k, __m128 a);
</pre>
<pre>VRSQRT14PS __m128 _mm_maskz_rsqrt14_ps( __mmask8 k, __m128 a);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>None.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>See <span class="not-imported">Table 2-21</span>, “Type 4 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>