ia32-64/x86/v4fmaddss.v4fnmaddss.html
2025-07-08 02:23:29 -03:00

112 lines
5.6 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>V4FMADDSS/V4FNMADDSS
— Scalar Single Precision Floating-Point Fused Multiply-Add(4-Iterations)</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>V4FMADDSS/V4FNMADDSS
— Scalar Single Precision Floating-Point Fused Multiply-Add(4-Iterations)</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>EVEX.LLIG.F2.0F38.W0 9B /r V4FMADDSS xmm1{k1}{z}, xmm2+3, m128</td>
<td>A</td>
<td>V/V</td>
<td>AVX512_4FMAPS</td>
<td>Multiply scalar single-precision floating-point values from source register block indicated by xmm2 by values from m128 and accumulate the result in xmm1.</td></tr>
<tr>
<td>EVEX.LLIG.F2.0F38.W0 AB /r V4FNMADDSS xmm1{k1}{z}, xmm2+3, m128</td>
<td>A</td>
<td>V/V</td>
<td>AVX512_4FMAPS</td>
<td>Multiply and negate scalar single-precision floating-point values from source register block indicated by xmm2 by values from m128 and accumulate the result in xmm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En Tuple Operand 1 Operand 2 Operand 3 Operand 4</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th></tr>
<tr>
<td>A Tuple1_4X ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) N/A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>This instruction computes 4 sequential scalar fused single-precision floating-point multiply-add instructions with a sequentially selected memory operand in each of the four steps.</p>
<p>In the above box, the notation of “+3” is used to denote that the instruction accesses 4 source registers based that operand; sources are consecutive, start in a multiple of 4 boundary, and contain the encoded register operand.</p>
<p>This instruction supports memory fault suppression. The entire memory operand is loaded if the least significant mask bit is set to 1 or if a “no masking” encoding is used.</p>
<p>The tuple type Tuple1_4X implies that four 32-bit elements (16 bytes) are referenced by the memory operation portion of this instruction.</p>
<p>Rounding is performed at every FMA boundary. Exceptions are also taken sequentially. Pre- and post-computational exceptions of the first FMA take priority over the pre- and post-computational exceptions of the second FMA, etc.</p>
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<pre>src_reg_id is the 5 bit index of the vector register specified in the instruction as the src1 register.
define NFMA_SS(vl, dest, k1, msrc, regs_loaded, src_base, posneg):
tmpdest := dest
// reg[] is an array representing the SIMD register file.
IF k1[0] or *no writemask*:
FOR j := 0 to regs_loaded - 1:
IF posneg = 0:
tmpdest.single[0] := RoundFPControl_MXCSR(tmpdest.single[0] - reg[src_base + j ].single[0] * msrc.single[j])
ELSE:
tmpdest.single[0] := RoundFPControl_MXCSR(tmpdest.single[0] + reg[src_base + j ].single[0] * msrc.single[j])
ELSE IF *zeroing*:
tmpdest.single[0] := 0
dest := tmpdst
dest[MAX_VL-1:VL] := 0
</pre>
<h4 id="v4fmaddss-and-v4fnmaddss-dest-k1---src1--msrc--avx512-">V4FMADDSS and V4FNMADDSS dest{k1}, src1, msrc (AVX512)<a class="anchor" href="#v4fmaddss-and-v4fnmaddss-dest-k1---src1--msrc--avx512-">
</a></h4>
<pre>VL = 128
regs_loaded := 4
src_base := src_reg_id &amp; ~3 // for src1 operand
posneg := 0 if negative form, 1 otherwise
NFMA_SS(vl, dest, k1, msrc, regs_loaded, src_base, posneg)
</pre>
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>V4FMADDSS __m128 _mm_4fmadd_ss(__m128, __m128x4, __m128 *);
</pre>
<pre>V4FMADDSS __m128 _mm_mask_4fmadd_ss(__m128, __mmask8, __m128x4, __m128 *);
</pre>
<pre>V4FMADDSS __m128 _mm_maskz_4fmadd_ss(__mmask8, __m128, __m128x4, __m128 *);
</pre>
<pre>V4FNMADDSS __m128 _mm_4fnmadd_ss(__m128, __m128x4, __m128 *);
</pre>
<pre>V4FNMADDSS __m128 _mm_mask_4fnmadd_ss(__m128, __mmask8, __m128x4, __m128 *);
</pre>
<pre>V4FNMADDSS __m128 _mm_maskz_4fnmadd_ss(__mmask8, __m128, __m128x4, __m128 *);
</pre>
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>Overflow, Underflow, Invalid, Precision, Denormal.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>See Type E2; additionally:</p>
<table>
<tr>
<td>#UD</td>
<td>If the EVEX broadcast bit is set to 1.</td></tr>
<tr>
<td>#UD</td>
<td>If the MODRM.mod = 0b11.</td></tr></table><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>