<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>VFCMADDCPH/VFMADDCPH
— Complex Multiply and Accumulate FP16 Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>VFCMADDCPH/VFMADDCPH
— Complex Multiply and Accumulate FP16 Values</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32 Bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>EVEX.128.F2.MAP6.W0 56 /r VFCMADDCPH xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Complex multiply a pair of FP16 values from xmm2 and complex conjugate of xmm3/m128/m32bcst, add to xmm1 and store the result in xmm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.256.F2.MAP6.W0 56 /r VFCMADDCPH ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Complex multiply a pair of FP16 values from ymm2 and complex conjugate of ymm3/m256/m32bcst, add to ymm1 and store the result in ymm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.512.F2.MAP6.W0 56 /r VFCMADDCPH zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Complex multiply a pair of FP16 values from zmm2 and complex conjugate of zmm3/m512/m32bcst, add to zmm1 and store the result in zmm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.128.F3.MAP6.W0 56 /r VFMADDCPH xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Complex multiply a pair of FP16 values from xmm2 and xmm3/m128/m32bcst, add to xmm1 and store the result in xmm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.256.F3.MAP6.W0 56 /r VFMADDCPH ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16 AVX512VL</td>
<td>Complex multiply a pair of FP16 values from ymm2 and ymm3/m256/m32bcst, add to ymm1 and store the result in ymm1 subject to writemask k1.</td></tr>
<tr>
<td>EVEX.512.F3.MAP6.W0 56 /r VFMADDCPH zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {er}</td>
<td>A</td>
<td>V/V</td>
<td>AVX512-FP16</td>
<td>Complex multiply a pair of FP16 values from zmm2 and zmm3/m512/m32bcst, add to zmm1 and store the result in zmm1 subject to writemask k1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>Full</td>
<td>ModRM:reg (r, w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h3 id="description">Description<a class="anchor" href="#description">
</a></h3>
<p>This instruction performs a complex multiply and accumulate operation. There are normal and complex conjugate forms of the operation.</p>
<p>The broadcasting and masking for this operation are done on 32-bit quantities, each representing a pair of FP16 values.</p>
<p>Rounding is performed at every FMA (fused multiply and add) boundary. Execution occurs as if all MXCSR exceptions are masked. MXCSR status bits are updated to reflect exceptional conditions.</p>
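<p>Viewing each even/odd FP16 pair as a complex number (real part in the even element, imaginary part in the odd element), the per-pair arithmetic reduces to dest += src1 * src2 for VFMADDCPH and dest += src1 * conj(src2) for VFCMADDCPH. The following scalar C sketch is illustrative only (it is not part of the SDM text): it uses float and the hypothetical names cplx, fmaddc, and fcmaddc for readability, whereas the instruction operates on FP16 and rounds at every FMA boundary.</p>
<pre>/* Illustrative scalar model of the per-pair arithmetic (float used for clarity). */
typedef struct { float re, im; } cplx;

/* VFMADDCPH form: dest + src1 * src2 (ordinary complex multiply) */
cplx fmaddc(cplx dest, cplx src1, cplx src2) {
    cplx r;
    r.re = dest.re + src1.re * src2.re - src1.im * src2.im;
    r.im = dest.im + src1.im * src2.re + src1.re * src2.im;
    return r;
}

/* VFCMADDCPH form: dest + src1 * conj(src2) (complex-conjugate multiply) */
cplx fcmaddc(cplx dest, cplx src1, cplx src2) {
    cplx r;
    r.re = dest.re + src1.re * src2.re + src1.im * src2.im;
    r.im = dest.im + src1.im * src2.re - src1.re * src2.im;
    return r;
}
</pre>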
<h3 id="operation">Operation<a class="anchor" href="#operation">
</a></h3>
<h4 id="vfcmaddcph-dest-k1---src1--src2--avx512-">VFCMADDCPH dest{k1}, src1, src2 (AVX512)<a class="anchor" href="#vfcmaddcph-dest-k1---src1--src2--avx512-">
</a></h4>
<pre>VL = 128, 256, 512
KL := VL / 32
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        IF broadcasting and src2 is memory:
            tsrc2.fp16[2*i+0] := src2.fp16[0]
            tsrc2.fp16[2*i+1] := src2.fp16[1]
        ELSE:
            tsrc2.fp16[2*i+0] := src2.fp16[2*i+0]
            tsrc2.fp16[2*i+1] := src2.fp16[2*i+1]
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        tmp[2*i+0] := dest.fp16[2*i+0] + src1.fp16[2*i+0] * tsrc2.fp16[2*i+0]
        tmp[2*i+1] := dest.fp16[2*i+1] + src1.fp16[2*i+1] * tsrc2.fp16[2*i+0]
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        // conjugate version subtracts odd final term
        dest.fp16[2*i+0] := tmp[2*i+0] + src1.fp16[2*i+1] * tsrc2.fp16[2*i+1]
        dest.fp16[2*i+1] := tmp[2*i+1] - src1.fp16[2*i+0] * tsrc2.fp16[2*i+1]
    ELSE IF *zeroing*:
        dest.fp16[2*i+0] := 0
        dest.fp16[2*i+1] := 0
DEST[MAXVL-1:VL] := 0
</pre>
<h4 id="vfmaddcph-dest-k1---src1--src2--avx512-">VFMADDCPH dest{k1}, src1, src2 (AVX512)<a class="anchor" href="#vfmaddcph-dest-k1---src1--src2--avx512-">
</a></h4>
<pre>VL = 128, 256, 512
KL := VL / 32
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        IF broadcasting and src2 is memory:
            tsrc2.fp16[2*i+0] := src2.fp16[0]
            tsrc2.fp16[2*i+1] := src2.fp16[1]
        ELSE:
            tsrc2.fp16[2*i+0] := src2.fp16[2*i+0]
            tsrc2.fp16[2*i+1] := src2.fp16[2*i+1]
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        tmp[2*i+0] := dest.fp16[2*i+0] + src1.fp16[2*i+0] * tsrc2.fp16[2*i+0]
        tmp[2*i+1] := dest.fp16[2*i+1] + src1.fp16[2*i+1] * tsrc2.fp16[2*i+0]
FOR i := 0 to KL-1:
    IF k1[i] or *no writemask*:
        // non-conjugate version subtracts even term
        dest.fp16[2*i+0] := tmp[2*i+0] - src1.fp16[2*i+1] * tsrc2.fp16[2*i+1]
        dest.fp16[2*i+1] := tmp[2*i+1] + src1.fp16[2*i+0] * tsrc2.fp16[2*i+1]
    ELSE IF *zeroing*:
        dest.fp16[2*i+0] := 0
        dest.fp16[2*i+1] := 0
DEST[MAXVL-1:VL] := 0
</pre>
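<p>As an illustrative check of the two operation forms above (this worked example is not from the SDM), consider a single complex pair with src1 = 1 + 2i, src2 = 3 + 4i, and dest = 0:</p>
<pre>VFMADDCPH : dest := 0 + (1 + 2i)(3 + 4i) = (1*3 - 2*4) + (2*3 + 1*4)i = -5 + 10i
VFCMADDCPH: dest := 0 + (1 + 2i)(3 - 4i) = (1*3 + 2*4) + (2*3 - 1*4)i = 11 + 2i
</pre>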
<h3 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h3>
<pre>VFCMADDCPH __m128h _mm_fcmadd_pch (__m128h a, __m128h b, __m128h c);
</pre>
<pre>VFCMADDCPH __m128h _mm_mask_fcmadd_pch (__m128h a, __mmask8 k, __m128h b, __m128h c);
</pre>
<pre>VFCMADDCPH __m128h _mm_mask3_fcmadd_pch (__m128h a, __m128h b, __m128h c, __mmask8 k);
</pre>
<pre>VFCMADDCPH __m128h _mm_maskz_fcmadd_pch (__mmask8 k, __m128h a, __m128h b, __m128h c);
</pre>
<pre>VFCMADDCPH __m256h _mm256_fcmadd_pch (__m256h a, __m256h b, __m256h c);
</pre>
<pre>VFCMADDCPH __m256h _mm256_mask_fcmadd_pch (__m256h a, __mmask8 k, __m256h b, __m256h c);
</pre>
<pre>VFCMADDCPH __m256h _mm256_mask3_fcmadd_pch (__m256h a, __m256h b, __m256h c, __mmask8 k);
</pre>
<pre>VFCMADDCPH __m256h _mm256_maskz_fcmadd_pch (__mmask8 k, __m256h a, __m256h b, __m256h c);
</pre>
<pre>VFCMADDCPH __m512h _mm512_fcmadd_pch (__m512h a, __m512h b, __m512h c);
</pre>
<pre>VFCMADDCPH __m512h _mm512_mask_fcmadd_pch (__m512h a, __mmask16 k, __m512h b, __m512h c);
</pre>
<pre>VFCMADDCPH __m512h _mm512_mask3_fcmadd_pch (__m512h a, __m512h b, __m512h c, __mmask16 k);
</pre>
<pre>VFCMADDCPH __m512h _mm512_maskz_fcmadd_pch (__mmask16 k, __m512h a, __m512h b, __m512h c);
</pre>
<pre>VFCMADDCPH __m512h _mm512_fcmadd_round_pch (__m512h a, __m512h b, __m512h c, const int rounding);
</pre>
<pre>VFCMADDCPH __m512h _mm512_mask_fcmadd_round_pch (__m512h a, __mmask16 k, __m512h b, __m512h c, const int rounding);
</pre>
<pre>VFCMADDCPH __m512h _mm512_mask3_fcmadd_round_pch (__m512h a, __m512h b, __m512h c, __mmask16 k, const int rounding);
</pre>
<pre>VFCMADDCPH __m512h _mm512_maskz_fcmadd_round_pch (__mmask16 k, __m512h a, __m512h b, __m512h c, const int rounding);
</pre>
<pre>VFMADDCPH __m128h _mm_fmadd_pch (__m128h a, __m128h b, __m128h c);
</pre>
<pre>VFMADDCPH __m128h _mm_mask_fmadd_pch (__m128h a, __mmask8 k, __m128h b, __m128h c);
</pre>
<pre>VFMADDCPH __m128h _mm_mask3_fmadd_pch (__m128h a, __m128h b, __m128h c, __mmask8 k);
</pre>
<pre>VFMADDCPH __m128h _mm_maskz_fmadd_pch (__mmask8 k, __m128h a, __m128h b, __m128h c);
</pre>
<pre>VFMADDCPH __m256h _mm256_fmadd_pch (__m256h a, __m256h b, __m256h c);
</pre>
<pre>VFMADDCPH __m256h _mm256_mask_fmadd_pch (__m256h a, __mmask8 k, __m256h b, __m256h c);
</pre>
<pre>VFMADDCPH __m256h _mm256_mask3_fmadd_pch (__m256h a, __m256h b, __m256h c, __mmask8 k);
</pre>
<pre>VFMADDCPH __m256h _mm256_maskz_fmadd_pch (__mmask8 k, __m256h a, __m256h b, __m256h c);
</pre>
<pre>VFMADDCPH __m512h _mm512_fmadd_pch (__m512h a, __m512h b, __m512h c);
</pre>
<pre>VFMADDCPH __m512h _mm512_mask_fmadd_pch (__m512h a, __mmask16 k, __m512h b, __m512h c);
</pre>
<pre>VFMADDCPH __m512h _mm512_mask3_fmadd_pch (__m512h a, __m512h b, __m512h c, __mmask16 k);
</pre>
<pre>VFMADDCPH __m512h _mm512_maskz_fmadd_pch (__mmask16 k, __m512h a, __m512h b, __m512h c);
</pre>
<pre>VFMADDCPH __m512h _mm512_fmadd_round_pch (__m512h a, __m512h b, __m512h c, const int rounding);
</pre>
<pre>VFMADDCPH __m512h _mm512_mask_fmadd_round_pch (__m512h a, __mmask16 k, __m512h b, __m512h c, const int rounding);
</pre>
<pre>VFMADDCPH __m512h _mm512_mask3_fmadd_round_pch (__m512h a, __m512h b, __m512h c, __mmask16 k, const int rounding);
</pre>
<pre>VFMADDCPH __m512h _mm512_maskz_fmadd_round_pch (__mmask16 k, __m512h a, __m512h b, __m512h c, const int rounding);
</pre>
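<p>A minimal usage sketch of the unmasked 512-bit intrinsic, assuming a compiler with AVX512-FP16 support (for example GCC or Clang with the FP16 extensions enabled) and the _Float16 type; the interleaved real/imaginary array layout and the helper name cmul_accumulate are assumptions for illustration, not part of the SDM:</p>
<pre>#include &lt;immintrin.h&gt;   /* AVX512-FP16 intrinsics */
#include &lt;stddef.h&gt;

/* Hypothetical helper: acc[i] += a[i] * b[i] over arrays of complex FP16 values
   stored as interleaved (real, imag) pairs. One __m512h holds 16 complex pairs. */
void cmul_accumulate(_Float16 *acc, const _Float16 *a, const _Float16 *b, size_t n_pairs)
{
    size_t i = 0;
    for (; i + 16 &lt;= n_pairs; i += 16) {
        __m512h va = _mm512_loadu_ph(a + 2 * i);
        __m512h vb = _mm512_loadu_ph(b + 2 * i);
        __m512h vc = _mm512_loadu_ph(acc + 2 * i);
        vc = _mm512_fmadd_pch(va, vb, vc);      /* complex multiply-accumulate: va*vb + vc */
        _mm512_storeu_ph(acc + 2 * i, vc);
    }
    /* A remainder of fewer than 16 pairs could be handled with the masked
       forms listed above, e.g. _mm512_mask_fmadd_pch with a partial mask. */
}
</pre>
<p>When writing the instruction in assembly directly, note the #UD condition listed under Other Exceptions below: the destination register must be distinct from both source registers.</p>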
<h3 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h3>
<p>Invalid, Underflow, Overflow, Precision, Denormal.</p>
<h3 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h3>
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-49</span>, “Type E4 Class Exception Conditions.”</p>
<p>Additionally:</p>
<table>
<tr>
<td>#UD</td>
<td>If (dest_reg == src1_reg) or (dest_reg == src2_reg).</td></tr></table><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
incomplete or broken in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developer's Manual</a> for anything serious.
</p></footer></body></html>