ia32-64/x86/subps.html
2025-07-08 02:23:29 -03:00

207 lines
9.2 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>SUBPS
— Subtract Packed Single Precision Floating-Point Values</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>SUBPS
— Subtract Packed Single Precision Floating-Point Values</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/E n</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>NP 0F 5C /r SUBPS xmm1, xmm2/m128</td>
<td>A</td>
<td>V/V</td>
<td>SSE</td>
<td>Subtract packed single precision floating-point values in xmm2/mem from xmm1 and store result in xmm1.</td></tr>
<tr>
<td>VEX.128.0F.WIG 5C /r VSUBPS xmm1,xmm2, xmm3/m128</td>
<td>B</td>
<td>V/V</td>
<td>AVX</td>
<td>Subtract packed single precision floating-point values in xmm3/mem from xmm2 and stores result in xmm1.</td></tr>
<tr>
<td>VEX.256.0F.WIG 5C /r VSUBPS ymm1, ymm2, ymm3/m256</td>
<td>B</td>
<td>V/V</td>
<td>AVX</td>
<td>Subtract packed single precision floating-point values in ymm3/mem from ymm2 and stores result in ymm1.</td></tr>
<tr>
<td>EVEX.128.0F.W0 5C /r VSUBPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Subtract packed single precision floating-point values from xmm3/m128/m32bcst to xmm2 and stores result in xmm1 with writemask k1.</td></tr>
<tr>
<td>EVEX.256.0F.W0 5C /r VSUBPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst</td>
<td>C</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Subtract packed single precision floating-point values from ymm3/m256/m32bcst to ymm2 and stores result in ymm1 with writemask k1.</td></tr>
<tr>
<td>EVEX.512.0F.W0 5C /r VSUBPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er}</td>
<td>C</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Subtract packed single precision floating-point values in zmm3/m512/m32bcst from zmm2 and stores result in zmm1 with writemask k1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Tuple Type</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>A</td>
<td>N/A</td>
<td>ModRM:reg (r, w)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td>
<td>N/A</td></tr>
<tr>
<td>B</td>
<td>N/A</td>
<td>ModRM:reg (w)</td>
<td>VEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr>
<tr>
<td>C</td>
<td>Full</td>
<td>ModRM:reg (w)</td>
<td>EVEX.vvvv (r)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td></tr></table>
<h2 id="description">Description<a class="anchor" href="#description">
</a></h2>
<p>Performs a SIMD subtract of the packed single precision floating-point values in the second Source operand from the First Source operand, and stores the packed single precision floating-point results in the destination operand.</p>
<p>VEX.128 and EVEX.128 encoded versions: The second source operand is an XMM register or an 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (MAXVL-1:128) of the corresponding destination register are zeroed.</p>
<p>VEX.256 and EVEX.256 encoded versions: The second source operand is an YMM register or an 256-bit memory location. The first source operand and destination operands are YMM registers. Bits (MAXVL-1:256) of the corresponding destination register are zeroed.</p>
<p>EVEX.512 encoded version: The second source operand is a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 32-bit memory location. The first source operand and destination operands are ZMM registers. The destination operand is conditionally updated according to the writemask.</p>
<p>128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper Bits (MAXVL-1:128) of the corresponding register destination are unmodified.</p>
<h2 id="operation">Operation<a class="anchor" href="#operation">
</a></h2>
<h3 id="vsubps--evex-encoded-versions-when-src2-operand-is-a-vector-register-">VSUBPS (EVEX Encoded Versions When SRC2 Operand is a Vector Register)<a class="anchor" href="#vsubps--evex-encoded-versions-when-src2-operand-is-a-vector-register-">
</a></h3>
<pre>(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(EVEX.RC);
ELSE
SET_ROUNDING_MODE_FOR_THIS_INSTRUCTION(MXCSR.RC);
FI;
FOR j := 0 TO KL-1
i := j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] := SRC1[i+31:i] - SRC2[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[31:0] remains unchanged*
ELSE ; zeroing-masking
DEST[31:0] := 0
FI;
FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0
</pre>
<h3 id="vsubps--evex-encoded-versions-when-src2-operand-is-a-memory-source-">VSUBPS (EVEX Encoded Versions When SRC2 Operand is a Memory Source)<a class="anchor" href="#vsubps--evex-encoded-versions-when-src2-operand-is-a-memory-source-">
</a></h3>
<pre>(KL, VL) = (4, 128), (8, 256),(16, 512)
FOR j := 0 TO KL-1
i := j * 32
IF k1[j] OR *no writemask* THEN
IF (EVEX.b = 1)
THEN DEST[i+31:i] := SRC1[i+31:i] - SRC2[31:0];
ELSE DEST[i+31:i] := SRC1[i+31:i] - SRC2[i+31:i];
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[31:0] remains unchanged*
ELSE ; zeroing-masking
DEST[31:0] := 0
FI;
FI;
ENDFOR;
DEST[MAXVL-1:VL] := 0
</pre>
<h3 id="vsubps--vex-256-encoded-version-">VSUBPS (VEX.256 Encoded Version)<a class="anchor" href="#vsubps--vex-256-encoded-version-">
</a></h3>
<pre>DEST[31:0] := SRC1[31:0] - SRC2[31:0]
DEST[63:32] := SRC1[63:32] - SRC2[63:32]
DEST[95:64] := SRC1[95:64] - SRC2[95:64]
DEST[127:96] := SRC1[127:96] - SRC2[127:96]
DEST[159:128] := SRC1[159:128] - SRC2[159:128]
DEST[191:160] := SRC1[191:160] - SRC2[191:160]
DEST[223:192] := SRC1[223:192] - SRC2[223:192]
DEST[255:224] := SRC1[255:224] - SRC2[255:224].
DEST[MAXVL-1:256] := 0
</pre>
<h3 id="vsubps--vex-128-encoded-version-">VSUBPS (VEX.128 Encoded Version)<a class="anchor" href="#vsubps--vex-128-encoded-version-">
</a></h3>
<pre>DEST[31:0] := SRC1[31:0] - SRC2[31:0]
DEST[63:32] := SRC1[63:32] - SRC2[63:32]
DEST[95:64] := SRC1[95:64] - SRC2[95:64]
DEST[127:96] := SRC1[127:96] - SRC2[127:96]
DEST[MAXVL-1:128] := 0
</pre>
<h3 id="subps--128-bit-legacy-sse-version-">SUBPS (128-bit Legacy SSE Version)<a class="anchor" href="#subps--128-bit-legacy-sse-version-">
</a></h3>
<pre>DEST[31:0] := SRC1[31:0] - SRC2[31:0]
DEST[63:32] := SRC1[63:32] - SRC2[63:32]
DEST[95:64] := SRC1[95:64] - SRC2[95:64]
DEST[127:96] := SRC1[127:96] - SRC2[127:96]
DEST[MAXVL-1:128] (Unmodified)
</pre>
<h2 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h2>
<pre>VSUBPS __m512 _mm512_sub_ps (__m512 a, __m512 b);
</pre>
<pre>VSUBPS __m512 _mm512_mask_sub_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);
</pre>
<pre>VSUBPS __m512 _mm512_maskz_sub_ps (__mmask16 k, __m512 a, __m512 b);
</pre>
<pre>VSUBPS __m512 _mm512_sub_round_ps (__m512 a, __m512 b, int);
</pre>
<pre>VSUBPS __m512 _mm512_mask_sub_round_ps (__m512 s, __mmask16 k, __m512 a, __m512 b, int);
</pre>
<pre>VSUBPS __m512 _mm512_maskz_sub_round_ps (__mmask16 k, __m512 a, __m512 b, int);
</pre>
<pre>VSUBPS __m256 _mm256_sub_ps (__m256 a, __m256 b);
</pre>
<pre>VSUBPS __m256 _mm256_mask_sub_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);
</pre>
<pre>VSUBPS __m256 _mm256_maskz_sub_ps (__mmask16 k, __m256 a, __m256 b);
</pre>
<pre>SUBPS __m128 _mm_sub_ps (__m128 a, __m128 b);
</pre>
<pre>VSUBPS __m128 _mm_mask_sub_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);
</pre>
<pre>VSUBPS __m128 _mm_maskz_sub_ps (__mmask16 k, __m128 a, __m128 b);
</pre>
<h2 class="exceptions" id="simd-floating-point-exceptions">SIMD Floating-Point Exceptions<a class="anchor" href="#simd-floating-point-exceptions">
</a></h2>
<p>Overflow, Underflow, Invalid, Precision, Denormal.</p>
<h2 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h2>
<p>VEX-encoded instructions, see <span class="not-imported">Table 2-19</span>, “Type 2 Class Exception Conditions.”</p>
<p>EVEX-encoded instructions, see <span class="not-imported">Table 2-46</span>, “Type E2 Class Exception Conditions.”</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>