<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:x86="http://www.felixcloutier.com/x86"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="stylesheet" type="text/css" href="style.css"></link><title>LDDQU
— Load Unaligned Integer 128 Bits</title></head><body><header><nav><ul><li><a href='index.html'>Index</a></li><li>December 2023</li></ul></nav></header><h1>LDDQU
— Load Unaligned Integer 128 Bits</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op/En</th>
<th>64/32-bit Mode</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>F2 0F F0 /r LDDQU xmm1, mem</td>
<td>RM</td>
<td>V/V</td>
<td>SSE3</td>
<td>Load unaligned data from mem and return double quadword in xmm1.</td></tr>
<tr>
<td>VEX.128.F2.0F.WIG F0 /r VLDDQU xmm1, m128</td>
<td>RM</td>
<td>V/V</td>
<td>AVX</td>
<td>Load unaligned packed integer values from mem to xmm1.</td></tr>
<tr>
<td>VEX.256.F2.0F.WIG F0 /r VLDDQU ymm1, m256</td>
<td>RM</td>
<td>V/V</td>
<td>AVX</td>
<td>Load unaligned packed integer values from mem to ymm1.</td></tr></table>
<h2 id="instruction-operand-encoding">Instruction Operand Encoding<a class="anchor" href="#instruction-operand-encoding">
</a></h2>
<table>
<tr>
<th>Op/En</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th></tr>
<tr>
<td>RM</td>
<td>ModRM:reg (w)</td>
<td>ModRM:r/m (r)</td>
<td>N/A</td>
<td>N/A</td></tr></table>
<h2 id="description">Description<a class="anchor" href="#description">
</a></h2>
<p>The instruction is <em>functionally similar</em> to (V)MOVDQU ymm/xmm, m256/m128 for loading from memory. That is: 32/16 bytes of data starting at an address specified by the source memory operand (second operand) are fetched from memory and placed in a destination register (first operand). The source operand need not be aligned on a 32/16-byte boundary. Up to 64/32 bytes may be loaded from memory; this is implementation dependent.</p>
<p>This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction.</p>
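<p>The cache-line case mentioned above can be tested with simple address arithmetic. The following is a minimal sketch, not part of the reference text: the helper name <code>crosses_cache_line</code> is hypothetical, and the 64-byte line size is an assumption (common on x86 processors, but not architecturally fixed).</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Query CPUID for the real value. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: returns true when a load of `size` bytes starting
 * at `addr` touches two cache lines. `size` is expected to be in the
 * range 1..CACHE_LINE_BYTES (16 for xmm loads, 32 for ymm loads). */
bool crosses_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t offset = addr % CACHE_LINE_BYTES; /* position inside the line */
    return offset + size > CACHE_LINE_BYTES;    /* spills into the next line */
}
```

<p>A 16-byte load at line offset 48 ends exactly on the boundary and does not split; the same load at offset 49 does.</p>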
<h2 id="implementation-notes">Implementation Notes<a class="anchor" href="#implementation-notes">
</a></h2>
<ul>
<li>If the source is aligned to a 32/16-byte boundary, some implementations may load the 32/16 bytes more than once. For that reason, avoid (V)LDDQU on uncached or write-combining (WC) memory regions; for such regions, keep using (V)MOVDQU.</li>
<li>This instruction is a replacement for (V)MOVDQU (load) in situations where cache line splits significantly affect performance. It should not be used in situations where store-load forwarding is performance critical. If performance of store-load forwarding is critical to the application, use (V)MOVDQA store-load pairs when data is 256/128-bit aligned or (V)MOVDQU store-load pairs when data is 256/128-bit unaligned.</li>
<li>If the memory address is not aligned on 32/16-byte boundary, some implementations may load up to 64/32 bytes and return 32/16 bytes in the destination. Some processor implementations may issue multiple loads to access the appropriate 32/16 bytes. Developers of multi-threaded or multi-processor software should be aware that on these processors the loads will be performed in a non-atomic way.</li>
<li>If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the memory address is not aligned on an 8-byte boundary.</li></ul>
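<p>The guidance in the notes above can be condensed into a small decision helper. This is only a sketch of one reasonable reading of those notes: the type <code>load_hints</code>, the enum <code>load_choice</code>, and the function <code>choose_load</code> are hypothetical names, and the two flags have to be supplied by the caller, since neither memory type nor forwarding sensitivity can be derived from an address.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, introduced for illustration only. */
typedef enum { LOAD_MOVDQA, LOAD_MOVDQU, LOAD_LDDQU } load_choice;

typedef struct {
    bool uncached_or_wc;      /* source lies in uncached or WC memory       */
    bool forwarding_critical; /* store-load forwarding is performance critical */
} load_hints;

/* width_bytes is 16 for xmm loads and 32 for ymm loads. */
load_choice choose_load(uintptr_t addr, unsigned width_bytes, load_hints h)
{
    if (h.uncached_or_wc)
        return LOAD_MOVDQU;   /* data may be loaded more than once by LDDQU */
    if (addr % width_bytes == 0)
        return LOAD_MOVDQA;   /* aligned data: use the aligned move */
    if (h.forwarding_critical)
        return LOAD_MOVDQU;   /* unaligned store-load pairs use MOVDQU */
    return LOAD_LDDQU;        /* unaligned, read-only, may split a cache line */
}
```

<p>In real code the choice is normally made once, when the routine is written, rather than per load at run time; the function form is used here only to make the rules explicit.</p>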
<p>In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).</p>
<p>Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.</p>
<h2 id="operation">Operation<a class="anchor" href="#operation">
</a></h2>
<h3 id="lddqu--128-bit-legacy-sse-version-">LDDQU (128-bit Legacy SSE Version)<a class="anchor" href="#lddqu--128-bit-legacy-sse-version-">
</a></h3>
<pre>DEST[127:0] := SRC[127:0]
DEST[MAXVL-1:128] (Unmodified)
</pre>
<h3 id="vlddqu--vex-128-encoded-version-">VLDDQU (VEX.128 Encoded Version)<a class="anchor" href="#vlddqu--vex-128-encoded-version-">
</a></h3>
<pre>DEST[127:0] := SRC[127:0]
DEST[MAXVL-1:128] := 0
</pre>
<h3 id="vlddqu--vex-256-encoded-version-">VLDDQU (VEX.256 Encoded Version)<a class="anchor" href="#vlddqu--vex-256-encoded-version-">
</a></h3>
<pre>DEST[255:0] := SRC[255:0]
DEST[MAXVL-1:256] := 0
</pre>
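<p>The three encodings differ only in the load width and in what happens to the destination bits above it. The sketch below models that in portable C. It is an illustration rather than an emulator: the <code>vreg</code> type, the <code>encoding</code> enum, and <code>model_lddqu</code> are hypothetical names, and MAXVL is assumed to be 512 bits.</p>

```c
#include <stdint.h>
#include <string.h>

#define MAXVL_BYTES 64 /* assumption: MAXVL = 512 bits */

/* Hypothetical register model: b[0] holds bits 7:0 of the register. */
typedef struct { uint8_t b[MAXVL_BYTES]; } vreg;

typedef enum { ENC_LEGACY_SSE, ENC_VEX_128, ENC_VEX_256 } encoding;

/* Copies 16 or 32 bytes from src into the low part of dest.
 * Legacy SSE leaves the upper bytes unmodified; the VEX encodings zero
 * every byte above the loaded width, as VEX-encoded instructions
 * generally do. */
void model_lddqu(vreg *dest, const uint8_t *src, encoding enc)
{
    size_t n = (enc == ENC_VEX_256) ? 32 : 16;

    memcpy(dest->b, src, n);
    if (enc != ENC_LEGACY_SSE)
        memset(dest->b + n, 0, MAXVL_BYTES - n);
}
```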
<h2 id="intel-c-c++-compiler-intrinsic-equivalent">Intel C/C++ Compiler Intrinsic Equivalent<a class="anchor" href="#intel-c-c++-compiler-intrinsic-equivalent">
</a></h2>
<pre>LDDQU __m128i _mm_lddqu_si128 (__m128i const * p);
</pre>
<pre>VLDDQU __m256i _mm256_lddqu_si256 (__m256i const * p);
</pre>
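<p>A minimal usage sketch for the 128-bit intrinsic follows. The function <code>sum_bytes_16</code> is a hypothetical example, not part of any library; it assumes an x86 target and a GCC- or Clang-compatible compiler (the <code>target</code> attribute stands in for compiling with <code>-msse3</code>). The data is only read, never stored back, which is the case the description recommends this instruction for.</p>

```c
#include <stdint.h>
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_lddqu_si128 */

/* Hypothetical example: sums 16 bytes starting at p, which may have any
 * alignment. */
#if defined(__GNUC__)
__attribute__((target("sse3")))
#endif
uint32_t sum_bytes_16(const uint8_t *p)
{
    /* LDDQU: unaligned 128-bit integer load. */
    __m128i v = _mm_lddqu_si128((const __m128i *)p);

    /* PSADBW against zero yields one byte sum per 64-bit half. */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());

    /* Add the low-half sum and the high-half sum. */
    return (uint32_t)_mm_cvtsi128_si32(s)
         + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(s, 8));
}
```

<p>The 256-bit intrinsic <code>_mm256_lddqu_si256</code> is used the same way, but requires AVX support at both compile time and run time.</p>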
<h2 class="exceptions" id="numeric-exceptions">Numeric Exceptions<a class="anchor" href="#numeric-exceptions">
</a></h2>
<p>None.</p>
<h2 class="exceptions" id="other-exceptions">Other Exceptions<a class="anchor" href="#other-exceptions">
</a></h2>
<p>See <span class="not-imported">Table 2-21</span>, “Type 4 Class Exception Conditions.”</p>
<p>Note that the treatment of #AC varies by processor implementation.</p><footer><p>
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be
inc<span style="opacity: 0.2">omp</span>lete or b<sub>r</sub>oke<sub>n</sub> in various obvious or non-obvious
ways. Refer to <a href="https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4">Intel® 64 and IA-32 Architectures Software Developers Manual</a> for anything serious.
</p></footer></body></html>