Instruction Page Help

This page details many of the features of the instruction pages.

Overview Table

The overview table lists all the various forms that an instruction can take. Each row of the table consists of the following items, in order:

  • Opcode and Mnemonic: A single form of the instruction listing both the binary encoding and assembly form. Italics in the mnemonic part signify operands. See below for an explanation on interpreting VEX and EVEX opcodes.
    EVEX forms commonly feature other bits of information such as the mask register ({k1}), embedded rounding ({er}), and more.
  • Encoding: A reference to the encoding table. This value represents where in the instruction the operands are encoded.
  • ## bit Mode (multiple): Whether a given instruction form is valid, invalid, or not encodable in the specified processor mode. "Valid" forms are allowed while "invalid" forms will throw an exception if encountered. "Not encodable" forms are disallowed in the specified mode, and will, in fact, be interpreted differently than expected.
    For example, in 64 bit mode, the byte range 40 through 4F was repurposed for the REX prefix. This makes encoding INC eax as 40 impossible. Should the processor encounter what the author thinks is INC eax, it will treat it as a REX prefix with the lower four bits set to 0. The correct encoding would be FF C0.
  • CPUID Feature Flag (optional): If present, these CPUID "feature flags" must be present (set). The existence of these flags does not necessarily imply the ability to execute the instruction; some CPU features must be enabled before use. Using such an instruction without enabling the feature first will result in a processor exception being thrown.
  • Description: A short description of what the instruction form does. For most instructions, the various "Description" cells will be almost carbon copies of each other with minor changes.
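
The "not encodable" example above (bytes 40 through 4F) can be sketched in code. This is a minimal illustration, not a decoder: the names Rex, Byte4x, and interpret_4x are hypothetical, and mode is assumed to be 16, 32, or 64.

```rust
/// The four low bits of a REX prefix (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct Rex {
    pub w: bool,
    pub r: bool,
    pub x: bool,
    pub b: bool,
}

/// How a single byte in the range 40..=4F is interpreted.
#[derive(Debug, PartialEq, Eq)]
pub enum Byte4x {
    /// Outside 64 bit mode: one-byte INC (40..=47) or DEC (48..=4F) of register `reg`.
    IncDec { is_dec: bool, reg: u8 },
    /// In 64 bit mode: a REX prefix.
    Rex(Rex),
}

/// Returns `None` for bytes outside 40..=4F.
pub fn interpret_4x(byte: u8, mode: u8) -> Option<Byte4x> {
    if !(0x40..=0x4F).contains(&byte) {
        return None;
    }
    if mode == 64 {
        Some(Byte4x::Rex(Rex {
            w: (byte & 0b1000) != 0,
            r: (byte & 0b0100) != 0,
            x: (byte & 0b0010) != 0,
            b: (byte & 0b0001) != 0,
        }))
    } else {
        Some(Byte4x::IncDec {
            is_dec: (byte & 0b1000) != 0,
            reg: byte & 0b111,
        })
    }
}
```

With this sketch, interpret_4x(0x40, 32) yields an INC of register 0 (eax), while interpret_4x(0x40, 64) yields a REX prefix with all four low bits clear.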

Interpreting VEX and EVEX Opcodes

VEX and EVEX opcodes are written differently than normal instructions. This is because the prefixes are multiple (two to four) bytes long and encode quite a bit of information. Both prefixes take the form of (E)VEX.{length}.{prefixes}.{w} with each field representing a specific field in the VEX or EVEX prefix. The other fields in the prefix are unspecified here and are dependent on the operands. The various fields in the opcode prefix encoding are:

  • length: The number of bits this instruction operates on. This is encoded in the L and (for EVEX) L' bits. This can be one of: 128 (XMM), 256 (YMM), 512 (ZMM), or LIG. LIG stands for "length ignored" and means just that: the length field is ignored. This is typically used for scalar instructions, as only a single piece of data is operated on, not the whole register.
    In some situations, despite an instruction being defined with LIG, Intel may recommend that a specific value be used instead for future proofing. For example, the ADDSD instruction is defined to be LIG, but Intel recommends setting L (and L' for EVEX) to zero.
  • prefixes: The implied prefix bytes that are encoded in the prefix. Due to the nature of the VEX and EVEX prefixes, there can be up to two prefix fields specified in the opcode encoding: one for operand size and type prefixes (the pp field), and one for escape codes (the m bits). If unspecified, a prefix group's bits must be all zeros (indicating no prefix).
  • w: The single W bit in the VEX or EVEX prefix. This is commonly used as an extra bit to specify the opcode, but sometimes carries the same meaning as its predecessor, REX.W: expanding the operand size to 64 bits. WIG stands for "W ignored" and means just that: the W bit is ignored.
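
As an illustration of how the length, prefixes, and w fields map onto real bytes, the following sketch pulls them out of a three-byte VEX prefix (the form that starts with C4). The type and function names are hypothetical, the R, X, B, and vvvv bits are ignored because they depend on the operands, and the two-byte VEX and EVEX forms are not handled.

```rust
/// Fields recovered from a three-byte VEX prefix (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct Vex3Fields {
    /// Vector length in bits, from the L bit: 128 or 256.
    pub length: u16,
    /// Implied legacy prefix from the pp field: 66, F3, F2, or none.
    pub implied_prefix: Option<u8>,
    /// Implied escape bytes from the m bits; `None` for values this sketch does not name.
    pub escape: Option<&'static str>,
    /// The W bit.
    pub w: bool,
}

/// Decodes the opcode-related fields of a three-byte VEX prefix.
/// Returns `None` if the first byte is not C4.
pub fn decode_vex3(prefix: [u8; 3]) -> Option<Vex3Fields> {
    if prefix[0] != 0xC4 {
        return None;
    }
    let mmmmm = prefix[1] & 0b0001_1111; // low five bits of the second byte
    let w = (prefix[2] & 0b1000_0000) != 0; // bit 7 of the third byte
    let l = (prefix[2] & 0b0000_0100) != 0; // bit 2 of the third byte
    let pp = prefix[2] & 0b0000_0011; // low two bits of the third byte
    Some(Vex3Fields {
        length: if l { 256 } else { 128 },
        implied_prefix: match pp {
            0b01 => Some(0x66),
            0b10 => Some(0xF3),
            0b11 => Some(0xF2),
            _ => None,
        },
        escape: match mmmmm {
            1 => Some("0F"),
            2 => Some("0F38"),
            3 => Some("0F3A"),
            _ => None,
        },
        w,
    })
}
```

For example, the prefix bytes C4 E1 7B decode to a 128 bit length with an implied F2 prefix, the 0F escape, and W clear, which matches an opcode written as VEX.LIG.F2.0F.WIG when L is set to zero.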

Encoding

The "Encoding" section is a table listing the encoding of the operands for the various opcodes in the overview table at the start. Each row of the table consists of the following items, in order:

  • Encoding: The name of the encoding this row is for. For example, if the "Encoding" cell of a mnemonic in the overview table contains RM, the row containing RM in this cell would list how the operands are encoded.
  • Tuple Type (optional): The EVEX encoding's tuple form. This column is only present if an EVEX encoding for this instruction exists. If present, any encoding that does not use an EVEX prefix will contain "N/A".
  • Operand(s): The actual encoding of each operand. Instructions whose number of operands depends on the mnemonic (for example, vector instructions with a legacy encoding) will contain "N/A" for disallowed operands. For example, "legacy" vector encodings will typically have the first source and the destination be the same operand (MNEMONIC dest, src), but VEX and EVEX versions with a "non-destructive" form (MNEMONIC dest, src1, src2) will not. In these cases, the "legacy" form has only two operands while the VEX and EVEX forms have three. As such, the "Operand 3" cell of the "legacy" row will contain "N/A".
    See below for an explanation on interpreting this value.

Interpreting the Operand Value

The operand value cell takes the form source[rw], which represents a piece of data, source, that is both read from and written to ([rw]). Read-only or write-only data is signified by [r] or [w], respectively.

source only specifies where the register number is encoded. It does not specify which register file is used (general purpose, segment, vector, etc.); that is specified by the mnemonic's encoding.

source will be one of the following values:

  • address##: An immediate value of size ## that represents a "direct" address in the address space. If multiple values of ## are allowed, they will be separated with a slash.
  • AL/AX/EAX/RAX: The accumulator register.
  • DS:SI: Memory addressed by the DS:SI register pair. DS:ESI and DS:RSI may be used instead depending on the processor's mode.
  • ES:DI: Memory addressed by the ES:DI register pair. ES:EDI and ES:RDI may be used instead depending on the processor's mode.
  • EVEX.vvvv: The vvvv field of an EVEX prefix represents the register.
  • FLAGS: The FLAGS register.
  • imm##: An immediate value of size ##. If multiple values of ## are allowed, they will be separated with a slash.
  • imm8(7..4): The upper four bits of an 8 bit immediate represent the register. In 32 bit "protected" mode, the most significant bit (MSB; bit 7) is ignored and treated as if it were 0.
  • ModRM.reg: The reg field of a ModR/M byte represents the register. The three bits can be extended to four using one of the following prefixes: REX, VEX, or EVEX.
  • ModRM.r/m: If the mod field of a ModR/M byte signifies a register, the r/m field represents the register. The three bits can be extended to four using one of the following prefixes: REX, VEX, or EVEX. If, however, the mod field of a ModR/M byte signifies memory, the address is calculated and used instead.
  • offset##: An immediate value of size ## that represents an offset from the following instruction. If multiple values of ## are allowed, they will be separated with a slash.
    For example, an infinite loop (a: JMP a) would be encoded as EB FE where FE represents negative 2. This would jump backwards two bytes to the a label and begin again. In fact, a "nop" could be encoded as EB 00 which would be a simple jump to the following instruction (zero bytes ahead).
  • VEX.vvvv: The vvvv field of a VEX prefix represents the register.
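
The offset## arithmetic described above (EB FE as an infinite loop, EB 00 as a "nop") can be checked with a short sketch. The function name and parameters are hypothetical; the only assumption is that the 8 bit offset is sign extended and added to the address of the following instruction.

```rust
/// Computes the target of a jump with an 8 bit relative offset.
/// `instr_addr` is the address of the jump itself and `instr_len` its length in bytes.
pub fn rel8_target(instr_addr: u64, instr_len: u64, rel8: u8) -> u64 {
    // The offset is relative to the instruction that follows the jump.
    let next = instr_addr.wrapping_add(instr_len);
    // Reinterpret the byte as signed, then sign extend it to 64 bits.
    let offset = rel8 as i8 as i64;
    next.wrapping_add(offset as u64)
}
```

A two-byte JMP at address 1000h with offset FE lands back on 1000h, while offset 00 lands on 1002h, the following instruction.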

Bit Encoding

The "Bit Encoding" section details the actual bit representation of the various instruction forms. These bits will be grouped per byte, and separated by a colon (:) to show the order in which they would appear in a byte stream. Each byte will be written as either a two-hexdigit value (if possible) or as individual bits. When the bits are written out individually, they will be grouped into either fours ("nibbles") or threes (octal-like).
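
The groupings described above can be sketched as follows. The function names are hypothetical, and the exact spacing around the colon and between bit groups is an assumption made for readability, not a statement of how the pages are rendered. The octal-like form is shown as a two bit group followed by two three bit groups, which is the natural split for a ModR/M byte.

```rust
/// Writes a byte as two four bit groups ("nibbles").
pub fn as_nibbles(byte: u8) -> String {
    format!("{:04b} {:04b}", byte >> 4, byte & 0xF)
}

/// Writes a byte as a 2-3-3 split (octal-like), as used for a ModR/M byte.
pub fn as_octal_like(byte: u8) -> String {
    format!(
        "{:02b} {:03b} {:03b}",
        byte >> 6,
        (byte >> 3) & 0b111,
        byte & 0b111
    )
}

/// Writes a byte stream as two-hexdigit values separated by colons.
pub fn as_hex_stream(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" : ")
}
```

For example, the INC eax encoding FF C0 can be shown as a hex stream, and its second byte in octal-like form splits into a mod of 11, a reg of 000, and an r/m of 000.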

Interpreting Named Bits

Sometimes, individual bits are named to represent their function. Generally, this is only used to show the individual bits of a byte (such as the two bit mod field of a ModR/M byte), or to show where registers are encoded. However, sometimes, two forms of an instruction will differ only in a few bits, and those bits have a defined meaning. For example, if a string of bits contains a character such as s, this would indicate if sign extension of an operand occurs.

These named bits will be one of the following:

  • d (direction): Specifies the direction in which data flows. This is commonly used for ALU instructions from the original 8086. This can have one of two values:
    Value  Source                               Destination
    0      reg field                            r/m field with an optional SIB byte
    1      r/m field with an optional SIB byte  reg field
  • eee (special register): When control or debug registers are used in an instruction, they are represented using eee. Whether a control or debug register is used depends on the instruction, but both will never be used at the same time. This can have one of 16 values. If REX.R, VEX.R, or EVEX.R is not present, only the first eight possible values are available.
    Value  Control Register  Debug Register
    0.000  CR0               DR0
    0.001  reserved          DR1
    0.010  CR2               DR2
    0.011  CR3               DR3
    0.100  CR4               reserved
    0.101  reserved          reserved
    0.110  reserved          DR6
    0.111  reserved          DR7
    1.000  CR8               reserved
    1.001  reserved          reserved
    1.010  reserved          reserved
    1.011  reserved          reserved
    1.100  reserved          reserved
    1.101  reserved          reserved
    1.110  reserved          reserved
    1.111  reserved          reserved
    The first bit represents the R field in a REX, VEX, or EVEX prefix. The other three are the eee field.
    Usage of reserved encodings will lead to a #UD exception.
  • reg (general purpose register): There are eight general purpose registers (16 in Long Mode). Which one is used depends on the bits of this reg field (combined with REX.R, VEX.R, or EVEX.R if present), the w field (if present), and the current processor mode.
    Selected Register When w is not Present
    Value  16 bit Operations  32 bit Operations  64 bit Operations
    000    AX                 EAX                RAX
    001    CX                 ECX                RCX
    010    DX                 EDX                RDX
    011    BX                 EBX                RBX
    100    SP                 ESP                RSP
    101    BP                 EBP                RBP
    110    SI                 ESI                RSI
    111    DI                 EDI                RDI
    Selected Register When w is Present
    Value  w Unset  w Set; 16 bit Operations  w Set; 32 bit Operations
    000    AL       AX                        EAX
    001    CL       CX                        ECX
    010    DL       DX                        EDX
    011    BL       BX                        EBX
    100    AH       SP                        ESP
    101    CH       BP                        EBP
    110    DH       SI                        ESI
    111    BH       DI                        EDI
  • s (sign extend): Specifies whether an immediate is sign extended or left alone. This can have one of two values:
    Value  Effect on 8 bit Data                  Effect on 16 or 32 bit Data
    0      none                                  none
    1      sign extended to size of destination  none
    A quirk of this field is that the opcodes beginning with 82 (8086 ALU operations) perform the same operation as ones beginning with 80. For example, ADD r/m8, imm8 is documented as being encoded as 80 /0 ib, but can also be encoded as 82 /0 ib. This has the effect of sign extending the 8 bit immediate to the size of the 8 bit destination (i.e. doing nothing). These encodings are undocumented and were removed in Long Mode (a #UD exception will result).
  • sreg# (segment register): Either a two or three bit field specifying a segment register. If sreg2 is used, access to the FS and GS segments is unavailable. If sreg3 is used, access to all six segment registers is available:
    Value  Segment Register
    0.00   ES
    0.01   CS
    0.10   SS
    0.11   DS
    1.00   FS
    1.01   GS
    1.10   reserved
    1.11   reserved
    The first bit represents the most significant bit of an sreg3 field. The other two are the sreg2 field.
    Usage of reserved encodings will lead to a #UD exception.
  • tttn (condition test): Conditional instructions have the condition encoded in this four bit field. The first three (ttt) are the condition to test, and the fourth determines if the condition is used directly (n = 0), or its negated form (n = 1). These four bits are encoded in the four least significant bits (bits 3, 2, 1, and 0) of the opcode byte for single byte opcodes, or the four least significant bits of the second opcode byte for two byte opcodes. These bits have the following values:
    Value  Mnemonic Suffix  Condition                                Check                Signed or Unsigned
    0000   O                Overflow                                 OF == 1              Neither
    0001   NO               No overflow                              OF == 0              Neither
    0010   B, NAE, C        Below, Not above or equal, Carry         CF == 1              Unsigned
    0011   NB, AE, NC       Not below, Above or equal, No carry      CF == 0              Unsigned
    0100   E, Z             Equal, Zero                              ZF == 1              Neither
    0101   NE, NZ           Not equal, Not zero                      ZF == 0              Neither
    0110   BE, NA           Below or equal, Not above                (CF | ZF) == 1       Unsigned
    0111   NBE, A           Not below or equal, Above                (CF | ZF) == 0       Unsigned
    1000   S                Sign (MSB set)                           SF == 1              Neither
    1001   NS               No sign (MSB cleared)                    SF == 0              Neither
    1010   P, PE            Parity, Parity even                      PF == 1              Neither
    1011   NP, PO           No parity, Parity odd                    PF == 0              Neither
    1100   L, NGE           Less than, Not greater than or equal to  SF != OF             Signed
    1101   NL, GE           Not less than, Greater than or equal to  SF == OF             Signed
    1110   LE, NG           Less than or equal to, Not greater than  ZF == 1 || SF != OF  Signed
    1111   NLE, G           Not less than or equal to, Greater than  ZF == 0 && SF == OF  Signed
  • w (wide): Determines if an operation is on 8 bits or the default operand width. This can have one of two values:
    Value  Operand Size when Operand Size Attribute is 16 bits  Operand Size when Operand Size Attribute is 32 bits
    0      8 bits                                               8 bits
    1      16 bits                                              32 bits
  • xmmreg (vector register): There are 32 vector registers (only eight are accessible in Protected Mode). This field represents the three least significant bits of the register number.
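
To make the tttn field concrete, the sketch below evaluates a condition from its four bits: the upper three (ttt) select a base test and the lowest (n) negates it. The Flags type and the function name are hypothetical. The base tests are the standard x86 conditions (overflow, below, equal, below or equal, sign, parity, less, less or equal).

```rust
/// The status flags consulted by conditional instructions (hypothetical helper type).
#[derive(Clone, Copy, Default)]
pub struct Flags {
    pub cf: bool,
    pub pf: bool,
    pub zf: bool,
    pub sf: bool,
    pub of: bool,
}

/// Evaluates the condition encoded in the low four bits of `tttn`.
pub fn condition_met(tttn: u8, f: Flags) -> bool {
    let ttt = (tttn >> 1) & 0b111; // which base condition to test
    let n = (tttn & 1) != 0; // whether the base condition is negated
    let base = match ttt {
        0b000 => f.of,                   // O
        0b001 => f.cf,                   // B, NAE, C
        0b010 => f.zf,                   // E, Z
        0b011 => f.cf || f.zf,           // BE, NA
        0b100 => f.sf,                   // S
        0b101 => f.pf,                   // P, PE
        0b110 => f.sf != f.of,           // L, NGE
        _ => f.zf || (f.sf != f.of),     // LE, NG (ttt == 0b111)
    };
    base != n
}
```

For example, the low four bits of the one-byte JE opcode (74) are 0100, so condition_met(0x4, flags) is true exactly when ZF is set.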

Description

The "Description" section, as the name implies, contains a simplified description of the instruction's operation. In some cases, graphics will be used for illustrative purposes.

Operation

The "Operation" section is pseudo-code that uses a Rust-like syntax. While attempts are made to mimic Rust's syntax, some things are "incorrect". For example, Rust's ranges follow other programming languages with a "start to end" order. This mimics how arrays are laid out in memory (index 0 is at a lower address than index n). A string of bits, however, follows positional notation with the most significant bit (MSB) at the left. Due to this, bit position slices use a "high to low" ("end to start") order.
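
Real Rust has no "high to low" bit slice, so the pseudo-code's value[hi..=lo] notation can be thought of as a helper like the one below. The function name is hypothetical, and a 64 bit value is assumed purely to keep the sketch short.

```rust
/// Extracts bits `hi` down to `lo` (both inclusive) of `value`,
/// mirroring the pseudo-code's "high to low" slice `value[hi..=lo]`.
pub fn bits(value: u64, hi: u32, lo: u32) -> u64 {
    assert!(lo <= hi && hi < 64, "expected lo <= hi < 64");
    let width = hi - lo + 1;
    // A shift by 64 would overflow, so the full-width case is handled separately.
    let mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    (value >> lo) & mask
}
```

With this helper, bits(0xABCD, 15, 8) is AB and bits(0xABCD, 7, 0) is CD, matching how the digits read from left (most significant) to right.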

MODE

The MODE global variable represents the current operating mode of the processor thread. It can be one of: 16, 32, or 64, each representing the "bit width" of the current mode. However, it is only compared against 64 for instructions that are illegal in long (64 bit) mode.

PROCESSOR

In some rare cases, the operation of an instruction depends on which processor version is being used. In those (known) instances, the PROCESSOR global variable represents the current processor. For example, the AAA instruction operates slightly differently on the 80186 and prior.

Registers

Registers are accessed as if they were global variables. Any aliasing, and the zero extension to RrX when setting ErX, is handled implicitly.
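
The implicit aliasing rules can be spelled out explicitly. The sketch below models one general purpose register as a 64 bit value and shows what each partial write does to it. All function names are hypothetical. The behavior shown is the usual x86-64 rule: 32 bit writes zero the upper half, while 8 and 16 bit writes leave the remaining bits untouched.

```rust
/// Result of writing a 32 bit value (such as EAX): the upper 32 bits are zeroed.
pub fn write_r32(_old: u64, value: u32) -> u64 {
    value as u64
}

/// Result of writing a 16 bit value (such as AX): bits 63..=16 are preserved.
pub fn write_r16(old: u64, value: u16) -> u64 {
    (old & !0xFFFF) | (value as u64)
}

/// Result of writing the low byte (such as AL): bits 63..=8 are preserved.
pub fn write_r8_low(old: u64, value: u8) -> u64 {
    (old & !0xFF) | (value as u64)
}

/// Result of writing the high byte (such as AH): only bits 15..=8 change.
pub fn write_r8_high(old: u64, value: u8) -> u64 {
    (old & !0xFF00) | ((value as u64) << 8)
}
```

Starting from RAX = 1122334455667788h, a write of AAAAh to AX gives 112233445566AAAAh, whereas a write of DEADBEEFh to EAX gives 00000000DEADBEEFh.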

Flags

Flags are accessed as if they were global variables. For example, OF would refer to the overflow flag (which is either a zero or a one). These single bit values, when used in if conditions, are implicitly coerced to a boolean. The only multibit flag, IOPL, is a two bit value and, as such, cannot be coerced.
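
For readers who want to relate these flag "variables" to the underlying register, the sketch below reads them out of a raw FLAGS value. The constant and function names are hypothetical. The bit positions are the architectural ones (CF is bit 0, PF bit 2, ZF bit 6, SF bit 7, OF bit 11, and IOPL occupies bits 13 and 12).

```rust
pub const CF: u32 = 0;
pub const PF: u32 = 2;
pub const ZF: u32 = 6;
pub const SF: u32 = 7;
pub const OF: u32 = 11;

/// Reads a single bit flag, already "coerced" to a boolean.
pub fn flag(flags: u64, bit: u32) -> bool {
    ((flags >> bit) & 1) == 1
}

/// Reads the two bit IOPL field; it stays a number because it has four possible values.
pub fn iopl(flags: u64) -> u8 {
    ((flags >> 12) & 0b11) as u8
}
```

A condition such as `if OF` in the pseudo-code then corresponds to flag(flags, OF), while IOPL must be compared against a number, for example iopl(flags) == 3.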

Instruction Bits

Instruction prefixes are exposed as pseudo global variables. For example, EVEX.b refers to the b (broadcast) bit in the EVEX prefix for the currently executing instruction.

Types

Simd<T>

The most used type in the pseudo-code is the Simd<T> type. It represents an x86 vector register. Currently, Simd::max() is 512 to correspond with the ZMM registers, but this would change if an "AVX-1024" were ever created.

The T generic is a numeric type (integer or floating point) that represents what the ZMM register contains. For example, Simd<f64> represents a ZMM register containing eight "double precision" floating point (64 bit) numbers.

Operations on Simd<T> are at the "bit level". In other words, even though T represents the type of data, data[0] does not represent the first data value, but the first bit. For example, to access the second data value in a Simd<u32>, data[63..=32] would be used.
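
The mapping from a data value's index to its bit slice can be sketched as follows. The register is modelled here as eight 64 bit words with the least significant word first. That layout and both function names are assumptions made for illustration, not part of the pseudo-code itself.

```rust
/// Returns the (high, low) bit positions of element `index` for elements
/// that are `lane_width` bits wide, in the pseudo-code's "high to low" order.
pub fn lane_bit_range(index: u32, lane_width: u32) -> (u32, u32) {
    let lo = index * lane_width;
    (lo + lane_width - 1, lo)
}

/// Reads the `index`-th 32 bit element of a 512 bit register stored as
/// eight 64 bit words, least significant word first.
pub fn read_u32_lane(data: &[u64; 8], index: u32) -> u32 {
    assert!(index < 16, "a 512 bit register holds sixteen u32 lanes");
    let (_hi, lo) = lane_bit_range(index, 32);
    let word = (lo / 64) as usize; // which 64 bit word holds the lane
    let shift = lo % 64; // where the lane starts inside that word
    (data[word] >> shift) as u32
}
```

Here lane_bit_range(1, 32) returns (63, 32), the same slice as the data[63..=32] example above.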

KMask

In addition to the Simd<T> type for vector instructions, there also exists the KMask type. It represents an x86 mask register (k0 through k7). KMask is a 64 bit wide bit addressable type. Each bit corresponds to the same bit in the x86 mask register with k[n] referring to the "n-th" bit of the underlying mask register.
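
A KMask can be pictured as a plain 64 bit integer. The sketch below shows the k[n] lookup and, as a small usage example, lists which elements of a vector a mask selects. Both function names are hypothetical, and representing the mask as a u64 is an assumption for illustration.

```rust
/// Returns the "n-th" bit of the mask, i.e. the pseudo-code's `k[n]`.
pub fn mask_bit(k: u64, n: u32) -> bool {
    assert!(n < 64, "a mask register has 64 bits");
    ((k >> n) & 1) == 1
}

/// Lists the element indices (out of `lane_count`) whose mask bit is set.
pub fn selected_lanes(k: u64, lane_count: u32) -> Vec<u32> {
    (0..lane_count).filter(|&n| mask_bit(k, n)).collect()
}
```

For a mask of 0101b applied to four elements, selected_lanes returns elements 0 and 2.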

Examples

The "Examples" section (if present) contains one or more example assembly snippets that demonstrate the instruction. Any examples provided use NASM (Intel) syntax.

Flags Affected

The "Flags Affected" section (if present) contains a description of how the processor flags are affected by the instruction.

Intrinsics

The "Intrinsics" section(s) (if present) contain C or Rust function definitions that can be used in one's code to utilize the instruction without inline assembly.

Exceptions

The "Exceptions" sections contain a list of possible processor exceptions that can result from execution of the instruction. For regular (non-vector) instructions, each subsection will be for the various processor modes. Vector instructions, on the other hand, will typically only have two subsections: "SIMD Floating-Point" and "Other".