Figure: Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor registers and memory.

In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions.

Instruction sets differ in various ways. All instructions in a set might be the same length, or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary. Machine instructions are also used to control a computer's architectural features. Examples of features that are controlled using machine instructions include:

  • segment registers
  • protected address mode
  • binary-coded decimal (BCD) arithmetic

The criteria for instruction formats include:

  • Frequently used instructions should be shorter than rarely used ones.
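To see why this criterion pays off, the sketch below computes the expected instruction length over a program's instruction mix. The mix and both encodings (`MIX`, `FIXED`, `TUNED`) are hypothetical numbers chosen for illustration, not measurements of any real instruction set.

```python
# Hypothetical instruction mix: relative frequencies (they sum to 1.0).
MIX = {"load": 0.4, "add": 0.3, "branch": 0.2, "io": 0.1}

# Two hypothetical encodings giving each instruction's length in bits.
FIXED = {"load": 32, "add": 32, "branch": 32, "io": 32}
TUNED = {"load": 16, "add": 16, "branch": 32, "io": 48}


def average_length(mix, lengths):
    """Expected instruction length in bits, weighted by how often each instruction occurs."""
    return sum(freq * lengths[name] for name, freq in mix.items())


if __name__ == "__main__":
    print(f"fixed: {average_length(MIX, FIXED):.1f} bits")
    print(f"tuned: {average_length(MIX, TUNED):.1f} bits")
```

In this made-up mix the tuned encoding averages 22.4 bits per instruction against 32 for the fixed one, even though its rare `io` instruction grows to 48 bits; the same reasoning underlies frequency-based codes such as Huffman coding.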

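Determining the size of the address field is a trade-off between space and speed: a wider field lets an instruction address more memory directly, but lengthens every instruction that contains one.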
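The space side of this trade-off is simple arithmetic: an n-bit address field names 2^n locations, so each extra bit doubles the directly addressable memory while adding a bit to every instruction that carries the field. A minimal sketch, with illustrative function names:

```python
def addressable_words(bits: int) -> int:
    """Number of distinct locations an address field of `bits` bits can name."""
    return 1 << bits


def address_field_bits(words: int) -> int:
    """Smallest address-field width (in bits) able to name all `words` locations.

    Locations are numbered 0 .. words-1, so the width is the bit length of the largest index.
    """
    if words < 1:
        raise ValueError("memory must contain at least one location")
    return (words - 1).bit_length()


if __name__ == "__main__":
    for width in (12, 15, 16, 24):
        print(f"{width:2d}-bit field -> {addressable_words(width):,} locations")
```

For instance, a 15-bit field, such as the Y field of the IBM 709x instructions described below, reaches 32,768 words; one more word of memory would require a 16-bit field.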

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations, which take one operand to produce a result
  • Dyadic operations, which take two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output
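Several of these categories can be illustrated with a toy interpreter. The machine, its opcode names (`LOAD`, `STORE`, `NEG`, `ADD`, `JNZ`, `HALT`) and its tuple encoding are hypothetical and deliberately minimal: procedure calls and input/output are left out, and instructions are kept as symbolic pairs rather than encoded bits.

```python
# A hypothetical accumulator machine, invented for illustration only.
# Each instruction is an (opcode, operand) pair; the operand is a memory index or jump target.


def run(program, memory, max_steps=10_000):
    """Execute `program` against `memory` (a list of ints, modified in place) and return it."""
    acc = 0  # the single accumulator register
    pc = 0   # program counter: index of the next instruction
    for _ in range(max_steps):
        op, arg = program[pc]
        pc += 1
        if op == "LOAD":        # data movement: memory -> accumulator
            acc = memory[arg]
        elif op == "STORE":     # data movement: accumulator -> memory
            memory[arg] = acc
        elif op == "NEG":       # monadic: one operand (the accumulator)
            acc = -acc
        elif op == "ADD":       # dyadic: accumulator + memory word
            acc += memory[arg]
        elif op == "JNZ":       # comparison and conditional jump: branch if acc != 0
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")


# Loop control: sum n, n-1, ..., 1 (assumes n >= 1).
# memory[0] = loop counter n, memory[1] = running sum, memory[2] = constant -1
SUM_PROGRAM = [
    ("LOAD", 1),      # 0: acc = sum
    ("ADD", 0),       # 1: acc += counter
    ("STORE", 1),     # 2: sum = acc
    ("LOAD", 0),      # 3: acc = counter
    ("ADD", 2),       # 4: acc += -1
    ("STORE", 0),     # 5: counter = acc
    ("JNZ", 0),       # 6: repeat while counter != 0
    ("HALT", None),   # 7: stop
]

if __name__ == "__main__":
    print(run(SUM_PROGRAM, [5, 0, -1]))
```

Running `SUM_PROGRAM` with the counter set to 5 leaves 15 in `memory[1]`. With a starting counter of 0 the loop never reaches zero, and the step limit raises an error instead of hanging.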

Overlapping instruction

On processor architectures with variable-length instruction sets … An example of this use is the IBM System/360 family of computers and their successors.

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bits from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic format:

  • S,1-11 Opcode
  • 12-13 Flag, ignored in some instructions
  • 14-17 unused
  • 18-20 Tag
  • 21-35 Y

Index register control format, other than TSX:

  • S,1-2 Opcode
  • 3-17 Decrement
  • 18-20 Tag
  • 21-35 Y
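Both formats can be decoded with shifts and masks once IBM's left-to-right numbering is converted to bit positions: IBM bit k (with S as bit 0) sits 35 - k places from the least-significant end of the 36-bit word. The sketch below treats bits S,1-11 as the opcode of the generic format; the function names and the dictionary output are illustrative choices, not part of any IBM tool.

```python
WORD_BITS = 36  # 704/709/709x words are 36 bits, numbered S (=0), 1, ..., 35 from the left


def field(word: int, first: int, last: int) -> int:
    """Extract IBM-numbered bits first..last (inclusive); bit 0 is the sign bit S."""
    width = last - first + 1
    shift = WORD_BITS - 1 - last  # distance of the field's rightmost bit from bit 35
    return (word >> shift) & ((1 << width) - 1)


def decode_generic(word: int) -> dict:
    """Split a word laid out in the generic format."""
    return {
        "opcode": field(word, 0, 11),    # S,1-11
        "flag": field(word, 12, 13),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }


def decode_index_control(word: int) -> dict:
    """Split a word laid out in the index register control format."""
    return {
        "opcode": field(word, 0, 2),     # S,1-2
        "decrement": field(word, 3, 17),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }
```

Packing arbitrary field values into a word and decoding them again is a quick way to check the bit arithmetic.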

For all but the IBM 7094 and 7094 II, there are three index registers, designated A, B and C. Indexing with multiple 1 bits in the tag subtracts the logical OR of the selected index registers, and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when powered on they are in multiple tag mode, in which they use only three of the index registers, in a fashion compatible with earlier machines; a Leave Multiple Tag Mode (LMTM) instruction is required to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is 0 for a tag of 0; otherwise it is the logical OR of the selected index registers in multiple tag mode, or the contents of the single selected index register when not in multiple tag mode. However, the effective address for index register control instructions is just Y.
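A minimal sketch of the effective-address rule, under stated assumptions: index registers are held in a hypothetical eight-entry list `xr` (entry 0 unused), the three registers usable in multiple tag mode are stored in the slots selected by tag bits of value 1, 2 and 4, and the subtraction is assumed to wrap modulo 2^15, the width of the Y field. Indirect addressing is not modeled.

```python
Y_MASK = (1 << 15) - 1  # Y is a 15-bit field (bits 21-35)


def index_value(tag, xr, multiple_tag_mode):
    """C(T): the amount subtracted from Y for the given tag."""
    if tag == 0:
        return 0
    if multiple_tag_mode:
        # Each 1 bit in the tag selects one register; their contents are ORed together.
        result = 0
        for bit in (1, 2, 4):
            if tag & bit:
                result |= xr[bit]
        return result
    # Otherwise the tag names a single index register (1-7 on the 7094).
    return xr[tag]


def effective_address(y, tag, xr, multiple_tag_mode=True, index_control=False):
    """Effective address for an instruction with address field y and tag field tag."""
    if index_control:
        return y  # index register control instructions use Y unmodified
    # Assumption: the subtraction wraps modulo 2**15, the width of the Y field.
    return (y - index_value(tag, xr, multiple_tag_mode)) & Y_MASK
```

With `xr = [0, 10, 3, 20, 5, 0, 0, 0]`, a tag of 3 subtracts `10 | 3 = 11` in multiple tag mode but the contents of register 3 (20) outside it.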

A flag field with both bits set to 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instructions that conditionally skip one or two words. For example, Compare Accumulator with Storage (CAS) performs a three-way comparison and, depending on the result, continues at the next sequential instruction (NSI), NSI+1 or NSI+2.
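The skip behaviour of CAS can be sketched as a function returning the address of the next instruction to execute. The mapping used here (accumulator greater: NSI; equal: NSI+1; less: NSI+2) is an assumption to be checked against the IBM reference manuals, and plain Python integers stand in for the machines' sign-magnitude words, so the distinction between +0 and -0 is ignored.

```python
def cas_next(pc: int, acc: int, storage: int) -> int:
    """Address executed after a CAS at address `pc` (hypothetical helper).

    Assumed mapping: acc > storage -> NSI, acc == storage -> NSI+1, acc < storage -> NSI+2.
    """
    nsi = pc + 1  # next sequential instruction
    if acc > storage:
        return nsi
    if acc == storage:
        return nsi + 1  # skip one word
    return nsi + 2      # skip two words
```

Under this mapping a program can follow a CAS with, for example, two transfer instructions and then the code for the remaining case, giving a three-way branch from a single comparison.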

MIPS

The MIPS architecture provides a specific example of a machine code whose instructions are always 32 bits long.
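The fixed 32-bit length comes from packing every instruction into one of a few field layouts. The sketch below packs the three classic MIPS formats (R-type: opcode, rs, rt, rd, shamt, funct; I-type: opcode, rs, rt, 16-bit immediate; J-type: opcode, 26-bit target). The helper names are illustrative, and register operands are given as numbers ($zero is register 0; $t0-$t3 are registers 8-11).

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack an R-type instruction: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16
            | (rd & 0x1F) << 11 | (shamt & 0x1F) << 6 | (funct & 0x3F))


def encode_i(opcode, rs, rt, immediate):
    """Pack an I-type instruction: opcode(6) rs(5) rt(5) immediate(16).

    Negative immediates are stored in 16-bit two's complement by the mask.
    """
    return (opcode & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16 | (immediate & 0xFFFF)


def encode_j(opcode, target):
    """Pack a J-type instruction: opcode(6) target(26)."""
    return (opcode & 0x3F) << 26 | (target & 0x3FFFFFF)


if __name__ == "__main__":
    print(f"{encode_r(0, 10, 11, 9, 0, 0x20):08X}")  # add  $t1, $t2, $t3
    print(f"{encode_i(8, 0, 8, 5):08X}")             # addi $t0, $zero, 5
```

For example, `add $t1, $t2, $t3` packs to `0x014B4820` and `addi $t0, $zero, 5` packs to `0x20080005`; whatever the format, the result always fits in 32 bits.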

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.

Readability

Machine code is generally not considered human-readable, with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule. However, various tools and methods support understanding machine code.

Disassembly decodes machine code into assembly language, which is possible because assembly instructions can often be mapped one-to-one to machine instructions.
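For a fixed-format encoding, disassembly is largely the reverse of packing: extract the fields and look up a mnemonic. The sketch below handles only a handful of MIPS R-type instructions (opcode 0), prints registers by number rather than by conventional name, and ignores the shift-amount field; the function name and the fallback `.word` output are illustrative choices, not the behaviour of any particular disassembler.

```python
# A few MIPS R-type function codes (opcode 0); deliberately incomplete.
R_FUNCT = {0x20: "add", 0x21: "addu", 0x22: "sub", 0x23: "subu",
           0x24: "and", 0x25: "or", 0x26: "xor", 0x27: "nor", 0x2A: "slt"}


def disassemble_r(word: int) -> str:
    """Turn one 32-bit R-type instruction word back into assembly text."""
    opcode = (word >> 26) & 0x3F
    if opcode != 0:
        raise ValueError("only R-type (opcode 0) instructions are handled")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    mnemonic = R_FUNCT.get(funct)
    if mnemonic is None:
        return f".word 0x{word:08x}"  # unknown encoding: emit the raw data
    # Assembly lists the destination first: rd, rs, rt.
    return f"{mnemonic} ${rd}, ${rs}, ${rt}"
```

`disassemble_r(0x014B4820)` returns `add $9, $10, $11`: the operands appear in assembly order, destination first, even though rd is stored after rs and rt in the instruction word.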

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated (hard to understand).

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:

  • The SHARE Operating System (1959) for the IBM 709, IBM 7090, and IBM 7094 computers allowed for a loadable code format named SQUOZE. SQUOZE was a compressed binary form of assembly language code and included a symbol table.
  • Modern IBM mainframe operating systems, such as z/OS, provide a symbol table named Associated Data (ADATA). The table is stored in a file that can be produced by the IBM High-Level Assembler (HLASM), either as a separate SYSADATA file or as ADATA records in a Generalized Object File Format (GOFF) file.
