In computer science, self-modifying code (SMC or SMoC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. The term is usually only applied to code where the self-modification is intentional, not in situations where code accidentally modifies itself due to an error such as a buffer overflow.

Self-modifying code can involve overwriting existing instructions or generating new code at run time and transferring control to that code.

Self-modification can be used as an alternative to the method of "flag setting" and conditional program branching, used primarily to reduce the number of times a condition needs to be tested.

The method is frequently used for conditionally invoking test/debugging code without requiring additional computational overhead for every input/output cycle.

The modifications may be performed:

  • only during initialization – based on input parameters (when the process is more commonly described as software "configuration" and is somewhat analogous, in hardware terms, to setting jumpers for printed circuit boards). Alteration of program entry pointers is an equivalent indirect method of self-modification, but requiring the co-existence of one or more alternative instruction paths, increasing the program size.
  • throughout execution ("on the fly") – based on particular program states that have been reached during the execution

In either case, the modifications may be performed directly to the machine code instructions themselves, by overlaying new instructions over the existing ones (for example, altering a compare and branch to an unconditional branch or alternatively a NOP).

In the IBM System/360 architecture and its successors up to z/Architecture, an EXECUTE (EX) instruction logically overlays the second byte of its target instruction with the low-order 8 bits of register 1. This provides the effect of self-modification, although the actual instruction in storage is not altered.

Application in low and high level languages

Self-modification can be accomplished in a variety of ways depending upon the programming language and its support for pointers and/or access to dynamic compiler or interpreter "engines":

  • overlay of existing instructions (or parts of instructions such as opcode, register, flags or addresses)
  • direct creation of whole instructions or sequences of instructions in memory
  • creation or modification of source code statements followed by a "mini compile" or a dynamic interpretation (see eval statement)
  • creating an entire program dynamically and then executing it

Assembly language

Self-modifying code is quite straightforward to implement when using assembly language. Instructions can be dynamically created in memory (or else overlaid over existing code in non-protected program storage), To a lesser extent, the DR-DOS kernel also optimizes speed-critical sections of itself at loadtime depending on the underlying processor generation.

Further reading

  • (80 pages)
  • Using self-modifying code under Linux
  • Self-modifying C code
  • Certified Self-Modifying Code