A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used, typically in software created by Microsoft. When generated according to the standard methods, UUIDs are unique for practical purposes. Their uniqueness does not depend on a central registration authority or on coordination between the parties generating them. Thus, anyone can create large numbers of UUIDs and use them as identifiers with near certainty that they do not duplicate UUIDs that have been, or will be, created by others; the only coordination required is conformance with the UUID standards. Information labeled with UUIDs by independent parties can therefore coexist in the same databases or channels, with a negligible probability of duplication.

Adoption of UUIDs is widespread, with many computing platforms providing support for generating them and for parsing their textual representation.
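Python's standard `uuid` module is one example of such platform support, and it is used here only for illustration. The sketch below generates a random (version-4) UUID and a name-based (version-5) UUID, then parses an uppercase string wrapped in braces. Name-based generation is deterministic, so repeating it with the same namespace and name yields the same value.

```python
import uuid

# Random (version-4) UUID drawn from the operating system's randomness source.
random_id = uuid.uuid4()

# Name-based (version-5) UUID: the same namespace and name always give the same UUID.
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")

# Parsing accepts the hyphenated form, uppercase digits, {braces} and a "urn:uuid:" prefix.
parsed = uuid.UUID("{550E8400-E29B-41D4-A716-446655440000}")

print(random_id.version, name_id.version)                        # 4 5
print(str(parsed))                                               # 550e8400-e29b-41d4-a716-446655440000
print(name_id == uuid.uuid5(uuid.NAMESPACE_DNS, "example.org"))  # True
```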

History

Apollo Computer used UUIDs in the Network Computing System (NCS), launched in 1987, with a design inspired by the 64-bit unique identifiers of Domain/OS, an earlier Apollo operating system. Microsoft Windows platforms adopted the NCS (and later, the DCE) design as "Globally Unique Identifiers" (GUIDs) in the early nineties.

Somewhat later, the Open Software Foundation (OSF) used UUIDs in its Distributed Computing Environment (DCE), with a design partly based on the NCS UUIDs. This was documented in the DCE 1.1 RPC specification in 1996, and in the DCE 1.1 Authentication and Security Services specification, published in 1997.

ISO/IEC documented the DCE design in 1996, in ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call".

In July 2005, the Internet Engineering Task Force (IETF) published the Standards-Track RFC 4122, which also registered a URN namespace for UUIDs. In the meantime, the ITU had also standardized UUIDs in ITU-T Rec. X.667 | ISO/IEC 9834-8, basing its work on the previous standards and on early versions of RFC 4122. This was technically equivalent to RFC 4122.

The current IETF specification is RFC 9562, a Proposed Standard published in May 2024 that obsoletes RFC 4122. It defines three new UUID versions (6–8) for the DCE variant. The UUIDs currently in use follow the DCE/IETF design and provide backwards compatibility with "legacy" Apollo NCS UUIDs and Microsoft GUIDs.

The authors of RFC 4122 were Paul Leach, Michael Mealling, and Rich Salz, and the authors of the initial 1997 Internet Draft were Leach and Salz. Leach had been the architect of Domain/OS. He continued at Apollo as a designer of NCS, and then, as a Microsoft Distinguished Architect, contributed to the design of OLE/COM/DCOM, carrying the concept of UUIDs to that project. Leach was also one of the authors of RFC 9562. Salz was a member of the DCE team at the Open Software Foundation. In 1989 Apollo had merged with Hewlett-Packard, a founding member of the OSF. Former NCS team members, having become HP employees, brought UUIDs to OSF DCE. Mealling was a prominent IETF member, held a seat on the Internet Engineering Steering Group, and was heavily involved in IETF work on URNs. RFC 4122 brought all these strands together.

Format

A UUID is a 128-bit number. The meaning of its bits is determined by the variant, of which four are defined. Variant 1 is the most common. Variants 0 and 2 exist for backwards compatibility with earlier formats, and variant 3 is reserved for future definition. Variants 1 and 2 have "versions", which further define the interpretation of the UUID.

Variants

The variant field occupies the one to three most significant bits of the ninth byte (octet 8, counting from zero) and indicates the format of the UUID. The following variants are defined:

  • Variant 0 (indicated by the one-bit pattern 0xxx<sub>2</sub>) is for backwards compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1987. The variant field of current UUIDs overlaps the address family octet in NCS UUIDs in such a way that any NCS UUIDs still in use have a 0 in the first bit of the variant field. The variant 0 numbering space also includes the Nil UUID.
  • Variant 1 (10<sub>2</sub>) UUIDs are referred to as <nowiki>RFC 4122/DCE 1.1 UUIDs</nowiki>, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft. These are the UUIDs in current use.
  • Variant 2 (110<sub>2</sub>) is for backwards compatibility with the "GUIDs" used in Microsoft COM/DCOM. This format was used in early GUIDs on the Microsoft Windows platform. Current Microsoft tools generate variant 1 UUIDs, not this variant. The main difference between this variant and variant 1, aside from the extra variant bit, is byte-ordering within the UUID. RFC 9562 declared this variant out of scope, so the three new versions defined for variant 1 do not apply to variant 2.
  • Variant 3 (111<sub>2</sub>) includes the Max UUID, but is otherwise undefined and reserved for future use.
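The bit patterns above can be decoded from octet 8 by testing its high bits in turn. The minimal sketch below uses Python's `uuid.UUID` only as a container for the 16 bytes; `variant_of` is a hypothetical helper name. For comparison, Python's own `UUID.variant` attribute reports the same classification through named constants such as `uuid.RFC_4122`.

```python
import uuid

def variant_of(u: uuid.UUID) -> int:
    """Return the variant number (0-3) from the high bits of octet 8."""
    octet = u.bytes[8]
    if (octet & 0x80) == 0:   # 0xxx: Apollo NCS compatibility (includes the Nil UUID)
        return 0
    if (octet & 0x40) == 0:   # 10xx: RFC 4122 / RFC 9562
        return 1
    if (octet & 0x20) == 0:   # 110x: Microsoft COM/DCOM compatibility
        return 2
    return 3                  # 111x: reserved (includes the Max UUID)

nil = uuid.UUID(int=0)
max_uuid = uuid.UUID(int=(1 << 128) - 1)
print(variant_of(nil), variant_of(uuid.uuid4()), variant_of(max_uuid))  # 0 1 3
```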

Versions

The OSF DCE and Microsoft COM/DCOM variants (1 and 2, respectively) have versions, indicated by the value of the high four bits of the seventh byte of the UUID. In textual representations of the UUID, this is the hex digit after the second hyphen. Variant 0 Apollo NCS UUIDs do not have versions; they are sub-typed by "address families" instead. RFC 9562, which defined versions 6, 7, and 8, stated that variants other than the OSF DCE variant 1 were "out of scope" of the RFC, leaving it to Microsoft to define new versions for variant 2. However, versions 1 through 5, standardized in RFC 4122, are the same in the Microsoft variant, except for byte ordering.
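As a sketch, the version nibble can be read directly from octet 6 of the big-endian byte sequence that `uuid.UUID` holds. `version_of` is a hypothetical helper name. It reads the nibble itself because Python's built-in `UUID.version` returns `None` for anything other than variant 1, whereas the text above notes that variant 2 also carries versions.

```python
import uuid
from typing import Optional

def version_of(u: uuid.UUID) -> Optional[int]:
    """Return the version nibble (high four bits of octet 6), or None if the variant has no versions."""
    if u.variant not in (uuid.RFC_4122, uuid.RESERVED_MICROSOFT):
        return None  # variant 0 uses NCS address families; variant 3 is reserved
    return u.bytes[6] >> 4

text = "550e8400-e29b-41d4-a716-446655440000"
u = uuid.UUID(text)
# In the text form, the same nibble is the first hex digit after the second hyphen.
print(version_of(u), text.split("-")[2][0])  # 4 4
```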

{| class="wikitable sortable" style="width: 100%;"

|+Comparison of UUID versions

! Version !! Type !! Time source !! Entropy/ID source !! Best use case

|-

| 1 || Time-based || Gregorian (100ns) || MAC address & clock seq || Legacy systems; distributed uniqueness

|-

| 2 || DCE Security || Gregorian (low-res) || Local ID (UID/GID) & node || DCE-based security environments [Legacy]

|-

| 3 || Name-based (MD5) || None (deterministic) || Namespace & name || Deterministic IDs; legacy name-hashing

|-

| 4 || Random || None || Cryptographic random || General purpose; maximum privacy

|-

| 5 || Name-based (SHA-1) || None (deterministic) || Namespace & name || Deterministic IDs; preferred over version 3

|-

| 6 || Time-ordered || Gregorian (100ns) || MAC address or random || Database keys; reordered version 1

|-

| 7 || Time-ordered || Unix epoch (ms) || Cryptographic random || Modern database keys; high locality

|-

| 8 || Custom || Variable || Implementation-defined || Experimental or application-specific layouts

|}

Versions 1 and 6 (date–time and MAC address)

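Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID) with a 60-bit timestamp. On systems with 64-bit EUI-64 "MAC addresses", the least significant 48 bits are used. A random 48-bit number may also be used in place of the MAC address. Version 6 carries the same fields as version 1, but reorders the timestamp bits from most to least significant so that version-6 UUIDs sort by creation time.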

The timestamp is the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted. <nowiki>RFC 4122</nowiki> states that the time value rolls over around A.D. 3400, depending on the algorithm used.
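The timestamp can be converted to a date with a little arithmetic. The sketch below is written in Python under stated assumptions: `ticks_to_datetime` and `v1_created_at` are hypothetical helper names, while `uuid.UUID.time` is the standard-library attribute that reassembles the 60-bit field. It also checks the rollover figure. Reading the 60-bit count as signed leaves 59 usable bits and gives a rollover in 3409, which matches the RFC's "around A.D. 3400". An unsigned reading pushes the rollover to 5236.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Origin of the version-1 and version-6 timestamp: midnight 15 October 1582 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def ticks_to_datetime(ticks: int) -> datetime:
    """Convert a count of 100-ns intervals since the Gregorian epoch to a datetime."""
    # datetime has microsecond resolution, so the final 100-ns digit is dropped.
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)

def v1_created_at(u: uuid.UUID) -> datetime:
    """Creation time of a version-1 UUID; uuid.UUID.time reassembles the 60-bit field."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    return ticks_to_datetime(u.time)

# Where the counter runs out if read as signed (2**59 ticks) or unsigned (2**60 ticks).
print(ticks_to_datetime(2**59).year, ticks_to_datetime(2**60).year)  # 3409 5236
```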

The uniqueness of UUIDs based on network-card MAC addresses also depends on network-card manufacturers assigning unique MAC addresses to their cards correctly, and that process, like other manufacturing processes, is subject to error. MAC addresses may also come from sources other than network cards. For example, virtual machines receive a MAC address from a range that is configurable in the hypervisor, and some operating systems, notably OpenWrt, let the end user customise the MAC address. When a device has a 64-bit EUI-64 "MAC address", using its least significant 48 bits as the RFC recommends may produce a duplicated node ID in the UUID. Thus, node IDs based on MAC addresses may not be globally unique.

Using the node's network-card MAC address as the node ID often means that version-1, -2, and -6 UUIDs can be traced back to the computer that created them. Such UUIDs can also reveal what kind of hardware generated them, because the first three octets of a MAC address identify its manufacturer. Word-processing software sometimes embeds UUIDs in documents, and these can trace a document to the computer where it was created or edited. This privacy hole was used when locating the creator of the Melissa virus.

RFC 9562 allows the MAC address in a version-1, -2 or -6 UUID to be replaced by a random 48-bit node ID, either because the node does not have a MAC address or because it is not desirable to include it. In that case, the RFC requires that the least significant bit of the first octet of the node ID be set to 1.

Versions 3 and 5 (namespace name-based)

Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and a name, so the same namespace and name always produce the same UUID. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.

Version 7 (timestamp and random)

Version 7 begins with a 48-bit big-endian count of milliseconds since the Unix epoch (midnight 1 January 1970 UTC), followed by the version and variant fields. The remaining 74 bits are random or implementation-defined. RFC 9562 does not mandate a minimum number of random bits for version 7. Implementations may devote part of those 74 bits to sub-millisecond timestamp precision or to a monotonic counter. Consequently, collision resistance and unguessability depend on how a particular implementation allocates these bits.

Because such details vary between implementations, the lexical sortability of version-6 and version-7 UUIDs is most consistent when they are generated by the same library or system. Mixing UUIDs from different sources, or interleaving different versions and variants in a single database, can degrade the time ordering and index locality that these versions were designed to provide.
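A minimal version-7 generator can be sketched in Python. It assumes that all 74 non-timestamp bits are filled from `os.urandom`, with no sub-millisecond field or counter. `make_uuid7` is a hypothetical helper name, and it does not call any library's own version-7 function.

```python
import os
import time
import uuid
from typing import Optional

def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Sketch of a version-7 UUID: 48-bit Unix milliseconds, then 74 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used below
    value = (unix_ms & ((1 << 48) - 1)) << 80      # unix_ts_ms: most significant 48 bits
    value |= 0x7 << 76                             # version nibble = 7
    value |= ((rand >> 62) & 0xFFF) << 64          # rand_a: 12 random bits
    value |= 0b10 << 62                            # variant 1 bits "10"
    value |= rand & ((1 << 62) - 1)                # rand_b: 62 random bits
    return uuid.UUID(int=value)

earlier = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)
# The timestamp leads, so time order holds for integers and lowercase strings alike.
print(earlier < later, str(earlier) < str(later))  # True True
```

Within a single millisecond this sketch gives no ordering guarantee. The counter and sub-millisecond methods described in RFC 9562 exist to close that gap, at the cost of some random bits.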

Special values

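The Nil UUID, <code>00000000-0000-0000-0000-000000000000</code> (that is, all bits clear), can be useful to express the concept of "no such value". The Max UUID, <code>ffffffff-ffff-ffff-ffff-ffffffffffff</code> (all bits set), was introduced in RFC 9562 and can serve as a sentinel, for example to mark the end of a list of UUIDs.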

{| class="wikitable" style="width: 100%;"

|+ Apollo NCS UUID Record Layout

! Field !! Width (bits) !! Description

|-

| time_high || 32 || High-order 32 bits of the 48-bit timestamp, in 4-microsecond units since 1 January 1980

|-

| time_low || 16 || Low-order 16 bits of the 48-bit timestamp

|-

| reserved || 16 || Reserved field (often 0)

|-

| family || 8 || Address family (e.g., 0x00 for unspecified, 0x02 for IP, 0x0D for DDS)

|-

| host || 56 || 56-bit host identifier (network address)

|}

RFC 4122 incorporated legacy NCS UUIDs as "variant 0" of the new format by overlapping the variant bits of the new format with the NCS address family field. The highest address family value defined in NCS is 13 (0x0D), so the most significant bit of the variant octet is always 0 for extant NCS UUIDs. In variants 1 to 3, this bit is 1. In effect, NCS address families 0–127 became the new variant 0. The values 128–255, which NCS left undefined, were rededicated to the variant 1 to 3 (IETF and Microsoft) UUIDs. As a result, legacy NCS UUIDs and variant 1 to 3 UUIDs are kept apart and can coexist in the same databases and on the same communication channels.

{| class="wikitable"

|+ Family / variant field

! MSB 0

! MSB 1

! MSB 2

! Family (octet)

! Variant

! Description

|-

| 0

| x

| x

| 0x00-0x7F

| 0

| Reserved. Apollo NCS backward compatibility, plus Nil. Subtyped by address family.

|-

| 1

| 0

| x

| 0x80-0xBF

| 1

| OSF DCE UUID. Subtyped by versions (1-8).

|-

| 1

| 1

| 0

| 0xC0-0xDF

| 2

| Reserved. Microsoft backward compatibility. Subtyped by versions (1-5).

|-

| 1

| 1

| 1

| 0xE0-0xFF

| 3

| Reserved (Future, plus Max).

|}

The legacy Apollo NCS UUID has the format described in the Apollo NCS record-layout table above. The OSF DCE variant is described in RFC 9562. The Microsoft COM/DCOM variant is described in Microsoft's documentation. For versions 1 to 5 it is generally the same as the DCE variant, apart from the extra variant bit and the byte ordering (see Byte ordering below). Variant 2 does not have versions 6 to 8.

{| class="wikitable" style="width: 100%;"

|+ RFCs 4122 / 9562 Variant 1-3 Layout

! Field !! Width (bits) !! Description

|-

| data_a || 32 || First 32 bits of the timestamp or data

|-

| data_b || 16 || Second 16 bits of the timestamp or data

|-

| version || 4 || The version number (bits 48 through 51)

|-

| data_c || 12 || Third 12 bits of the timestamp or data (bits 52 through 63)

|-

| variant || 2-3 || The RFC 9562 Variant bits (10x, 110, 111, bits 64-66)

|-

| data_d || 13-14 || The clock sequence or other data (bits 66 or 67 through 79)

|-

| data_e || 48 || The 48-bit node ID or other data (bits 80 through 127)

|}

The interpretation of these bit fields depends on the version. In variant 2, data_a, data_b and version|data_c are byte-swapped (stored little-endian), while variant|data_d and data_e are not.

Byte ordering

Variant 1 UUIDs are encoded in big-endian byte order, with the bytes in the same sequence as the hexadecimal digits of the textual form. For example, <code>00112233-4455-6677-8899-aabbccddeeff</code> is encoded as the bytes <code>00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff</code>.

In contrast, variant 2 UUIDs ("GUIDs"), historically used in Microsoft COM/OLE libraries, have a mixed-endian format. The first three fields (corresponding to the version-1 timestamp subfields) are little-endian, while the final two fields are emitted as big-endian arrays of bytes. The example UUID above, if it were a variant 2 UUID, would be encoded on the wire as <code>33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff</code>. All versions under variant 2 are emitted with this byte ordering, including versions whose fields are not numeric, such as 3, 4, and 5.
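Python's `uuid.UUID` exposes both orderings. `.bytes` is the big-endian encoding, and `.bytes_le` is the mixed-endian one, with the first three fields little-endian. The sketch below derives the mixed-endian form by hand to show exactly which bytes move. `to_mixed_endian` is a hypothetical helper name.

```python
import uuid

def to_mixed_endian(u: uuid.UUID) -> bytes:
    """Variant-2 ("GUID") wire order: reverse the 4-, 2- and 2-byte fields, keep the last 8 bytes."""
    b = u.bytes                   # big-endian (variant 1) encoding
    return (b[3::-1]              # first field, 4 bytes, reversed
            + b[5:3:-1]           # second field, 2 bytes, reversed
            + b[7:5:-1]           # third field, 2 bytes, reversed
            + b[8:])              # remaining 8 bytes unchanged

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
print(u.bytes.hex(" "))             # 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
print(to_mixed_endian(u).hex(" "))  # 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff
```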

Textual representation

In most cases, UUIDs are represented as hexadecimal values separated by hyphens. The most common is the 8-4-4-4-12 format, a string of 32 hexadecimal digits with four hyphens: <code>xxxxxxxx-xxxx-vxxx-wxxx-xxxxxxxxxxxx</code>. The hyphens separate the version-1 fields, but the same format is commonly used for all versions. Each hexadecimal digit represents 4 bits. <code>v</code> is the version nibble, and the high-order one to three bits of <code>w</code> are the variant. The Windows registry format is the same but wraps the UUID in <code>{}</code> braces. The byte-ordering differences of variant 2 apply to binary storage and transmission on the wire; they do not affect the textual representation of the UUID.

The hyphenated format was introduced with the newer variant system, although the hyphens are still occasionally omitted. The legacy Apollo format was slightly different, for example <code>34dc23469000.0d.00.00.7c.5f.00.00.00</code>. The first part is the 48-bit time (time_high and time_low combined). The reserved field is skipped. The family field comes directly after the first dot; in this case it is <code>0d</code> (13 in decimal) for DDS (Data Distribution Service). The remaining seven parts, separated by dots, are the node bytes.
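The dotted legacy form can be split mechanically. This Python sketch assumes the timestamp units and origin given in the Apollo NCS record-layout table above: 4-microsecond ticks since 1 January 1980. `parse_ncs` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption taken from the record-layout table: 4-microsecond ticks since 1 January 1980.
NCS_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc)
NCS_TICK_US = 4

def parse_ncs(text: str) -> Tuple[datetime, int, bytes]:
    """Split the legacy dotted form into (timestamp, address family, 7 host bytes)."""
    time_hex, family_hex, *host_hex = text.split(".")
    if len(time_hex) != 12 or len(host_hex) != 7:
        raise ValueError(f"not an NCS UUID string: {text!r}")
    ticks = int(time_hex, 16)  # time_high and time_low combined into 48 bits
    created = NCS_EPOCH + timedelta(microseconds=ticks * NCS_TICK_US)
    return created, int(family_hex, 16), bytes(int(h, 16) for h in host_hex)

created, family, host = parse_ncs("34dc23469000.0d.00.00.7c.5f.00.00.00")
print(created.year, family, host.hex())  # 1987 13 00007c5f000000
```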

Lowercase hexadecimal digits are preferred. ITU-T Rec. X.667 requires lowercase on generation, but also requires the uppercase version to be accepted on input. Since UUIDs are 128-bit numbers, other formats are possible, and occasionally seen, such as decimal digits or binary.
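A tolerant parser in the spirit of these rules emits lowercase and accepts uppercase. The Python sketch below normalises input to the lowercase 8-4-4-4-12 form and accepts the Windows registry braces. `canonical` is a hypothetical helper name. The sketch is deliberately lenient about where hyphens appear, which a stricter validator might reject.

```python
import uuid

HEX_DIGITS = set("0123456789abcdef")

def canonical(text: str) -> str:
    """Lowercase 8-4-4-4-12 form; accepts uppercase, missing hyphens and {braces}."""
    digits = text.strip().strip("{}").replace("-", "").lower()
    if len(digits) != 32 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a UUID: {text!r}")
    groups = (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
    return "-".join(groups)

registry_form = "{550E8400-E29B-41D4-A716-446655440000}"
print(canonical(registry_form))                                   # 550e8400-e29b-41d4-a716-446655440000
print(canonical(registry_form) == str(uuid.UUID(registry_form)))  # True
```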

RFC 4122 registers the "uuid" namespace for URNs. This makes it possible to form URNs from UUIDs, like <code><nowiki>urn:uuid:550e8400-e29b-41d4-a716-446655440000</nowiki></code>. The normal 8-4-4-4-12 format is used for this.

It is also possible to make an OID out of a UUID, which in turn provides another way to make a URN from it. The OID for the previous example is <code>2.25.113059749145936325402354257176981405696</code>. The unsigned decimal form of the UUID is prefixed with <code>2.25</code>, which represents the <code>{joint-iso-itu-t(2) uuid(25)}</code> "arc" within the OID namespace. This may be further prefixed with <code><nowiki>urn:oid:</nowiki></code> to make a second form of URN for UUIDs. In general, the <code>uuid</code> URN is recommended over the <code>oid</code> URN.
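Both URN forms follow directly from the UUID's value. In Python, `uuid.UUID.urn` gives the <code>urn:uuid:</code> form, and `uuid.UUID.int` gives the unsigned 128-bit integer needed for the <code>2.25</code> OID arc. The sketch reproduces the example above.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

urn = u.urn                     # "urn:uuid:" followed by the 8-4-4-4-12 form
oid = f"2.25.{u.int}"           # {joint-iso-itu-t(2) uuid(25)} arc + unsigned decimal value
oid_urn = f"urn:oid:{oid}"      # the second, less recommended URN form

print(urn)  # urn:uuid:550e8400-e29b-41d4-a716-446655440000
print(oid)  # 2.25.113059749145936325402354257176981405696

# Going back: the final OID arc is the UUID's 128-bit integer value.
assert uuid.UUID(int=int(oid.rsplit(".", 1)[1])) == u
```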

Collisions

A collision occurs when the same UUID is generated more than once and is assigned to different referents. In the case of many standard version-1, -2, or -6 UUIDs using unique MAC addresses and/or timestamps, collisions can occur only as a result of error, such as manufacturing problems, skewed clocks, or software bugs.

Duplicate UUIDs can also result from errors in versions generated by random number generation or hashing. For example, a high-entropy source of randomness is important when generating version-4 UUIDs. However, collisions can also occur with such UUIDs without any error, by chance alone.

The probability of this is normally small enough to be ignored, and it can be estimated using the birthday problem. A version-4 UUID has 122 random bits, because the other six bits hold the version and variant, so there are 2<sup>122</sup> possible values. For example, about 2.71&nbsp;quintillion random version-4 UUIDs must be generated to reach a 50% probability of at least one collision, computed as follows:

<math>n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \times \ln(2) \times 2^{122}} \approx 2.71 \times 10^{18}.</math>

This number would be equivalent to generating 1&nbsp;billion UUIDs per second for about 86&nbsp;years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43.4&nbsp;exabytes (37.7&nbsp;EiB). Such a file, listing only identifiers, would be a few orders of magnitude larger than the largest databases currently in existence, which are on the order of 100&nbsp;PB (e.g. Google's web indexes). If UUIDs are generated according to the standards, a duplicate is more likely to come from a bit flip than from chance at UUID-generation time. Such bit flips, known as single-event upsets, are caused by cosmic rays passing through memory or disk storage.

The smallest number of version-4 UUIDs that must be generated for the probability of at least one collision to be p is approximated by the formula

<math>n(p) \approx \sqrt{2 \times 2^{122} \times \ln\frac{1}{1 - p}}.</math>

Thus, the probability of finding a duplicate within 103&nbsp;trillion properly generated version-4 UUIDs is one in a billion.
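Both figures follow from the two approximations above and can be checked numerically. This is a plain Python sketch of those formulas and nothing more. `n_for_even_odds` and `n_for_probability` are hypothetical names. `math.log1p` evaluates ln(1/(1 − p)) as −log1p(−p) so that very small p does not lose precision.

```python
import math

RANDOM_BITS = 122          # 128 bits minus 4 version bits and 2 variant bits
SPACE = 2 ** RANDOM_BITS   # number of distinct random version-4 UUIDs

def n_for_even_odds() -> float:
    """50% case, keeping the 1/2 and 1/4 terms: n ≈ 1/2 + sqrt(1/4 + 2 ln(2) 2^122)."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * SPACE)

def n_for_probability(p: float) -> float:
    """n(p) ≈ sqrt(2 * 2^122 * ln(1 / (1 - p))); log1p keeps tiny p accurate."""
    return math.sqrt(2 * SPACE * -math.log1p(-p))

print(f"{n_for_even_odds():.3g}")        # 2.71e+18
print(f"{n_for_probability(1e-9):.3g}")  # 1.03e+14
```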

Uses

Filesystems

Several filesystem types (for example, ext4 and Btrfs) use a UUID to uniquely identify each filesystem to the operating system. (NTFS and FAT32 do not; they use a shorter unique identifier (UID) instead.)

Filesystem userspace tools, most of which are derived from the original implementation by Theodore Ts'o, therefore make use of UUIDs.

An <code>/etc/fstab</code> file might assign mount points based on these UUIDs (or a UID for a FAT32 EFI system partition (ESP)):

<syntaxhighlight lang="sh">
# device-uuid mount-point fs-type options dump pass
UUID=b18e3b6c-ccb7-4308-b527-35e5e6ee2145 / btrfs defaults 0 0
UUID=103C-86D6 /efi vfat utf8 0 2
UUID=64f3cb6a-e70e-45e5-8b90-d86cddbab7bb swap swap defaults 0 0
UUID=eda746c6-1f1b-4cf1-9225-d8b0b46511cc /mnt/Stuff btrfs defaults 0 0
</syntaxhighlight>

Partition tables

The GUID Partition Table (GPT) uses UUIDs (called there "GUID"s) to identify partitions and partition types. Unique partition IDs are assigned locally by the operating system. Partition type IDs are well-known numbers, usually assigned by operating-system or hardware vendors.

Microsoft COM

There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):

  • – interface identifier; (The ones that are registered on a system are stored in the Windows Registry at )
  • – class identifier; (Stored at ). In practice it is not entirely separate from the space, because remoting the interface can require a proxy/stub object which some toolsets used to create with a equal to the interface's .
  • – type library identifier; (Stored at )
  • – category identifier; (its presence on a class identifies it as belonging to certain class categories, listed at )

Databases

UUIDs are commonly used as a unique key in database tables. The function in Microsoft SQL Server version 4 Transact-SQL returns standard random version-4 UUIDs, while the function returns 128-bit identifiers similar to UUIDs which are committed to ascend in sequence until the next system reboot. The Oracle Database function does not return a standard GUID, despite the name. Instead, it returns a 16-byte 128-bit RAW value based on a host identifier and a process or thread identifier, somewhat similar to a GUID. PostgreSQL contains a datatype and can generate most versions of UUIDs through the use of functions from modules. MySQL provides a function which generates standard version-1 UUIDs.

The random nature of standard UUIDs of versions 3, 4, and 5, and the ordering of the fields within standard versions 1 and 2 may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version-4 UUIDs being used as keys were modified to include a non-random suffix based on system time. By reordering and encoding version-1 and -2 UUIDs so that the timestamp comes first, insertion performance loss can be averted. This is the rationale for variant 1 (DCE) versions 6 and 7, standardized in RFC 9562.

Other examples

UEFI and ACPI are other examples of specifications that use GUIDs.

See also

  • Birthday attack
  • Object identifier (OID)
  • Uniform Resource Identifier (URI)
  • Snowflake ID

References

  • Recommendation ITU-T X.667 (Free access)
  • ISO/IEC 9834-8:2014 (Paid)
  • Technical Note TN2166 - Secrets of the GPT - Apple Developer
  • UUID Documentation - Apache Commons Id
  • CLSID Key - Microsoft Docs
  • Universal Unique Identifier - The Open Group Library
  • UUID Decoder tool
  • A Brief History of the UUID
  • Understanding How UUIDs Are Generated