First normal form (1NF) is the most basic level of database normalization defined by English computer scientist Edgar F. Codd, the inventor of the relational database. A relation (or a table, in SQL) can be said to be in first normal form if each field is atomic, containing a single value rather than a set of values or a nested table. In other words, a relation complies with first normal form if no attribute domain (the set of values allowed in a given column) has relations as elements.

Most relational database management systems, including standard SQL, do not support creating or using table-valued columns, which means most relational databases will be in first normal form by necessity. Otherwise, normalization to 1NF involves eliminating nested relations by breaking them up into separate relations associated with each other using foreign keys. This process is a necessary step when moving data from a non-relational (or NoSQL) database, such as one using a hierarchical or document-oriented model, to a relational database.

A database must satisfy 1NF to satisfy further "normal forms", such as 2NF and 3NF, which enable the reduction of redundancy and anomalies. Other benefits of adopting 1NF include the introduction of increased data independence and flexibility (including features like many-to-many relationships) and simplification of the relational algebra and query language necessary to describe operations on the database.

Codd considered 1NF mandatory for relational databases, while the other normal forms were merely guidelines for database design.

Background

First normal form was introduced in 1970 by Edgar F. Codd in his paper "A relational model of data for large shared data banks", although initially it was simply referred to as "normalization" or "normal form". It was renamed to "first normal form" when Codd introduced additional normal forms in his paper "Further Normalization of the Data Base Relational Model" in 1971.

The relational model was proposed as an improvement over hierarchical databases which were prevalent at the time. A key difference lies in how relationships between records are represented. In a hierarchical database, one-to-many relationships are represented through containment: a single record may contain sets of records (known as repeating groups) as attribute values. But Codd argued that hierarchy is not flexible and expressive enough for more complex data models. For example, many-to-many relationships cannot be represented through hierarchy. Thus he suggest eliminating nested records and instead represent relationship through foreign keys. This allows richer relationships to be expressed, since a record can now participate in multiple relationships.

A direct translation of a hierarchical database into relations would represent repeating groups as nested relations. Thus normalization is defined as eliminating nested relations and instead represent the one-to-many relationship through foreign keys.

Codd distinguishes between "atomic" and "compound" data. Atomic (or "nondecomposable") data includes basic types such as numbers and strings – broadly speaking, it "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)". Compound data is made up of structures such as relations (or tables, in SQL) which contain several pieces of atomic data and thus "can be decomposed by the DBMS".

In a relation, each attribute (or column) has a set of allowed values known as its domain (e.g., a "Price" attribute's domain may be the set of non-negative numbers with up to 2 fractional digits). Each tuple (or row) in the relation contains one value per attribute, and each must be an element in that attribute's domain. Codd distinguishes attributes which have "simple domains" containing only atomic data from attributes with "nonsimple domains" containing at least some forms of compound data. Nonsimple domains introduce a degree of structural complexity which can be difficult to navigate, to query and to update – for instance, it will be time-consuming to operate across several nested relations (that is, tables containing further tables), which can be found in some non-relational databases.

First normal form therefore requires all attribute domains to be simple domains, such that the data in each field is atomic and no relation has relation-valued attributes. Precisely, Codd states that, in the relational model, "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." In particular, the notion of an atomic value as a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

  • A string would seem not to be atomic, as an RDBMS typically provides operators to decompose it into substrings.
  • A fixed-point number would seem not to be atomic, as an RDBMS typically provides operators to decompose it into integer and fractional components.
  • An ISBN would seem not to be atomic, as it includes various parts, including the registration group, registrant and publication elements.

Date suggests that "the notion of atomicity has no absolute meaning": a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns containing any conceivable data type (from strings and numeric types to arrays and tables) are then acceptable in a 1NF table, although perhaps not always desirable – for example, it may be desirable to separate a CustomerName column into two columns, FirstName and Surname.

Christopher J. Date's definition of 1NF

According to Christopher J. Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:

  1. There is no specific top-to-bottom ordering of the rows.
  2. There is no specific left-to-right ordering of the columns.
  3. There are no duplicate rows.
  4. Every field (or intersection of a row and a column) contains exactly one value from the applicable domain and nothing else.
  5. All columns are regular (i.e., rows have no hidden components such as row IDs, object IDs, or hidden timestamps).

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.

This definition of 1NF permits relation-valued attributes (tables within tables), which Date argues are useful in rare cases. Examples of tables (or views) that would not meet this definition of first normal form are:

  • A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
  • A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view, in violation of condition 1. The tuples in true relations are not ordered with respect to each other (such views cannot be created using SQL that conforms to the SQL:2003 standard).
  • A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial; it marks an important departure from Codd's later vision of the relational model, which made explicit provision for nulls.

See also

  • Attribute–value system
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce–Codd normal form (BCNF or 3.5NF)
  • Fourth normal form (4NF)
  • Fifth normal form (5NF)
  • Sixth normal form (6NF)

References

Further reading

  • Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. .
  • Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. .
  • Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, p. 120–125.

de:Normalisierung (Datenbank)#Erste Normalform (1NF)

pl:Postać normalna (bazy danych)