In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset. As a consequence, an infinite number of multisets exist that contain only elements and , but vary in the multiplicities of their elements:

  • The set contains only elements and , each having multiplicity 1 when is seen as a multiset.
  • In the multiset , the element has multiplicity 2, and has multiplicity 1.
  • In the multiset , and both have multiplicity 3.

These objects are all different when viewed as multisets, although they are the same set, since they all consist of the same elements. As with sets, and in contrast to tuples, the order in which elements are listed does not matter in discriminating multisets, so and denote the same multiset. To distinguish between sets and multisets, a notation that incorporates square brackets is sometimes used: the multiset can be denoted by .

The cardinality or "size" of a multiset is the sum of the multiplicities of all its elements. For example, in the multiset the multiplicities of the members , , and are respectively 2, 3, and 1, and therefore the cardinality of this multiset is 6.

Nicolaas Govert de Bruijn coined the word multiset in the 1970s, according to Donald Knuth. However, the concept of multisets predates the coinage of the word multiset by many centuries. Knuth himself attributes the first study of multisets to the Indian mathematician Bhāskarāchārya, who described permutations of multisets around 1150. Other names have been proposed or used for this concept, including list, bunch, bag, heap, sample, weighted set, collection, and suite. These and similar collections of objects can be regarded as multisets, because strokes, tally marks, or units are considered indistinguishable. This shows that people implicitly used multisets even before mathematics emerged.

Practical needs for this structure have caused multisets to be rediscovered several times, appearing in literature under different names. For instance, they were important in early AI languages, such as QA4, where they were referred to as bags, a term attributed to Peter Deutsch. A multiset has been also called an aggregate, heap, bunch, sample, weighted set, occurrence set, and fireset (finitely repeated element set).

Although multisets were used implicitly from ancient times, their explicit exploration happened much later. The first known study of multisets is attributed to the Indian mathematician Bhāskarāchārya circa 1150, who described permutations of multisets. Athanasius Kircher found the number of multiset permutations when one element can be repeated. Jean Prestet published a general rule for multiset permutations in 1675. John Wallis explained this rule in more detail in 1685.

Multisets appeared explicitly in the work of Richard Dedekind.

Other mathematicians formalized multisets and began to study them as precise mathematical structures in the 20th century. For example, Hassler Whitney (1933) described generalized sets ("sets" whose characteristic functions may take any integer value: positive, negative or zero). Monro (1987) investigated the category Mul of multisets and their morphisms, defining a multiset as a set with an equivalence relation between elements "of the same sort", and a morphism between multisets as a function that respects sorts. He also introduced a multinumber: a function f(x) from a multiset to the natural numbers, giving the multiplicity of element x in the multiset. Monro argued that the concepts of multiset and multinumber are often mixed indiscriminately, though both are useful.

Examples

One of the simplest and most natural examples is the multiset of prime factors of a natural number . Here the underlying set of elements is the set of prime factors of . For example, the number 120 has the prime factorization

<math display="block">120 = 2^3 3^1 5^1,</math>

which gives the multiset .

A related example is the multiset of solutions of an algebraic equation. A quadratic equation, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be , or it could be . In the latter case it has a solution of multiplicity 2. More generally, the fundamental theorem of algebra asserts that the complex solutions of a polynomial equation of degree always form a multiset of cardinality .

A special case of the above are the eigenvalues of a matrix, whose multiplicity is usually defined as their multiplicity as roots of the characteristic polynomial. However two other multiplicities are naturally defined for eigenvalues, their multiplicities as roots of the minimal polynomial, and the geometric multiplicity, which is defined as the dimension of the kernel of (where is an eigenvalue of the matrix ). These three multiplicities define three multisets of eigenvalues, which may be all different: Let be a matrix in Jordan normal form that has a single eigenvalue. Its multiplicity is , its multiplicity as a root of the minimal polynomial is the size of the largest Jordan block, and its geometric multiplicity is the number of Jordan blocks.

Definition

A multiset may be formally defined as an ordered pair where is a set called a universe or the underlying set, and <math>m\colon U \to \N</math> is a function from to the nonnegative integers. define multisets with the additional constraint that for every , or, equivalently, the support equals the underlying set.

Multisets with infinite multiplicities have also been studied; they are not considered in this article.

Some authors define a multiset in terms of a finite index set and a function where the multiplicity of an element is , the number of elements of that get mapped to by .

Multisets may be represented as sets, with some elements repeated. For example, the multiset with support and multiplicity function such that can be represented as . A more compact notation, in case of high multiplicities is for the same multiset.

If <math>A=\{a_1, \ldots, a_n\},</math> a multiset with support included in is often represented as

<math display=block>a_1^{m(a_1)} \cdots a_n^{m(a_n)},</math>

to which the computation rules of indeterminates can be applied; that is, exponents 1 and factors with exponent 0 can be removed, and the multiset does not depend on the order of the factors. This allows extending the notation to infinite underlying sets as

<math display=block>\prod_{a\in U} a^{m(a)}.</math>

An advantage of notation is that it allows using the notation without knowing the exact support. For example, the prime factors of a natural number form a multiset such that

<math display=block>n=\prod_{p \;\text {prime p^{m(p)}= 2^{m(2)} 3^{m(3)} 5^{m(5)}\cdots.</math>

Basic properties and operations

Elements of a multiset are generally taken in a fixed set , sometimes called a universe, which is often the set of natural numbers. An element of that does not belong to a given multiset is said to have a multiplicity 0 in this multiset. This extends the multiplicity function of the multiset to a function from to the set <math>\N</math> of non-negative integers. This defines a one-to-one correspondence between these functions and the multisets that have their elements in .

This extended multiplicity function is commonly called simply the multiplicity function, and suffices for defining multisets when the universe containing the elements has been fixed. This multiplicity function is a generalization of the indicator function of a subset, and shares some properties with it.

The support of a multiset <math>A</math> in a universe is the underlying set of the multiset, or <math>\operatorname{Supp}(A)</math>. Using the multiplicity function <math>m</math>, it is characterized as

<math display="block">\operatorname{Supp}(A) := \{x \in U \mid m_{A}(x) > 0 \}.</math>

A multiset is finite if its support is finite, or, equivalently, if its cardinality

<math display="block">|A| = \sum_{x\in\operatorname{Supp}(A)} m_A(x) = \sum_{x\in U} m_A(x)</math>

is finite. The empty multiset is the unique multiset with an empty support (underlying set), and thus a cardinality 0.

The usual operations of sets may be extended to multisets by using the multiplicity function, in a similar way to using the indicator function for subsets. In the following, and are multisets in a given universe , with multiplicity functions <math>m_A</math> and <math>m_B.</math>

  • Inclusion: is included in , denoted , if <math display="block">m_A(x) \le m_B(x)\quad\forall x\in U.</math>
  • Union: the union (called, in some contexts, the maximum or lowest common multiple) of and is the multiset with multiplicity function Multisets have become an important tool in the theory of relational databases, which often uses the synonym bag. For instance, multisets are often used to implement relations in database systems. In particular, a table (without a primary key) works as a multiset, because it can have multiple identical records. Similarly, SQL operates on multisets and returns identical records. For instance, consider "SELECT name FROM Student". In the case that there are multiple records with name "Sara" in the student table, all of them are shown. That means the result of an SQL query is a multiset; if the result were instead a set, the repetitive records in the result set would have been eliminated. Another application of multisets is in modeling multigraphs. In multigraphs there can be multiple edges between any two given vertices. As such, the entity that specifies the edges is a multiset, and not a set.

There are also other applications. For instance, Richard Rado used multisets as a device to investigate the properties of families of sets. He wrote, "The notion of a set takes no account of multiple occurrence of any one of its members, and yet it is just this kind of information that is frequently of importance. We need only think of the set of roots of a polynomial f(x) or the spectrum of a linear operator."

  • Real-valued multisets (in which multiplicity of an element can be any real number)
  • Fuzzy multisets
  • Rough multisets
  • Hybrid sets
  • Multisets whose multiplicity is any real-valued step function
  • Soft multisets
  • Soft fuzzy multisets
  • Named sets (unification of all generalizations of sets)

See also

  • Frequency (statistics) as multiplicity analog
  • Quasi-sets
  • Set theory
  • Bag-of-words model

Notes

References