A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes, and leaves. The root may be either a leaf or a node with two or more children.

A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to which an additional level is added at the bottom with linked leaves.

The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context—in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node, Douglas Comer notes in an early survey of B-trees (which also covers B+ trees) that the B+ tree was used in IBM's VSAM data access software, and refers to an IBM published article from 1973.

Structure

Pointer structure

thumb|304x304px|B+ tree node format where K=4. (p_i represents the pointers, k_i represents the search keys).

As with other trees, B+ trees can be represented as a collection of three types of nodes: root, internal (a.k.a. interior), and leaf. In B+ trees, the following properties are maintained for these nodes:

  • If <math display="inline">k_i</math> exists in any node in a B+ tree, then exists in that node where <math>i \ge 1</math>.
  • All leaf nodes have the same number of ancestors (i.e., they are all at the same depth).

The pointer properties of nodes are summarized in the tables below:

  • : Maximum number of potential search keys for each node in a B+ tree. (this value is constant over the entire tree).
  • : The pointer at the zero-based node index .
  • : The search key at the zero-based node index .

{| class="wikitable"

|+Internal Node Pointer Structure

|

!

! colspan="3" |

|-

! | when

! exists

! and exist

! exists, and does not exist

! and do not exist

|-

! | : Points to subtree in which all search keys

| .

| .

| .

| is empty.

|}

{| class="wikitable"

|+Leaf Node Pointer Structure

! when exists

! when does not exist and <math>i \ne K</math>

!

|-

|Points to a record with a value equal to .

|Here, is empty.

|Points to the next leaf in the tree.

|}

Node bounds

The node bounds are summarized in the table below:

{| class="wikitable"

|-

! rowspan="2" | Node Type

! colspan="2" | Number of Keys

! colspan="2" | Number of Child Nodes

|-

! Min !! Max

! Min !! Max

|-

| Root Node (when it is a leaf node) || 0 ||

|0

|0

|-

| Root Node (when it is an internal node) || 1 ||

|2 This enables each leaf node to keep all of its keys sorted at all times, which then enables each internal node to construct an ordered collection of intervals representing the contiguous extent of values contained in a given leaf. Internal nodes higher in the tree can then construct their own intervals, which recursively aggregate the intervals contained in their own child internal nodes. Eventually, the root of a B+ Tree represents the whole range of values in the tree, where every internal node represents a subinterval.

For this recursive interval information to be retained, internal nodes must additionally contain <math>m - 1</math> copies of keys <math>l_i</math> for <math>i \in [1, m - 1]</math> representing the least element within the interval covered by the child with index (which may itself be an internal node, or a leaf). Where represents the actual number of children for a given internal node.

Characteristics

The order or branching factor of a B+ tree measures the capacity of interior nodes, i.e. their maximum allowed number of direct child nodes. This value is constant over the entire tree. For a -order B+ tree with levels of index:

  • The maximum number of records stored is <math>n_{\max} = b^h - b^{h-1}</math> {<math> b^{h-1}</math> is subtracted to account for the next pointers which do not point to data records but instead point to the next leaf node}
  • The minimum number of records stored is <math>n_{\min} = 2\left\lceil\tfrac{b}{2}\right\rceil^{h-1}-2\left\lceil\tfrac{b}{2}\right\rceil^{h-2}</math>
  • The minimum number of keys is <math>n_\mathrm{kmin} = 2\left\lceil\tfrac{b}{2}\right\rceil^{h-1}-1</math>
  • The maximum number of keys is <math>n_\mathrm{kmax} = b^h-1</math>
  • The space required to store the tree is <math>O(n)</math>
  • Inserting a record requires <math>O(\log_bn)</math> operations
  • Finding a record requires <math>O(\log_bn)</math> operations
  • Removing a (previously located) record requires <math>O(\log_bn)</math> operations
  • Performing a range query with k elements occurring within the range requires <math>O(\log_bn+k)</math> operations
  • The B+ tree structure expands/contracts as the number of records increases/decreases. There are no restrictions on the size of B+ trees. Thus, increasing usability of a database system.
  • Any change in structure does not affect performance due to balanced tree properties.
  • The data is stored in the leaf nodes and more branching of internal nodes helps to reduce the tree's height, thus, reduce search time. As a result, it works well in secondary storage devices.
  • Searching becomes extremely simple because all records are stored only in the leaf node and are sorted sequentially in the linked list.
  • We can retrieve range retrieval or partial retrieval using B+ tree. This is made easier and faster by traversing the tree structure. This feature makes B+ tree structure applied in many search methods.

function search(k, root) is

let leaf = leaf_search(k, root)

for leaf_key in leaf.keys():

if k = leaf_key:

return true

return false

function leaf_search(k, node) is

if node is a leaf:

return node

let p = node.children()

let l = node.left_sided_intervals()

assert <math>|p| = |l| + 1</math>

let m = p.len()

for i from 1 to m - 1:

if <math>k \le l[i]</math>:

return leaf_search(k, p[i])

return leaf_search(k, p[m])

Note that this pseudocode uses 1-based array indexing.

Insertion

  • Perform a search to determine which node the new record should go into.
  • If the node is not full (at most <math> b - 1 </math> entries after the insertion), add the record.
  • Otherwise, before inserting the new record
  • Split the node.
  • original node has <math>\lceil (K+1)/2 \rceil</math> items
  • new node has <math>\lfloor (K+1)/2 \rfloor</math> items
  • Copy <math>\lceil (K+1)/2 \rceil</math>-th key to the parent, and insert the new node to the parent.
  • Repeat until a parent is found that need not split.
  • Insert the new record into the new node.
  • If the root splits, treat it as if it has an empty parent and split as outlined above.

B+ trees grow at the root and not at the leaves.

Deletion

The purpose of the delete algorithm is to remove the desired entry node from the tree structure. We recursively call the delete algorithm on the appropriate node until no node is found. For each function call, we traverse along, using the index to navigate until we find the node, remove it, and then work back up to the root.

At entry L that we wish to remove:

  • If L is at least half-full, done
  • If L has only d-1 entries, try to re-distribute, borrowing from sibling (adjacent node with same parent as L).After the re-distribution of two sibling nodes happens, the parent node must be updated to reflect this change. The index key that points to the second sibling must take the smallest value of that node to be the index key.
  • If re-distribute fails, merge L and sibling. After merging, the parent node is updated by deleting the index key that point to the deleted entry. In other words, if merge occurred, must delete entry (pointing to L or sibling) from parent of L.

Note: merge could propagate to root, which means decreasing height.

thumb|B+ tree deletion|none

Implementation

The leaves (the bottom-most index blocks) of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient (though the aforementioned upper bound can be achieved even without this addition). This does not substantially increase space consumption or maintenance on the tree. This illustrates one of the significant advantages of a B+tree over a B-tree; in a B-tree, since not all keys are present in the leaves, such an ordered linked list cannot be constructed. A B+tree is thus particularly useful as a database system index, where the data typically resides on disk, as it allows the B+tree to actually provide an efficient structure for housing the data itself (this is described in APFS uses B+ trees to store mappings from filesystem object IDs to their locations on disk, and to store filesystem records (including directories), though these trees' leaf nodes lack sibling pointers.

Database systems

Relational database management systems such as IBM Db2, Informix, support this type of tree for table indices, though each such system implements the basic B+ tree structure with variations and extensions. Many NoSQL database management systems such as CouchDB and Tokyo Cabinet also support this type of tree for data access and storage.

Finding objects in a high-dimensional database that are comparable to a particular query object is one of the most often utilized and yet expensive procedures in such systems. In such situations, finding the closest neighbor using a B+ tree is productive.

iDistance

B+ tree is efficiently used to construct an indexed search method called iDistance. iDistance searches for k nearest neighbors (kNN) in high-dimension metric spaces. The data in those high-dimension spaces is divided based on space or partition strategies, and each partition has an index value that is close with the respect to the partition. From here, those points can be efficiently implemented using B+ tree, thus, the queries are mapped to single dimensions ranged search. In other words, the iDistance technique can be viewed as a way of accelerating the sequential scan. Instead of scanning records from the beginning to the end of the data file, the iDistance starts the scan from spots where the nearest neighbors can be obtained early with a very high probability.

NVRAM

Nonvolatile random-access memory (NVRAM) has been using B+ tree structure as the main memory access technique for the Internet Of Things (IoT) system because of its non static power consumption and high solidity of cell memory. &nbsp;B+ can regulate the trafficking of data to memory efficiently. Moreover, with advanced strategies on frequencies of some highly used leaf or reference point, the B+ tree shows significant results in increasing the endurance of database systems.

See also

  • Binary search tree
  • B-tree
  • Divide-and-conquer algorithm

Notes

References

  • B+ tree in Python, used to implement a list
  • Dr. Monge's B+ Tree index notes