Centrality - WikiHQ

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin. Over time, the concept has expanded substantially, leading to the development of hundreds of distinct centrality measures, the most comprehensive listing of which is documented in the CentralityZoo online catalogue.

Definition and characterization of centrality indices

Centrality indices are answers to the question "What characterizes an important vertex?" The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking which identifies the most important nodes.

The word "importance" has many meanings, leading to many different definitions of centrality. Two categorization schemes have been proposed. "Importance" can be conceived in relation to a type of flow or transfer across the network. This allows centralities to be classified by the type of flow they consider important. Both of these approaches divide centralities into distinct categories. A further conclusion is that a centrality which is appropriate for one category will often "get it wrong" when applied to a different category. Other centrality measures, such as betweenness centrality, focus not just on overall connectedness but on occupying positions that are pivotal to the network's connectivity.

Characterization by network flows

A network can be considered a description of the paths along which something flows. This allows a characterization based on the type of flow and the type of path encoded by the centrality. A flow can be based on transfers, where each indivisible item goes from one node to another, like a package delivery going from the delivery site to the client's house. A second case is serial duplication, in which an item is replicated so that both the source and the target have it. An example is the propagation of information through gossip, with the information being propagated in a private way and with both the source and the target nodes being informed at the end of the process. The last case is parallel duplication, with the item being duplicated to several links at the same time, like a radio broadcast which provides the same information to many listeners at once.

One game-theoretic approach to centrality uses the Shapley value. Because computing the Shapley value is computationally hard, most efforts in this domain go into new algorithms and methods that rely on a particular topology of the network or a special character of the problem. Such approaches may reduce the time complexity from exponential to polynomial.

Similarly, the solution concept authority distribution applies the Shapley-Shubik power index, rather than the Shapley value, to measure the bilateral direct influence between the players. The distribution is a type of eigenvector centrality. It is used to sort big data objects in Hu (2020), such as ranking U.S. colleges.

Important limitations

Centrality indices have two important limitations, one obvious and the other subtle. The obvious limitation is that a centrality which is optimal for one application is often sub-optimal for a different application. Indeed, if this were not so, we would not need so many different centralities. An illustration of this phenomenon is provided by the Krackhardt kite graph, for which three different notions of centrality give three different choices of the most central vertex.

The more subtle limitation is the commonly held fallacy that vertex centrality indicates the relative importance of vertices. Centrality indices are explicitly designed to produce a ranking that identifies the most important vertices; the ranking of the remaining vertices need not be meaningful. This explains why, for example, only the first few results of a Google image search appear in a reasonable order. PageRank is a highly unstable measure, showing frequent rank reversals after small adjustments of the jump parameter.

While the failure of centrality indices to generalize to the rest of the network may at first seem counter-intuitive, it follows directly from the above definitions.

Complex networks have heterogeneous topology. To the extent that the optimal measure depends on the network structure of the most important vertices, a measure which is optimal for such vertices is sub-optimal for the remainder of the network.

Degree centrality

Figure: Examples of A) betweenness centrality, B) closeness centrality, C) eigenvector centrality, D) degree centrality, E) harmonic centrality and F) Katz centrality of the same random geometric graph.

Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely indegree and outdegree. Accordingly, indegree is a count of the number of ties directed to the node and outdegree is the number of ties that the node directs to others. When ties are associated to some positive aspects such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness.

The degree centrality of a vertex <math>v</math>, for a given graph <math>G:=(V,E)</math> with <math>|V|</math> vertices and <math>|E|</math> edges, is defined as

:<math>C_D(v)= \deg(v)</math>

Calculating degree centrality for all the nodes in a graph takes <math>\Theta(|V|^2)</math> time in a dense adjacency matrix representation of the graph, and <math>\Theta(|E|)</math> time in a sparse matrix representation.
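As a minimal sketch of the definition above, the following Python computes degree for an undirected graph and indegree and outdegree for a directed one. The adjacency-list and edge-list representations, and the function names `degree_centrality` and `in_out_degree`, are illustrative assumptions rather than part of any particular library.

```python
def degree_centrality(adj):
    """Return {node: degree} for an undirected graph given as an adjacency list."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def in_out_degree(edges, nodes):
    """Return (indegree, outdegree) dicts for a list of directed (source, target) ties."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for source, target in edges:
        outdeg[source] += 1  # tie that the node directs to another node
        indeg[target] += 1   # tie directed to the node
    return indeg, outdeg


# Undirected star: node 0 is tied to three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
star_degrees = degree_centrality(star)

# Directed example with ties 0 -> 1, 0 -> 2 and 1 -> 2.
indeg, outdeg = in_out_degree([(0, 1), (0, 2), (1, 2)], nodes=[0, 1, 2])
```

With an adjacency list, the work is proportional to the number of stored ties, in line with the sparse-representation cost mentioned above.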

The definition of centrality on the node level can be extended to the whole graph, in which case we are speaking of graph centralization. Let <math>v*</math> be the node with highest degree centrality in <math>G</math>. Let <math>X:=(Y,Z)</math> be the <math>|Y|</math>-node connected graph that maximizes the following quantity (with <math>y*</math> being the node with highest degree centrality in <math>X</math>):

:<math>H= \sum^{|Y|}_{j=1} [C_D(y*)-C_D(y_j)]</math>

Correspondingly, the degree centralization of the graph <math>G</math> is as follows:

:<math>C_D(G)= \frac{\sum^{|V|}_{i=1} [C_D(v*)-C_D(v_i)]}{H}</math>

The value of <math>H</math> is maximized when the graph <math>X</math> contains one central node to which all other nodes are connected (a star graph), and in this case

:<math>H=(n-1)\cdot((n-1)-1)=n^2-3n+2.</math>

So, for any graph <math>G:=(V,E),</math>

:<math>C_D(G)= \frac{\sum^{|V|}_{i=1} [C_D(v*)-C_D(v_i)] }{|V|^2-3|V|+2}</math>
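The centralization formula can be checked on small graphs. The sketch below assumes an undirected graph stored as an adjacency list with at least three nodes (otherwise the denominator is zero); the name `degree_centralization` is hypothetical.

```python
def degree_centralization(adj):
    """Degree centralization of an undirected graph given as an adjacency list."""
    n = len(adj)
    if n < 3:
        raise ValueError("need at least 3 nodes, otherwise the denominator is zero")
    degrees = [len(neighbors) for neighbors in adj.values()]
    max_degree = max(degrees)  # C_D(v*) in the formula
    numerator = sum(max_degree - d for d in degrees)
    return numerator / (n * n - 3 * n + 2)


# A star graph attains the maximum, a cycle the minimum.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
star_value = degree_centralization(star)
cycle_value = degree_centralization(cycle)
```

For the five-node star the numerator is 4 · 3 = 12 and the denominator is 25 − 15 + 2 = 12, so the centralization is 1; in the cycle every node has the same degree, so it is 0.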

Also, a new extensive global measure for degree centrality, named Tendency to Make Hub (TMH), has been proposed.

Closeness centrality

In a connected graph, the closeness centrality of a node is the reciprocal of the sum of its distances to all other nodes, that is <math display="inline">C_B(v)= (\sum_u d(u,v))^{-1}</math> where <math>d(u,v)</math> is the distance between vertices u and v. However, when speaking of closeness centrality, people usually refer to its normalized form, given by the previous formula multiplied by <math>N-1</math>, where <math>N</math> is the number of nodes in the graph.

: <math>C(v)= \frac{N-1}{\sum_u d(u,v)} .</math>

This normalisation allows comparisons between nodes of graphs of different sizes. For many graphs, there is a strong correlation between the inverse of closeness and the logarithm of degree, <math>(C(v))^{-1} \approx -\alpha \ln(k_v) + \beta</math> where <math>k_v</math> is the degree of vertex v while α and β are constants for each network.

The distinction between distances from and distances to all other nodes is irrelevant in undirected graphs, whereas it can produce totally different results in directed graphs (e.g. a website can have a high closeness centrality from outgoing links, but low closeness centrality from incoming links).
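The normalized closeness formula above can be sketched with a breadth-first search from each node. The code assumes an unweighted, undirected, connected graph with at least two nodes, stored as an adjacency list; the helper names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Normalized closeness C(v) = (N - 1) / sum_u d(u, v)."""
    n = len(adj)
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) < n:
            raise ValueError("this sketch requires a connected graph")
        result[v] = (n - 1) / sum(dist.values())
    return result


# Path graph 0 - 1 - 2 - 3: the inner nodes are closer to everything.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
closeness = closeness_centrality(path)
```

For a directed graph the same search would follow outgoing links only, so running it on the graph and on its reverse would give the two different closeness values described above.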

Harmonic centrality

In a (not necessarily connected) graph, the harmonic centrality reverses the sum and reciprocal operations in the definition of closeness centrality:

: <math>H(v)= \sum_{u | u \neq v} \frac{1}{d(u,v)}</math>

where <math>1 / d(u,v) = 0</math> if there is no path from u to v. Harmonic centrality can be normalized by dividing by <math>N-1</math>, where <math>N</math> is the number of nodes in the graph.
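A small sketch of harmonic centrality, assuming an unweighted, undirected graph stored as an adjacency list (the function name and the `normalized` flag are illustrative). Unreachable nodes never enter the sum, which corresponds to the convention that 1 / d(u,v) = 0 when there is no path.

```python
from collections import deque


def harmonic_centrality(adj, normalized=False):
    """H(v) = sum over u != v of 1 / d(u, v); unreachable nodes contribute 0."""
    n = len(adj)
    result = {}
    for v in adj:
        # Breadth-first search gives hop distances to all reachable nodes.
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        score = sum(1.0 / d for node, d in dist.items() if node != v)
        if normalized and n > 1:
            score /= n - 1
        result[v] = score
    return result


# Path 0 - 1 - 2 plus an isolated node 3, so the graph is not connected.
graph = {0: [1], 1: [0, 2], 2: [1], 3: []}
harmonic = harmonic_centrality(graph)
harmonic_normalized = harmonic_centrality(graph, normalized=True)
```

Unlike closeness, the computation stays well defined on this disconnected graph: the isolated node simply receives a score of 0.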

Harmonic centrality was proposed by Marchiori and Latora (2000) and then independently by Dekker (2005), using the name "valued centrality," and by Rochat (2009).

Betweenness centrality

Figure: Hue (from red = 0 to blue = max) shows the node betweenness.

Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced as a measure for quantifying the control of a human on the communication between other humans in a social network by Linton Freeman. In his conception, vertices that have a high probability to occur on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

The betweenness of a vertex <math>v</math> in a graph <math>G:=(V,E)</math> with <math>|V|</math> vertices is computed as follows:

For each pair of vertices (s,t), compute the shortest paths between them.
For each pair of vertices (s,t), determine the fraction of shortest paths that pass through the vertex in question (here, vertex v).
Sum this fraction over all pairs of vertices (s,t).

More compactly the betweenness can be represented as:

:<math>C_B(v)= \sum_{s \neq v \neq t \in V}\frac{\sigma_{st}(v)}{\sigma_{st}}</math>

where <math>\sigma_{st}</math> is the total number of shortest paths from node <math>s</math> to node <math>t</math> and <math>\sigma_{st}(v)</math> is the number of those paths that pass through <math>v</math>. The betweenness may be normalised by dividing by the number of pairs of vertices not including v, which for directed graphs is <math>(n-1)(n-2)</math> and for undirected graphs is <math>(n-1)(n-2)/2</math>. For example, in an undirected star graph, the center vertex (which is contained in every possible shortest path) would have a betweenness of <math>(n-1)(n-2)/2</math> (1, if normalised) while the leaves (which are contained in no shortest paths) would have a betweenness of 0.
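The three steps above can be transcribed almost literally for a small unweighted, undirected graph. This is a sketch of the definition, not an efficient algorithm; it uses the fact that the number of shortest s-t paths through v is σ_sv · σ_vt whenever d(s,v) + d(v,t) = d(s,t), and zero otherwise. Function names and the adjacency-list input are assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def betweenness_centrality(adj, normalized=False):
    """Sum over unordered pairs (s, t) of the fraction of shortest paths through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    n = len(adj)
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # no path between s and t
        for v in adj:
            if v == s or v == t or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2
        score = {v: b / pairs for v, b in score.items()}
    return score


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
star_betweenness = betweenness_centrality(star)
star_normalized = betweenness_centrality(star, normalized=True)

# In a 4-cycle each opposite pair has two shortest paths, so each node gets 1/2.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
square_betweenness = betweenness_centrality(square)
```

On the five-node star the center obtains (n-1)(n-2)/2 = 6, or 1 after normalisation, and the leaves obtain 0, matching the example in the text.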

Computationally, both betweenness and closeness centralities of all vertices in a graph involve calculating the shortest paths between all pairs of vertices on a graph, which requires <math>O(|V|^3)</math> time with the Floyd–Warshall algorithm. However, on sparse graphs, Johnson's algorithm may be more efficient, taking <math>O(|V||E|+|V|^2 \log|V|)</math> time. In the case of unweighted graphs the calculations can be done with Brandes' algorithm, which takes <math>O(|V||E|)</math> time.

Using the adjacency matrix to find eigenvector centrality

For a given graph <math>G:=(V,E)</math> with <math>|V|</math> number of vertices let <math>A = (a_{v,t})</math> be the adjacency matrix, i.e. <math>a_{v,t} = 1</math> if vertex <math>v</math> is linked to vertex <math>t</math>, and <math>a_{v,t} = 0</math> otherwise. The relative centrality score <math>x_v</math> of vertex <math>v</math> can be defined as the nonnegative solution over the set of vertices <math>v \in V</math> to the equations:

:<math>x_v = \frac{1}{\lambda} \sum_{t \in M(v)}x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t}x_t</math>

where <math>M(v)</math> is a set of the neighbors of <math>v</math> and <math>\lambda</math> is a constant. With a small rearrangement this can be rewritten in vector notation as the eigenvector equation

:<math>\mathbf{Ax} = {\lambda}\mathbf{x}</math>.
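One common way to approximate the leading eigenvector is power iteration. The sketch below is an illustration under stated assumptions, not a reference implementation: it assumes a connected undirected graph given as a dense 0/1 adjacency matrix, and it iterates with A + I instead of A (a shift that leaves the eigenvectors unchanged) so that the iteration also settles on bipartite graphs such as stars and paths. The function name, tolerance and iteration cap are arbitrary choices.

```python
import math


def eigenvector_centrality(A, max_iter=1000, tol=1e-12):
    """Return (x, lam): unit-length leading eigenvector of A and its eigenvalue estimate."""
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # One power-iteration step on A + I: y = x + A x.
        y = [x[v] + sum(A[v][t] * x[t] for t in range(n)) for v in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        y = [c / norm for c in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    Ax = [sum(A[v][t] * x[t] for t in range(n)) for v in range(n)]
    lam = sum(a * b for a, b in zip(Ax, x))  # Rayleigh quotient, valid because |x| = 1
    return x, lam


# Star with center 0 and three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores, lam = eigenvector_centrality(star)
```

For this star the equations can be solved by hand: the eigenvalue is √3, and the center's score is √3 times that of each leaf, which the iteration reproduces.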

In general, there will be many different eigenvalues <math>\lambda</math> for which a non-zero eigenvector solution exists. Since the entries in the adjacency matrix are non-negative, there is a unique largest eigenvalue, which is real and positive, by the Perron–Frobenius theorem. This greatest eigenvalue results in the desired centrality measure.

Katz centrality

Katz centrality is a generalization of degree centrality. Degree centrality measures the number of direct neighbors, and Katz centrality measures the number of all nodes that can be connected through a path, while the contributions of distant nodes are penalized. Mathematically, it is defined as

:<math>x_i = \sum_{k=1}^{\infty}\sum_{j=1}^N \alpha^k (A^k)_{ji}</math>

where <math>\alpha</math> is an attenuation factor in <math>(0,1)</math>.

Katz centrality can be viewed as a variant of eigenvector centrality. Another form of Katz centrality is

:<math>x_i = \alpha \sum_{j =1}^N a_{ij}(x_j+1).</math>

Compared to the expression of eigenvector centrality, <math>x_j</math> is replaced by <math>x_j+1.</math>

It has been shown that the principal eigenvector (associated with the largest eigenvalue of <math>A</math>, the adjacency matrix) is the limit of Katz centrality as <math>\alpha</math> approaches <math>\tfrac{1}{\lambda}</math> from below.
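The recursive form of Katz centrality lends itself to a fixed-point iteration. The sketch below assumes a dense adjacency matrix of an undirected graph and an attenuation factor below 1/λ, which is the condition under which the iteration (and the power series) converges; the function name, tolerance and iteration cap are illustrative.

```python
def katz_centrality(A, alpha, max_iter=10000, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ij * (x_j + 1) until the values stop changing."""
    n = len(A)
    x = [0.0] * n
    for _ in range(max_iter):
        y = [alpha * sum(A[i][j] * (x[j] + 1.0) for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("no convergence; alpha may not be below 1/lambda")


# Path graph 0 - 1 - 2, whose largest eigenvalue is sqrt(2), so alpha = 0.5 is admissible.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
katz = katz_centrality(path, alpha=0.5)
```

Solving the three equations by hand for this path with α = 0.5 gives scores 2, 3 and 2, so the middle node ranks highest, as it does under degree centrality.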

PageRank centrality

PageRank satisfies the following equation

:<math>x_i = \alpha \sum_{j } a_{ji}\frac{x_j}{L(j)} + \frac{1-\alpha}{N},</math>

where

:<math>L(j) = \sum_{i} a_{ji}</math>

is the number of neighbors of node <math>j</math> (or number of outbound links in a directed graph). Compared to eigenvector centrality and Katz centrality, one major difference is the scaling factor <math>L(j)</math>. Another difference between PageRank and eigenvector centrality is that the PageRank vector is a left-hand eigenvector (note the factor <math>a_{ji}</math> has indices reversed).
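The PageRank equation can be iterated directly until the scores stop changing. This is a minimal sketch under several assumptions: the graph is a dense 0/1 matrix with `A[j][i] = 1` when node j links to node i, every node has at least one outbound link (dangling nodes are not handled), and α = 0.85 is used only as a conventional illustrative value. The function name is hypothetical.

```python
def pagerank(A, alpha=0.85, max_iter=1000, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ji * x_j / L(j) + (1 - alpha) / N."""
    n = len(A)
    out_links = [sum(row) for row in A]  # L(j) = sum_i a_ji
    if any(count == 0 for count in out_links):
        raise ValueError("this sketch assumes every node has an outbound link")
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [
            alpha * sum(A[j][i] * x[j] / out_links[j] for j in range(n))
            + (1.0 - alpha) / n
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Directed cycle 0 -> 1 -> 2 -> 0: all nodes are equivalent.
cycle = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
cycle_ranks = pagerank(cycle)

# Links 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0: node 2 collects the most incoming weight.
graph = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(graph)
```

Note how the sum runs over `A[j][i]`, the transposed index order, which reflects the left-hand eigenvector remark above.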

Percolation centrality

A slew of centrality measures exist to determine the 'importance' of a single node in a complex network. However, these measures quantify the importance of a node in purely topological terms, and the value of the node does not depend on the 'state' of the node in any way. It remains constant regardless of network dynamics. This is true even for the weighted betweenness measures. However, a node may very well be centrally located in terms of betweenness centrality or another centrality measure, but may not be 'centrally' located in the context of a network in which there is percolation.

Percolation of a 'contagion' occurs in complex networks in a number of scenarios. For example, viral or bacterial infection can spread over social networks of people, known as contact networks. The spread of disease can also be considered at a higher level of abstraction, by contemplating a network of towns or population centres, connected by road, rail or air links. Computer viruses can spread over computer networks. Rumours or news about business offers and deals can also spread via social networks of people.

In all of these scenarios, a 'contagion' spreads over the links of a complex network, altering the 'states' of the nodes as it spreads, either recoverable or otherwise. For example, in an epidemiological scenario, individuals go from 'susceptible' to 'infected' state as the infection spreads. The states the individual nodes can take in the above examples could be binary (such as received/not received a piece of news), discrete (susceptible/infected/recovered), or even continuous (such as the proportion of infected people in a town), as the contagion spreads. The common feature in all these scenarios is that the spread of contagion results in the change of node states in networks.

Percolation centrality (PC) was proposed with this in mind, which specifically measures the importance of nodes in terms of aiding the percolation through the network. This measure was proposed by Piraveenan et al.

Percolation centrality is defined for a given node, at a given time, as the proportion of 'percolated paths' that go through that node. A 'percolated path' is a shortest path between a pair of nodes, where the source node is percolated (e.g., infected). The target node can be percolated or non-percolated, or in a partially percolated state.

:<math>PC^t(v)= \frac{1}{N-2}\sum_{s \neq v \neq r}\frac{\sigma_{sr}(v)}{\sigma_{sr\frac