Modularity maximization
Our goal is to find a measure that quantifies how many edges lie within groups in our network relative to the number of such edges expected by chance. A good division of nodes into communities is one that maximizes such a measure. Equivalently, we could quantify how many edges lie between groups relative to the expected number of such edges; a good division would then be one that minimizes that measure. We will concentrate on the former measure, which leads to the modularity of a network.
Let us focus on undirected multi-graphs, that is, graphs that allow self-edges (edges joining a node to itself) and multi-edges (more than one edge between the same two vertices). A measure of the modularity of a network is the number of edges that run between vertices of the same community minus the number of such edges we would expect to find under the configuration model, that is, if edges were placed at random while preserving the vertex degrees. Let us denote by $c_i$ the community of vertex $i$, and let $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and $\delta(c_i, c_j) = 0$ otherwise. Hence, the number of edges that run between vertices of the same group is:
$$\sum_{(i,j) \in E} \delta(c_i, c_j) = \frac{1}{2} \sum_{i,j} A_{i,j} \, \delta(c_i, c_j)$$
where $E$ is the set of edges of the graph and $A_{i,j}$ is the actual number of edges between $i$ and $j$, which is zero or more (notice that each undirected edge is represented by two pairs in the second sum, hence the factor one-half; for the same factor to apply to self-edges, a self-edge at $i$ is counted twice in $A_{i,i}$).
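The two ways of counting within-community edges can be checked on a small example. The sketch below is only an illustration: the toy graph (two triangles joined by one edge), the community labels, and the names `edges`, `community`, and `delta` are assumptions made here, not part of the original text.

```python
# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = [0, 0, 0, 1, 1, 1]  # community[i] plays the role of c_i
n = len(community)


def delta(a, b):
    """Kronecker delta: 1 if the two community labels match, 0 otherwise."""
    return 1 if a == b else 0


# Build the adjacency matrix A. Each edge adds 1 to A[i][j] and 1 to A[j][i],
# so a self-edge (i, i) adds 2 to A[i][i].
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

# First form: sum of delta over the edge list.
within_from_edges = sum(delta(community[i], community[j]) for i, j in edges)

# Second form: one half of the sum over all ordered pairs (i, j).
within_from_matrix = sum(
    A[i][j] * delta(community[i], community[j])
    for i in range(n)
    for j in range(n)
) / 2

print(within_from_edges, within_from_matrix)
```

Both counts agree: six of the seven edges fall inside a community, and only the bridge between the triangles is excluded.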
The expected number of edges that run between vertices of the same group is:
$$\frac{1}{2} \sum_{i,j} \frac{k_i k_j}{2m} \, \delta(c_i, c_j)$$
where $k_i$ and $k_j$ are the degrees of $i$ and $j$, while $m$ is the number of edges of the graph. Notice that $k_i k_j / 2m$ is the expected number of edges between vertices $i$ and $j$ under the configuration model. Indeed, consider a particular edge attached to vertex $i$. The probability that this edge goes to node $j$ is $k_j / 2m$, since the number of edges attached to $j$ is $k_j$ and the total number of edge ends in the network is $2m$ (the sum of all node degrees). Since node $i$ has $k_i$ edges attached to it, the expected number of edges between $i$ and $j$ is $k_i k_j / 2m$.
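As a rough numerical check of this expectation, the following sketch evaluates the degree-based term for a small hypothetical graph (two triangles joined by one edge). The graph, the labels, and the helper name `expected_edges` are illustrative assumptions; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = [0, 0, 0, 1, 1, 1]
n = len(community)
m = len(edges)

# Degree k[i]: the number of edge ends attached to node i.
k = [0] * n
for i, j in edges:
    k[i] += 1
    k[j] += 1


def expected_edges(i, j):
    """Expected number of edges between i and j, taken as k_i * k_j / (2m)."""
    return Fraction(k[i] * k[j], 2 * m)


# Expected number of within-community edges (half the sum over ordered pairs).
expected_within = sum(
    expected_edges(i, j)
    for i in range(n)
    for j in range(n)
    if community[i] == community[j]
) / 2

# Sanity check: over all pairs, the expectation adds up to m edges in total.
expected_total = sum(
    expected_edges(i, j) for i in range(n) for j in range(n)
) / 2

print(expected_within, expected_total)
```

For this graph the expected within-community count is 7/2, and the total over all pairs recovers the seven edges of the graph, as it should when degrees are preserved.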
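Hence the difference between the actual and expected number of edges connecting nodes of the same group, expressed as a fraction of the total number of edges $m$, is called modularity, and is given by:
$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{i,j} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) = \frac{1}{2m} \sum_{i,j} B_{i,j} \, \delta(c_i, c_j)$$
where:
$$B_{i,j} = A_{i,j} - \frac{k_i k_j}{2m}$$
and $B$ is called the modularity matrix.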
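The definition translates almost directly into code. The sketch below is a minimal, unoptimized illustration in plain Python; the function name `modularity`, the edge-list input format, and the toy graph are assumptions made here for the example.

```python
from fractions import Fraction


def modularity(edges, community):
    """Return Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    n = len(community)
    m = len(edges)
    A = [[0] * n for _ in range(n)]
    k = [0] * n
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
        k[i] += 1
        k[j] += 1
    # Modularity matrix B_ij = A_ij - k_i k_j / 2m, kept as exact fractions.
    B = [
        [Fraction(A[i][j]) - Fraction(k[i] * k[j], 2 * m) for j in range(n)]
        for i in range(n)
    ]
    # Sum B_ij over ordered pairs that share a community.
    total = sum(
        B[i][j]
        for i in range(n)
        for j in range(n)
        if community[i] == community[j]
    )
    return total / (2 * m)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_triangles = modularity(edges, [0, 0, 0, 1, 1, 1])    # one group per triangle
q_single = modularity(edges, [0, 0, 0, 0, 0, 0])       # everything in one group
q_alternating = modularity(edges, [0, 1, 0, 1, 0, 1])  # a deliberately poor split

print(q_triangles, q_single, q_alternating)
```

On this graph the triangle-based division gives a positive value (5/14), placing all nodes in one group gives exactly zero, and the alternating labelling gives a negative value (-3/14).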
The modularity $Q$ takes positive values if there are more edges between same-group vertices than expected, and negative values if there are fewer. Our goal is to find the partition of network nodes into communities such that the modularity of the division is maximum. Unfortunately, this is a computationally hard problem: maximizing modularity is known to be NP-hard, and it is believed that the only algorithms capable of always finding the division with maximum modularity take exponentially long to run and hence are useless for all but the smallest of networks. We therefore turn to heuristic algorithms, which attempt to maximize the modularity in an intelligent way and give reasonably good results quickly.
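To make the cost of exact search concrete, here is a small sketch that tries every division of a toy graph into at most two groups. It is an illustration only, not one of the heuristic algorithms referred to above: restricting to two groups, the graph, and the names `modularity`, `candidates`, and `best` are all assumptions made for this example. Even with this restriction, the number of candidates is $2^{n-1}$ and doubles with every added node.

```python
from fractions import Fraction
from itertools import product


def modularity(edges, community):
    """Modularity Q of a labelling, computed with exact fractions."""
    n, m = len(community), len(edges)
    k = [0] * n
    for i, j in edges:
        k[i] += 1
        k[j] += 1
    # Sum of A_ij * delta over ordered pairs: each within-group edge counts twice.
    actual = 2 * sum(1 for i, j in edges if community[i] == community[j])
    # Sum of k_i k_j / 2m over ordered pairs in the same group.
    expected = sum(
        Fraction(k[i] * k[j], 2 * m)
        for i in range(n)
        for j in range(n)
        if community[i] == community[j]
    )
    return (actual - expected) / (2 * m)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Fix node 0 in group 0 so that relabelled copies are not counted twice;
# this leaves 2**(n - 1) labellings to evaluate.
candidates = [(0,) + rest for rest in product((0, 1), repeat=n - 1)]
best = max(candidates, key=lambda c: modularity(edges, c))

print(len(candidates), best, modularity(edges, best))
```

With six nodes only 32 labellings need to be scored, and the search recovers the two triangles. At 60 nodes the same loop would have to score about $5.8 \times 10^{17}$ labellings, which is why practical methods give up on guaranteed optimality.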