Girvan–Newman algorithm


The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is one of the methods used to detect communities in complex systems.[1] The notion of a "community structure" is related to that of clustering, though the two are not identical. A community consists of a subset of nodes within which the node-node connections are dense, while the edges to nodes in other communities are less dense.[1] There are numerous alternative methods for detecting communities in networks, including hierarchical clustering, partitioning graphs to maximize quality functions such as network modularity, and k-clique percolation.

Edge betweenness and community structure

The hierarchical clustering method is based on assigning a weight to every edge and placing these edges into an initially empty network, starting from the edges with the strongest weights and progressing towards the weakest ones. The edges with the greatest weights within a community are the most central ones. Although traditional in community detection, the method has some pathologies. One of them, for instance, is its inability to assign to a community a node that is connected to the network by only a single edge.

The Girvan–Newman algorithm works the opposite way. Instead of trying to construct a measure that tells us which edges are the most central to communities, it focuses on those edges that are least central, the edges that are most "between" communities. The communities are detected by progressively removing edges from the original graph, rather than by adding the strongest edges to an initially empty network.

Vertex betweenness has been studied in the past as a measure of the centrality and influence of nodes in networks. For any node i, vertex betweenness is defined as the number of shortest paths between pairs of nodes that run through it. It is a measure of the influence of a node over the flow of information between other nodes, especially in cases where information flowing over a network primarily follows the shortest available path. The Girvan–Newman algorithm extends this definition to the case of edges, defining the "edge betweenness" of an edge as the number of shortest paths between pairs of nodes that run along it. If there is more than one shortest path between a pair of nodes, each path is assigned equal weight such that the total weight of all of the paths is equal to unity.

If a network contains communities or groups that are only loosely connected by a few intergroup edges, then all shortest paths between different communities must go along one of these few edges. Thus, at least one of the edges connecting communities will have high edge betweenness. By removing these edges, the groups are separated from one another and so the underlying community structure of the network is revealed.
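The definition of edge betweenness can be made concrete with a short sketch. The code below is not taken from the original paper; it is one common way to compute the quantity for an undirected, unweighted graph, using a breadth-first search from every node and splitting the weight equally among multiple shortest paths. The function name `edge_betweenness`, the adjacency-dictionary representation, and the two example graphs are assumptions made for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours (no self-loops).
    Returns a dict keyed by frozenset({u, v}).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {n: 0 for n in adj}
        sigma[s] = 1
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing each pair's unit
        # weight equally among its shortest paths.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in bet.items()}


# Two triangles joined by the single edge 2-3 (illustrative graph).
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
# A 4-cycle, where opposite corners have two shortest paths each.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

bridge_score = edge_betweenness(two_triangles)[frozenset((2, 3))]
square_scores = edge_betweenness(square)
```

In the first graph all 3 × 3 = 9 shortest paths between the two triangles run along the edge 2-3, so it scores 9, while every edge inside a triangle scores at most 4. In the 4-cycle each edge scores 2: one unit for its own endpoints plus two half-weights from the pairs of opposite corners.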

The algorithm's steps for community detection are summarized below:

  1. The betweenness of all existing edges in the network is calculated first.
  2. The edge with the highest betweenness is removed.
  3. The betweenness of all edges affected by the removal is recalculated.
  4. Steps 2 and 3 are repeated until no edges remain.
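The four steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a reference implementation: the graph is undirected and unweighted, ties for the highest betweenness are broken arbitrarily, and, for simplicity, step 3 recomputes the betweenness of every remaining edge rather than only the affected ones. The names `girvan_newman`, `components`, and `edge_betweenness` are hypothetical helpers introduced here.

```python
from collections import deque


def edge_betweenness(adj):
    """Betweenness of each edge, keyed by frozenset({u, v})."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {n: 0 for n in adj}
        sigma[s] = 1
        preds, order, queue = {n: [] for n in adj}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2.0 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of frozensets of nodes."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def girvan_newman(adj):
    """Return the successive partitions produced by edge removal."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):              # step 4: until no edges remain
        bet = edge_betweenness(adj)       # steps 1 and 3 (full recompute)
        u, v = max(bet, key=bet.get)      # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):  # record each new split
            levels.append(comps)
    return levels


# Two triangles joined by the single edge 2-3 (illustrative graph).
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
levels = girvan_newman(two_triangles)
```

On this example the first edge removed is 2-3, so `levels[1]` holds the two triangles as separate communities, and the final entry of `levels` contains six single-node communities. Recomputing everything at each step is the simplest choice; restricting the recalculation to the component that contained the removed edge is an optimization that leaves the result unchanged.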

Because only the betweennesses affected by a removal need to be recalculated, the running time of the algorithm on a computer can be reduced. However, the betweenness must be recalculated after every removal, or severe errors occur. The reason is that the network adapts itself to the new conditions set after the edge removal. For instance, if two communities are connected by more than one edge, then there is no guarantee that all of these edges will have high betweenness; it is only known that at least one of them will. By recalculating betweennesses after the removal of each edge, it is ensured that at least one of the remaining edges between two communities will always have a high value.

The end result of the Girvan–Newman algorithm is a dendrogram. As the algorithm runs, the dendrogram is produced from the top down (i.e. the network splits up into different communities with the successive removal of links). The leaves of the dendrogram are individual nodes.
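To illustrate the top-down construction, the sketch below turns a sequence of successively finer partitions into an indented text dendrogram. The input format (a list of partitions, coarsest first, each a list of frozensets), the function name `render_dendrogram`, and the four-node example are all assumptions chosen for illustration; the partitions are written out by hand rather than produced by an actual run.

```python
def render_dendrogram(levels):
    """Render nested partitions as indented lines, root first.

    levels[0] is the coarsest partition and each later partition
    refines the one before it; the last one holds single nodes.
    """
    lines = []

    def walk(community, depth, k):
        # Skip ahead over levels in which this community is unchanged.
        while k + 1 < len(levels) and community in levels[k + 1]:
            k += 1
        label = ", ".join(str(n) for n in sorted(community))
        lines.append("  " * depth + "{" + label + "}")
        if k + 1 < len(levels):
            # The community was split: its children are the parts of
            # the next partition that lie inside it.
            children = [c for c in levels[k + 1] if c <= community]
            for child in sorted(children, key=min):
                walk(child, depth + 1, k + 1)

    for root in levels[0]:
        walk(root, 0, 0)
    return lines


# Hand-written partitions of a hypothetical four-node network.
example_levels = [
    [frozenset({1, 2, 3, 4})],
    [frozenset({1, 2}), frozenset({3, 4})],
    [frozenset({1}), frozenset({2}), frozenset({3, 4})],
    [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
]
dendrogram_lines = render_dendrogram(example_levels)
print("\n".join(dendrogram_lines))
```

The root line is the whole network, each deeper indentation level corresponds to a later split, and the innermost lines are the individual nodes that form the leaves.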

References

  1. Girvan, M. and Newman, M. E. J., "Community structure in social and biological networks", Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002)

Wikimedia Foundation. 2010.
