Hard Cliques and Clustering in Graph Structure

An important aspect of graph structure is a clique or cluster: a highly connected component of nodes. For example, a clique within a social network might be a group of friends that all know each other. If we have a graph that represents countries that are trading partners, we could identify a cartel by searching for a clique. Finally, we might have a graph where nodes represent individuals in a company and edges represent e-mail conversations; if a clique appears in this graph - the individuals may be participating in some project, or even engaging in fraud [enron]. Indeed, the search for clique’s has been applicable to a wide range of networks - relevant to sociology, physical networks, and bioinformatics. Thus, a common question for computer scientists is to rigidly define what a clique is, and determine a way to find them.

One technique for finding clique’s is to partition a graph into k (integer) clusters. Each node must be in one and only one cluster (”hard clustering”). Finally, the objective is to assign the nodes to cluster’s in an optimal way. The first way is to maximize the minimum-distance between nodes in two separate clusters. That is, the technique tries to make any two clusters as far as possible from each other. Note that this problem is solvable in polynomial time using a minimum spanning tree. A second way is to minimize the maximum-distance between nodes within the same cluster. That is, the technique tries to make every clique as ’small’ and ‘dense’ as possible. Note that this problem is NP complete, with polynomial time approximations. [Details on the two techniques (Asano, Bhattacharya, Keil, Yao, 1988)] . Both techniques do not require the clusters to be strongly connected. Thus, given a graph, we can find partition the graph into clique’s.

Once a graph is partitioned into clusters, it may be interesting to determine how dense each cluster is. For example, if every person in a social clique is friends with each other, then the clique is dense - and fully connected. However, if there are half as many edges in the social clique as in the fully connected clique — then the clique is less dense. A common metric for representing the density of the neighbors around a node is the Clustering Coefficient [wiki]. Indeed, the clustering coefficient over a whole graph provides a good metric for whether a graph will exhibit small world phenomena. A graph with a high clustering coefficient will be less small-world than a similar graph with the same number of edges and small clustering coefficient. The clustering coefficient is a ratio: the number of edges between a node’s neighbors divided by the number of possible edges. This can be calculated trivially. Finally, the density of an entire clique can be determined by averaging the clustering coefficient of all its members.

Thus, we can use the clustering coefficient and k-clustering techniques to find clique’s within a network. Both are important ways to study the interconnectivity within a graph.

Posted in Topics: Education

Responses are currently closed, but you can trackback from your own site.

Comments are closed.



* You can follow any responses to this entry through the RSS 2.0 feed.