Cluster graph modification problems.

*(English)*Zbl 1068.68107Summary: In a clustering problem one has to partition a set of elements into homogeneous and well-separated subsets. From a graph theoretic point of view, a cluster graph is a vertex-disjoint union of cliques. The clustering problem is the task of making the fewest changes to the edge set of an input graph so that it becomes a cluster graph. We study the complexity of three variants of the problem. In the cluster completion variant edges can only be added. In cluster deletion, edges can only be deleted. In cluster editing, both edge additions and edge deletions are allowed. We also study these variants when the desired solution must contain a prespecified number of clusters. We show that cluster editing is NP-complete, cluster deletion is NP-hard to approximate to within some constant factor, and cluster completion is polynomial. When the desired solution must contain exactly \(p\) clusters, we show that cluster editing is NP-complete for every \(p\geqslant2\); cluster deletion is polynomial for \(p=2\) but NP-complete for \(p>2\); and cluster completion is polynomial for any \(p\). We also give a constant factor approximation algorithm for a variant of cluster editing when \(p=2\).

##### MSC:

68R10 | Graph theory (including graph drawing) in computer science |

05C85 | Graph algorithms (graph-theoretic aspects) |

68Q25 | Analysis of algorithms and problem complexity |

PDF
BibTeX
XML
Cite

\textit{R. Shamir} et al., Discrete Appl. Math. 144, No. 1--2, 173--182 (2004; Zbl 1068.68107)

Full Text:
DOI

##### References:

[1] | Alizadeh, A.A.; Eisen, M.B., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, 6769, 503-511, (2000) |

[2] | Bansal, N.; Chawla, S.; Blum, A., Correlation clustering, (), 238-247 |

[3] | Ben-Dor, A.; Shamir, R.; Yakhini, Z., Clustering gene expression patterns, J. comput. biol., 6, 3/4, 281-297, (1999) |

[4] | Berge, C., Graphs and hypergraphs, (1973), North-Holland Amsterdam · Zbl 0483.05029 |

[5] | Chen, Z.Z.; Jiang, T.; Lin, G., Computing phylogenetic roots with bounded degrees and errors, SIAM J. comput., 32, 864-879, (2003) · Zbl 1053.68069 |

[6] | Garey, M.R.; Johnson, D.S., Computers and intractabilitya guide to the theory of NP-completeness , (1979), W.H. Freeman and Co. San Francisco |

[7] | Goemans, M.X.; Williamson, D.P., Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM, 42, 6, 1115-1145, (1995) · Zbl 0885.68088 |

[8] | Golub, T.R.; Slonim, D.K., Molecular classification of cancerclass discovery and class prediction by gene expression monitoring , Science, 286, 531-537, (1999) |

[9] | Hagen, C.; Kahng, A., New spectral methods for ratio cut partitioning and clustering, IEEE trans. comput. aided design of integrated circuits systems, 11, 9, 1074-1085, (1992) |

[10] | Hansen, P.; Jaumard, B., Cluster analysis and mathematical programming, Math. programming, 79, 191-215, (1997) · Zbl 0887.90182 |

[11] | Hartigan, J., Clustering algorithms, (1975), Wiley New York · Zbl 0372.62040 |

[12] | D.S. Hochbaum (Ed.), Approximation Algorithms for NP-Hard Problems, PWS Publishing, Boston, 1997. |

[13] | Lovasz, L., Covering and coloring of hypergraphs, () |

[14] | A. Natanzon, Complexity and approximation of some Graph modification problems, Master’s thesis, Department of Computer Science, Tel Aviv University, 1999. |

[15] | Natanzon, A.; Shamir, R.; Sharan, R., Complexity classification of some edge modification problems, Discrete appl. math., 113, 109-128, (2001) · Zbl 0982.68104 |

[16] | Papadimitriou, C.; Yannakakis, M., Optimization, approximation, and complexity classes, J. comput. system sci., 43, 425-440, (1991) · Zbl 0765.68036 |

[17] | Shamir, R.; Sharan, R.; Tsur, D., Cluster graph modification problems, (), 379-390 · Zbl 1022.68104 |

[18] | Sharan, R.; Maron-Katz, A.; Shamir, R., CLICK and expandera system for clustering and visualizing gene expression data , Bioinformatics, 19, 1787-1799, (2003), (A preliminary version appeared in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 307-316, AAAI Press, New York, 2000) |

[19] | Wu, Z.; Leahy, R., An optimal graph theoretic approach to data clusteringtheory and its application to image segmentation , IEEE trans. pattern analysis Mach. intell., 15, 11, 1101-1113, (1993) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.