PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS
Nuša Erman, Ales Korosec, Jana Suklan

Abstract

The broad variety of agglomerative hierarchical clustering methods raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others when the analysed data have a specific structure. In the present study we observed the behaviour of the centroid method, the median method (Gower median method), and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward, and the McQuitty (groups method average; weighted pair-group method using arithmetic averages, WPGMA) methods. We compared these methods on spherical, ellipsoid, umbrella-like, "core-and-sphere", ring-like and intertwined three-dimensional data structures. We used the R statistical software to generate the data and carry out the analysis. The results show that all seven methods succeed in finding compact, ball-shaped or ellipsoid structures when these are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated structures. A circular double helix structure is especially challenging; only the minimum method reveals it correctly. Our results also confirm previously published simulation studies, which usually favour the average method (alongside the Ward method) when the data are assumed to be fairly compact and well separated.

Keywords: hierarchical clustering, agglomerative methods, divisive methods, simulated data

Nuša Erman, Ph.D., is an associate of the School of Advanced Social Studies, Nova Gorica, Slovenia. Contact: nusa.erman (at) gmail.com

Cite this article:
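The contrast the abstract draws between the minimum method and the other methods on ring-like data can be illustrated with a small sketch. This is not the authors' R code. It is a pure-Python toy under stated assumptions: two noise-free concentric rings in two dimensions stand in for the three-dimensional structures of the study, only the minimum (single linkage) and maximum (complete linkage) methods are covered, and the helper names `ring` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations

def ring(radius, n):
    """Return n evenly spaced 2-D points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def agglomerate(points, linkage, n_clusters):
    """Naive agglomerative clustering.

    linkage=min gives the minimum method (single linkage),
    linkage=max gives the maximum method (complete linkage).
    Returns the clusters as sorted lists of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}
    # Pairwise Euclidean distances between the initial singleton clusters.
    dist = {frozenset((i, j)): math.dist(points[i], points[j])
            for i, j in combinations(clusters, 2)}
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        for c in clusters:
            if c not in (a, b):
                # Distance from the merged cluster (kept under key a) to c.
                dist[frozenset((a, c))] = linkage(dist[frozenset((a, c))],
                                                  dist[frozenset((b, c))])
        clusters[a].extend(clusters.pop(b))
    return sorted(sorted(members) for members in clusters.values())

# Toy data (an assumption of this sketch): inner ring of 12 points with
# radius 1, outer ring of 24 points with radius 3.
points = ring(1.0, 12) + ring(3.0, 24)
truth = [list(range(12)), list(range(12, 36))]

single = agglomerate(points, min, 2)
complete = agglomerate(points, max, 2)

print("minimum method recovers the rings:", single == truth)    # True
print("maximum method recovers the rings:", complete == truth)  # False
```

In this toy setting the minimum method separates the rings because neighbouring points on a ring are closer to each other than the rings are to one another. The maximum method cannot produce the outer ring as a single cluster, since its diameter exceeds the largest distance between the two rings. Real data with noise, as in the study, would behave less cleanly.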