
The code to perform clustering based on the variance ratio criterion was provided by Luke Rendell (ler4 at st-andrews.ac.uk) and is described in Rendell and Whitehead (2003).

The variance ratio criterion (Calinski and Harabasz, 1974; Milligan and Cooper, 1985; Schreer et al., 1998) uses k-means clustering to obtain a clustering for each candidate value of k. Because k-means is randomly initialized, it has to be run a number of times for each k to make it likely that a near-optimal clustering is found. Once the data have been clustered for several values of k, the best model is selected using the following ratio:

$$\mathrm{VRC}_k = \frac{\mathrm{BGSS}/(k-1)}{\mathrm{WGSS}/(n-k)}$$

where BGSS is the between-cluster sum of squares, WGSS the within-cluster sum of squares, k the number of clusters and n the number of samples. When the ratio is evaluated for models with increasing k, the optimal clustering should be given by the first local maximum of the ratios (Calinski and Harabasz, 1974). An important difference between this algorithm and the other two available in the software is that it does not specifically search for normally distributed clusters.
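The selection procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the code supplied by Luke Rendell: the function names (kmeans, variance_ratio, choose_k), the restart count, the upper limit k_max and the toy data are all assumptions made for the example. For a fixed k, BGSS + WGSS is constant, so keeping the restart with the highest ratio is the same as keeping the one with the lowest WGSS.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means from one random start; returns one label per point."""
    centers = rng.sample(points, k)  # random initialization
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return labels

def variance_ratio(points, labels, k):
    """(BGSS / (k - 1)) / (WGSS / (n - k)) for one clustering."""
    n = len(points)
    grand = mean(points)
    bgss = wgss = 0.0
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            continue
        center = mean(members)
        bgss += len(members) * sq_dist(center, grand)
        wgss += sum(sq_dist(p, center) for p in members)
    if wgss == 0.0:  # degenerate case: every point sits on its center
        return float("inf")
    return (bgss / (k - 1)) / (wgss / (n - k))

def choose_k(points, k_max, restarts=10, seed=0):
    """Return (k at the first local maximum of the ratio, all ratios)."""
    rng = random.Random(seed)
    ratios = {}
    for k in range(2, k_max + 1):
        # several random starts per k; keep the best ratio found
        ratios[k] = max(variance_ratio(points, kmeans(points, k, rng), k)
                        for _ in range(restarts))
    for k in range(2, k_max):
        if ratios[k + 1] < ratios[k]:  # ratio drops: first local maximum
            return k, ratios
    return k_max, ratios  # ratio never dropped within the tested range

# Toy data (assumed for illustration): three tight, well-separated groups.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5),
           (0.0, 0.5), (0.3, 0.3), (-0.3, -0.3)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
          for dx, dy in offsets]

best_k, ratios = choose_k(points, k_max=5, restarts=50)
print("selected k:", best_k)
```

If the ratio keeps rising up to k_max, the sketch returns k_max, which should be read as "no local maximum found in the tested range" rather than as a confident choice.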

Citations:

Calinski, T. and J. Harabasz (1974). A dendrite method for cluster analysis. Comm. Statist., 3:1-27.

Milligan, G.W. and M.C. Cooper (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50:159-179.

Rendell, L.E. and H. Whitehead (2003). Comparing repertoires of sperm whale codas: a multiple methods approach. Bioacoustics: The International Journal of Animal Sound and its Recording, 14:61-81.

Schreer, J.F., R.J. O'Hara Hines and K.M. Kovacs (1998). Classification of dive profiles: A comparison of statistical clustering techniques and unsupervised artificial neural networks. J. Agric. Biol. Envir. S., 3:383-404.