This section continues the example presented in the previous section on K-means. In addition to discussing code for implementing agglomerative clustering, it also includes applications of various accuracy measures useful for analyzing clustering performance.
Affinity Propagation
Affinity propagation is another example of a clustering algorithm. As opposed to K-means, this approach does not require us to set the number of clusters beforehand. The main idea here is that we would like to cluster our data based on the similarity of the observations (or how they "correspond" to each other).
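As a quick illustration of the point about not fixing the number of clusters, here is a minimal sketch using scikit-learn's `AffinityPropagation`. The synthetic blobs, the `damping=0.9` setting, and the variable names are assumptions made for this example, not part of the original walkthrough.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic 2-D data (an assumption for illustration only).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Note that there is no n_clusters argument: the number of clusters
# is an outcome of fitting, not an input.
model = AffinityPropagation(damping=0.9, random_state=0)
labels = model.fit_predict(X)

# Each entry is the index of an observation chosen as a cluster "role model".
n_found = len(model.cluster_centers_indices_)
print("clusters found:", n_found)
```

The number of clusters found is not guaranteed to match the number of generated blobs. It depends on the `preference` parameter, which by default is set to the median of the input similarities.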
Let's define a similarity metric $s$ such that $s(x_i, x_j) > s(x_i, x_k)$ if an observation $x_i$ is more similar to observation $x_j$ and less similar to observation $x_k$. A simple example of such a similarity metric is the negative squared distance $s(x_i, x_j) = -\|x_i - x_j\|^2$.
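The negative squared distance mentioned above can be computed for all pairs of observations at once. The following is a small sketch with NumPy, assuming Euclidean distance. The function name `negative_squared_distance` and the toy data are hypothetical.

```python
import numpy as np

def negative_squared_distance(X):
    """Return S with S[i, j] = -||x_i - x_j||^2 for the rows x_i of X."""
    X = np.asarray(X, dtype=float)
    # Broadcasting builds all pairwise differences, shape (n, n, d).
    diff = X[:, None, :] - X[None, :, :]
    return -np.sum(diff ** 2, axis=-1)

# Three toy observations: the first two are close, the third is far away.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
S = negative_squared_distance(X)

# S[0, 1] = -1 and S[0, 2] = -50, so observation 0 is more similar
# to observation 1 than to observation 2.
print(S[0, 1] > S[0, 2])
```

Larger (less negative) values mean more similar observations, and the matrix is symmetric because the distance is.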
Now, let's describe "correspondence" by making two zero matrices. One of them, $r_{i,k}$, determines how well the $k$th observation serves as a "role model" for the $i$th observation with respect to all other possible "role models". Another matrix, $a_{i,k}$, determines how appropriate it would be for the $i$th observation to take the $k$th observation as a "role model". This may sound confusing, but it becomes more understandable with some hands-on practice.
The matrices are updated sequentially with the following rules: