Self-organizing maps are quite fascinating, and there is a good amount of literature available on the internet. I experimented with SOMs to understand them better and decided to write this article (I find I learn more just by writing a blog about something).

So in this post, I will give a high-level overview of SOMs and then walk through an actual run of the algorithm on the Iris dataset.

**Self-Organizing Maps**

A SOM is an artificial neural network trained through unsupervised learning. It helps summarize (and visualize) high-dimensional input data in lower dimensions (usually two). In a self-organizing map, the so-called neurons/nodes are organized in a rectangular or hexagonal grid. Each node has a weight, which is simply a multi-dimensional vector with the same number of dimensions as the input vectors. High-dimensional input data is presented to this grid, and the grid preserves the principal features of the input data (the topology of the data in the input space).
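Concretely, such a grid can be stored as a single 3-D array holding one weight vector per node. Below is a minimal sketch of that layout (my own illustration, not code from any particular library; the shape values are arbitrary):

```python
import numpy as np

# A grid of rows x cols nodes, each holding a dim-dimensional weight vector,
# stored as one 3-D array of shape (rows, cols, dim).
rows, cols, dim = 5, 5, 3
weights = np.random.rand(rows, cols, dim)

print(weights.shape)   # (5, 5, 3)
print(weights[0, 0])   # weight vector of the top-left node
```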

So here are the high level steps involved in the training:

1. The size of the grid is defined, and the weights of each node of the grid are randomly initialized (they can also be initialized using PCA)

2. Input vectors from the training data are presented to the network in random order. Input vectors are normalized before learning

3. Each node's weight is compared with the input vector, and the node whose weight is nearest to the input vector is identified; it is called the 'best matching unit' (BMU)

4. Now the weights of the BMU, along with those of its neighbouring nodes, are adjusted to move them closer to the input vector. Below are the equations that define how the weights are updated and how the neighbourhood of the BMU is identified:

Θ is the neighbourhood function, which makes nodes nearer to the BMU learn more than those farther away. This is usually taken as below:

Θ(t) = exp(-S^{2}/2σ^{2}(t)) where S is the Euclidean distance on the grid between the node being updated and the winning neuron

And below is how σ is defined:

σ(t) = σ_{0} exp(-t/τ) where t is the current iteration and τ is a time constant. So σ decays over time.

There is another parameter called the learning rate, L(t). It is introduced so that learning decays over time, and is defined as:

L(t) = L_{0} exp(-t/τ)

So below is the final weight updating formula:

W(t+1) = W(t) + Θ(t) L(t) (x – W(t)) where x is the input vector

The above procedure is repeated multiple times for various inputs.
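The update step described by the equations above can be sketched in a few lines of numpy. This is a minimal illustration of the procedure, not code from any library; the function name, array layout, and the value of τ are my own assumptions:

```python
import numpy as np

def train_step(weights, x, t, sigma0=2.0, L0=0.5, tau=100.0):
    """One SOM update: find the BMU for input x, then pull every node's
    weight toward x, scaled by the neighbourhood function and learning rate."""
    rows, cols, dim = weights.shape

    # Best matching unit: node whose weight is nearest to x (Euclidean distance).
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # Decaying neighbourhood width and learning rate:
    # sigma(t) = sigma_0 exp(-t/tau), L(t) = L_0 exp(-t/tau)
    sigma = sigma0 * np.exp(-t / tau)
    L = L0 * np.exp(-t / tau)

    # Grid distance S from each node to the BMU, and the neighbourhood
    # function Theta(t) = exp(-S^2 / 2 sigma^2(t)).
    ii, jj = np.indices((rows, cols))
    S2 = (ii - bmu[0]) ** 2 + (jj - bmu[1]) ** 2
    theta = np.exp(-S2 / (2.0 * sigma ** 2))

    # W(t+1) = W(t) + Theta(t) L(t) (x - W(t))
    weights += theta[:, :, None] * L * (x - weights)
    return weights
```

Repeatedly presenting inputs to this step pulls the BMU (and, more weakly, its neighbours) toward each input, which is what produces the topology-preserving map.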

5. The output can be visualized in various ways; one of the most common is the U-matrix. In a U-matrix, the Euclidean distance between neighbouring neurons is calculated and then represented on a gray scale or color scale.
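One way to compute a U-matrix is sketched below (my own illustration; the choice of up to eight surrounding nodes as the neighbourhood is an assumption):

```python
import numpy as np

def u_matrix(weights):
    """Average Euclidean distance from each node's weight vector to those of
    its immediate grid neighbours (up to 8 surrounding nodes)."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                        dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um
```

Large values in the U-matrix mark boundaries between clusters; small values mark nodes whose neighbours are similar.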

**SOM execution on Iris dataset**

Now let us map the concepts discussed above onto an actual run of a self-organizing map on the Iris data (which has four dimensions). I have used https://github.com/JustGlowing/minisom (a Python implementation of SOM developed by Giuseppe Vettigli) and modified it for my purpose.

1. Here the size of the grid is defined as 7 x 7 and the weights of the nodes have been randomly initialized. These weights were then normalized using the Frobenius norm. Below is how this randomly initialized map looks (it may look different on another run, as it is randomly initialized). The implementation code was modified to use a red-blue color scale.
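The initialization step can be sketched as below (a minimal numpy illustration, not the modified minisom code; I read the Frobenius-norm normalization as scaling each node's weight vector to unit length):

```python
import numpy as np

rng = np.random.default_rng()
weights = rng.random((7, 7, 4)) * 2 - 1   # random init in [-1, 1)

# Normalize each node's 4-dimensional weight vector to unit length.
norms = np.linalg.norm(weights, axis=2, keepdims=True)
weights /= norms
```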

2. Now comes training with the Iris dataset. Each training example from the dataset is normalized before being presented to the SOM.

3. For each row of normalized Iris data, a winning neuron is identified. This is done by computing the Euclidean distance between the input row and every neuron's weights and taking the minimum. This neuron is called the 'best matching unit' (BMU).
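The BMU lookup itself is a one-liner over the weight array (a sketch under my own naming, not minisom's API):

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the grid coordinates of the node whose weight vector has the
    smallest Euclidean distance to the input vector x."""
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)
```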

4. As mentioned in the overview above, the weights of the BMU and the neighbouring neurons are then updated to bring them closer to the input data. In this implementation the entire grid is the neighbourhood, but Θ decreases as the distance from the BMU increases. Also, in line with the overview, exponentially decaying functions are used for 'sigma' and the 'learning rate'. Each time the weights are updated during the learning phase, they are normalized using the Frobenius norm.

I have used σ_{0} = 2.0 and L_{0} = 0.5 here. The training produces a 7 x 7 x 4 matrix (as the weights have the same dimensions as the input data).
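Plugging these values into the decay formulas shows how quickly the neighbourhood width and learning rate shrink (the time constant τ = 100 below is an assumed value for illustration):

```python
import math

sigma0, L0, tau = 2.0, 0.5, 100.0   # tau is an assumed time constant

for t in (0, 50, 100):
    sigma = sigma0 * math.exp(-t / tau)   # sigma(t) = sigma_0 exp(-t/tau)
    L = L0 * math.exp(-t / tau)           # L(t) = L_0 exp(-t/tau)
    print(t, round(sigma, 3), round(L, 3))
# 0 2.0 0.5
# 50 1.213 0.303
# 100 0.736 0.184
```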

5. Now, how do you visualize this 7 x 7 x 4 matrix on a grid of 7 x 7 nodes? We prepare a U-matrix, i.e. the matrix of average distances between each node's weight vector and its neighbouring nodes' weight vectors. Here the neighbouring nodes are taken as highlighted below (the black dot is the node for which the average distance is calculated, and the blue nodes are the ones used to calculate it):

Below is the final distance map which got created from the trained SOM:

The BMUs can now be highlighted where they reside on the final distance map. Below is the graph with the BMUs overlaid on the distance map. A different marker is used for each species (circle for setosa, square for versicolor, and diamond for virginica).

As the graph makes clear, setosa forms a cluster that stands apart from the other two species, which overlap somewhat with each other. This approach can further be extended to classify new Iris data.
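One simple way to do that classification (my own sketch, not part of minisom): attach to each node the majority label of the training rows it wins, then label a new sample with its BMU's label.

```python
import numpy as np
from collections import Counter

def build_label_map(weights, data, labels):
    """Map each node to the majority class of the training rows it wins."""
    votes = {}
    for x, y in zip(data, labels):
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        votes.setdefault(bmu, []).append(y)
    return {node: Counter(ys).most_common(1)[0][0] for node, ys in votes.items()}

def classify(weights, label_map, x):
    """Label a new sample with the class attached to its BMU (None if unseen)."""
    d = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    return label_map.get(bmu)
```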

References:

1. https://en.wikipedia.org/wiki/Self-organizing_map

2. https://github.com/JustGlowing/minisom

3. http://www.shy.am/wp-content/uploads/2009/01/kohonen-self-organizing-maps-shyam-guthikonda.pdf