Kuido Külm & Mihkel Jõhvik
Self-Organizing Maps – Training Visualization Tool
Project for MTAT.03.183 Data Mining
Table of figures
1 Introduction
2 Related Work
3 Self-Organizing Maps
4 Implementation
5 Results
5.1 Configuration
5.2 Visualization
6 Conclusion
Figure 1 - A screen capture of the wsom program in action
Figure 2 - Tom Germano's Java applet in action
Figure 3 - The results of JavaSOM in SVG format
Figure 4 - 8 colors mapped onto a 2D grid by a SOM
Figure 5 - Graph of a discrete neighborhood function
Figure 6 - Graph of a Gauss neighborhood function
Figure 7 - Graph of a Mexican hat neighborhood function
Figure 8 - Box plot of the Iris dataset
Figure 9 - Scatterplot of the Iris dataset
Figure 10 - Neuron map of the Iris dataset
Figure 11 - Weight change ratio while training on the Iris dataset
Figure 12 - Best matching units found while training on the Iris dataset
Figure 13 - Neuron heatmap of the Iris dataset
Figure 14 - Weight change average of the Iris dataset
Figure 15 - Weight change graph of the Iris dataset
Figure 16 - Decreasing neighborhood radius
This project is intended to help students better understand Self-Organizing Maps. The main contribution is a program that trains Self-Organizing Maps on structured datasets and visualizes both the steps taken by the algorithm and the resulting map. Chapters 3 and 4 explain the theory behind SOMs and their implementation in software. Chapter 5 describes the configuration of the program and the visualizations it provides.
1 Introduction
A Self-Organizing Map (hereafter SOM) is a computational method for the visualization and analysis of high-dimensional data. SOMs cluster high-dimensional data and map it onto a lower-dimensional structure, usually a 2-dimensional grid. They are a type of neural network trained with unsupervised learning. The SOM structure and algorithm are explained in more detail in Chapter 3.
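For readers who want something concrete before Chapter 3, the following is a minimal sketch of a single SOM training step in plain Python. It assumes a rectangular grid, squared Euclidean distance for finding the best matching unit (BMU) and a Gaussian neighborhood function; `find_bmu` and `train_step` are hypothetical names and are not taken from the program described in this report.

```python
import math
import random


def find_bmu(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean distance
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def train_step(weights, x, learning_rate, radius):
    """Move the BMU and its grid neighbors towards x (in place); return the BMU position."""
    br, bc = find_bmu(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
            # Assumed Gaussian neighborhood: 1 at the BMU, decaying with grid distance
            h = math.exp(-grid_dist2 / (2 * radius ** 2))
            row[c] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return br, bc


# Hypothetical example: a 5x5 grid of 3-dimensional (e.g. RGB) weight vectors
random.seed(0)
grid = [[[random.random() for _ in range(3)] for _ in range(5)] for _ in range(5)]
bmu = train_step(grid, [1.0, 0.0, 0.0], learning_rate=0.5, radius=1.0)
print("BMU for pure red:", bmu)
```

The high-dimensional input is thus "mapped" to a single grid coordinate (the BMU), and the update pulls nearby neurons towards the same input, which is what makes similar inputs end up in neighboring grid cells.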
SOMs are used in many data organization tasks, such as categorizing images. Although SOMs learn without supervision, the resulting organization is often not optimal, and the clusters usually have to be further labeled and classified by a human supervisor.
SOMs can be difficult to understand and implement, and learners may need some assistance. In the opinion of the authors, visualization is key to understanding how SOMs work. Many demos and videos of SOMs in action are available on the Internet, some of which are mentioned in Chapter 2. However, few of them show how the different variables of the algorithm behave during training.
The goal of this project is to create a program that implements SOMs and allows for visualizing different aspects of the training process and the trained SOM.
2 Related Work
Many SOM visualization tools have been created for educational purposes. This section covers only those released under a public license.
Christian Borgelt has created a SOM visualization tool called wsom (xsom in the Linux version). It shows a network of interconnected nodes organizing itself into a rectangular 2D grid during training. The SOM is redrawn after every cycle of the algorithm, which gives a live view of the training process. The program does not display any other information about the SOM or its training process, nor does it allow loading different datasets for training. A screen capture of wsom in action can be seen in Figure 1.
Figure 1 - A screen capture of the wsom program in action 
Another illustrative program was created by Tom Germano. It uses colors, represented by their red, green and blue components, as training data. It is a Java applet that can be configured with different starting positions and numbers of iterations. Although some characteristics of the training process can be inferred from the visuals the program shows while running, it does not explicitly show how the different aspects of the training algorithm change over time. Nor does it allow loading different datasets for training. A screen capture of the program in action can be seen in Figure 2.
Figure 2 - Tom Germano's Java applet in action
JavaSOM is a SOM visualization tool created by Tomi Suuronen for his Bachelor's thesis. It loads training data in XML format, and the trained SOM can be saved in XML, SVG and PDF formats, the latter two being image formats. While this program accepts different datasets as input, it does not show any information about the training process itself. A screen capture of the program's results in SVG format can be seen in Figure 3.
Figure 3 - The results of JavaSOM in SVG format
As there are already several programs that show how a SOM organizes itself, we decided to focus on a slightly different aspect: how the individual elements of the SOM training algorithm behave while the algorithm is running.
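To visualize how these elements behave, a tool has to record them at every iteration of training. The sketch below is a hedged illustration of that idea, not the authors' implementation: it assumes a Gaussian neighborhood, exponential decay of both the neighborhood radius and the learning rate, and the commonly used time constant `iterations / ln(radius0)`. The function name `train_with_history`, its parameters and the color dataset are hypothetical.

```python
import math
import random


def train_with_history(data, rows, cols, iterations, lr0=0.5, radius0=None, seed=0):
    """Train a SOM and record per-iteration values that a visualization could plot."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2  # assumed initial radius: half the grid size
    # Assumed time constant so that the radius shrinks towards 1 by the last iteration
    time_const = iterations / math.log(radius0) if radius0 > 1 else iterations
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    history = {"radius": [], "learning_rate": [], "avg_weight_change": [], "bmu": []}

    for t in range(iterations):
        radius = radius0 * math.exp(-t / time_const)
        lr = lr0 * math.exp(-t / iterations)
        x = rng.choice(data)
        # Best matching unit: neuron with the smallest squared Euclidean distance to x
        bmu = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda p: sum((w - v) ** 2 for w, v in zip(weights[p[0]][p[1]], x)),
        )
        total_change = 0.0
        for r in range(rows):
            for c in range(cols):
                grid_dist2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))
                old = weights[r][c]
                new = [w + lr * h * (v - w) for w, v in zip(old, x)]
                # Length of the step each weight vector took in this iteration
                total_change += math.sqrt(sum((n - o) ** 2 for n, o in zip(new, old)))
                weights[r][c] = new
        history["radius"].append(radius)
        history["learning_rate"].append(lr)
        history["avg_weight_change"].append(total_change / (rows * cols))
        history["bmu"].append(bmu)
    return weights, history


# Hypothetical usage: four RGB colors on a 10x10 grid
colors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
weights, history = train_with_history(colors, rows=10, cols=10, iterations=200)
print(history["radius"][0], history["radius"][-1])
print(history["avg_weight_change"][:5])
```

Each list in `history` corresponds to one curve over time (for example a shrinking neighborhood radius or a falling average weight change), which is the kind of information the existing tools described above do not expose.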