The resulting program is released under the GPL3 licence4.
The program can read plain text files formatted with one data entry per line. It can also read lightweight WEKA files, in which the attributes are declared at the beginning of the file.
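The two input formats can be illustrated with a minimal sketch. It is not the program's actual reader: it assumes comma-separated values, and it assumes the lightweight WEKA files use the ARFF-style `@attribute` and `@data` keywords and `%` comments. The function name `read_dataset` is hypothetical.

```python
def read_dataset(lines):
    """Parse an iterable of text lines into (attribute_names, rows).

    Assumptions (not taken from the program itself): fields are
    comma-separated, header lines start with '@', comments with '%'.
    """
    names, rows = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if line.startswith("@"):
            parts = line.split()
            # Only '@attribute <name> <type>' lines contribute a name;
            # other declarations such as '@relation' or '@data' are skipped.
            if parts[0].lower() == "@attribute" and len(parts) >= 2:
                names.append(parts[1])
            continue
        # Every remaining line is one data entry.
        rows.append([field.strip() for field in line.split(",")])
    return names, rows
```

A plain text file without any header simply yields an empty list of attribute names, so the same function covers both cases.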
To facilitate the learning process, the program can be run in two different ways. The first option runs the training algorithm automatically and produces all the implemented graphs. The second option runs the algorithm manually, which allows the operator to see the intermediate steps through which the final trained SOM is formed.
Other configuration options set the maximum number of iterations and the delta, the size of the neuron output layer, the attributes shown on the correlation graphs, and the neighbourhood function used in calculating the radius of the best matching unit (see Chapter 4).
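How these options could interact is sketched below. This is a generic SOM training loop, not the program's implementation. Two assumptions are made loudly: delta is treated as a stopping threshold on the largest weight change in an iteration, and the Gaussian and bubble functions are common textbook neighbourhood functions that may differ from those described in Chapter 4. All names, the starting radius and the learning rate are hypothetical.

```python
import math
import random

def gaussian(distance, radius):
    """Gaussian neighbourhood: influence decays smoothly with grid distance."""
    return math.exp(-(distance ** 2) / (2 * radius ** 2))

def bubble(distance, radius):
    """Bubble neighbourhood: full influence inside the radius, none outside."""
    return 1.0 if distance <= radius else 0.0

def train_som(data, rows, cols, max_iterations, delta,
              neighbourhood=gaussian, seed=0):
    """Train a rows x cols map; return (weights, iterations actually run)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised weight vector per neuron of the output layer.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    start_radius = max(rows, cols) / 2
    completed = 0
    for iteration in range(max_iterations):
        remaining = 1 - iteration / max_iterations
        radius = max(start_radius * remaining, 0.5)  # shrinking radius
        rate = 0.5 * remaining                       # decaying learning rate
        largest_change = 0.0
        for vector in data:
            # Best matching unit: the neuron whose weights are closest.
            bmu = min(weights, key=lambda pos: math.dist(weights[pos], vector))
            for pos, w in weights.items():
                influence = rate * neighbourhood(math.dist(pos, bmu), radius)
                for i in range(dim):
                    change = influence * (vector[i] - w[i])
                    w[i] += change
                    largest_change = max(largest_change, abs(change))
        completed = iteration + 1
        # Assumed meaning of delta: stop once the weights barely move.
        if largest_change < delta:
            break
    return weights, completed
```

Under these assumptions, training ends either when the maximum number of iterations is reached or when the weights change by less than delta, whichever comes first.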
Correlation attributes are two attributes chosen from all the attributes available in the training data. This setting determines what is shown on some of the graphs produced during training. The other attributes are still taken into account when training the map, but are not shown on those graphs.
A checkbox titled visualization allows some of the graph-generating processes to be turned off for faster training. If the checkbox is not ticked, only the neuron map is generated.
Further configuration is available via the included config file. It is an XML file that allows some parameters of the program's behaviour to be changed. These options include the minimum and maximum number of allowed neurons and the minimum and maximum values of the allowed delta.
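The following sketch shows how such limits might be read and applied. The element and attribute names, as well as the numeric values, are purely illustrative assumptions; the actual layout of the program's config file is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout; the real file may be structured differently.
EXAMPLE_CONFIG = """
<config>
  <neurons min="4" max="400"/>
  <delta min="0.0001" max="0.1"/>
</config>
"""

def load_limits(xml_text):
    """Return the allowed (min, max) range for neurons and for delta."""
    root = ET.fromstring(xml_text.strip())
    neurons = root.find("neurons")
    delta = root.find("delta")
    return {
        "neurons": (int(neurons.get("min")), int(neurons.get("max"))),
        "delta": (float(delta.get("min")), float(delta.get("max"))),
    }

def clamp(value, limits):
    """Force a user-supplied value into the allowed range."""
    low, high = limits
    return max(low, min(value, high))
```

With limits loaded this way, a value entered by the operator can be passed through `clamp` before training starts, so that out-of-range settings never reach the algorithm.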
The program implements several graphs for characterizing the input data. Some of these are updated during training, which may increase the time training takes. The first is a box plot, which shows the minimum, maximum, median, and upper and lower quartiles of the input data. An example of the program output is shown in Figure 8.
Figure 8 – Box plot of the iris5 data set
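The five values behind a box plot can be computed for one attribute as in the sketch below. Quartile conventions differ between tools; this version uses the inclusive method of Python's `statistics.quantiles`, which is an assumption and may not match the convention of the plotting component used by the program. The function name is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary([5, 1, 3, 2, 4]))  # (1, 2.0, 3.0, 4.0, 5)
```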
A scatterplot is also constructed from the initial data. It plots the two attributes selected as correlation attributes against each other on a 2-dimensional grid, which shows how they correlate. The points on the plot are also coloured to represent their classification into different clusters. A sample of the program output is shown in Figure 9.
Figure 9 - Scatterplot of the Iris data set
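The role of the correlation attributes can be sketched as a simple projection: every data entry keeps all of its attributes for training, while only the two selected ones become plot coordinates. The function and parameter names below are hypothetical, and the sample values are made up for illustration.

```python
def scatter_points(rows, names, x_attr, y_attr, clusters):
    """Project each data entry onto the two chosen correlation attributes.

    Returns (x, y, cluster) triples; the cluster label would decide the
    colour of the point on the plot.
    """
    xi, yi = names.index(x_attr), names.index(y_attr)
    return [(float(row[xi]), float(row[yi]), cluster)
            for row, cluster in zip(rows, clusters)]

# Made-up entries with three attributes; only two of them are plotted.
names = ["length", "width", "height"]
rows = [["5.1", "3.5", "1.4"], ["6.3", "2.9", "5.6"]]
points = scatter_points(rows, names, "length", "height", clusters=[0, 1])
print(points)  # [(5.1, 1.4, 0), (6.3, 5.6, 1)]
```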
Once the SOM training algorithm has been run, more data becomes available. The program automatically switches to the neuron map in the main window. A neuron map shows the placement of neurons on a 2-dimensional grid after training. Each neuron is given coordinates in a 3-dimensional system. As no free component for 3-dimensional visualization could be found, the resulting map is still plotted on a 2-dimensional space. A sample of the program's output can be seen in Figure 10. Clicking on an input in the input vector list highlights the node it was used to create. This highlighting can be seen in Figure 10 at the coordinates (8,1).
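One plausible way to implement this highlighting is to look up the best matching unit of the clicked input, that is, the neuron whose weight vector lies closest to it. The sketch below assumes this interpretation and a Euclidean distance; the function name and the toy map values are hypothetical.

```python
import math

def node_for_input(weights, vector):
    """Return the grid coordinates of the neuron closest to `vector`."""
    return min(weights, key=lambda pos: math.dist(weights[pos], vector))

# Toy trained map: grid coordinates -> weight vector (values are made up).
trained = {
    (0, 0): [0.1, 0.2],
    (0, 1): [0.9, 0.8],
    (1, 0): [0.5, 0.5],
}
highlighted = node_for_input(trained, [0.85, 0.9])
print(highlighted)  # (0, 1)
```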