This widely known and impressive experiment with a network of artificial neurons was conducted by Terence Sejnowski and Charles Rosenberg in 1987. They built a network of 309 neurons and taught it to read and pronounce English words. A description of this experiment helps us understand how information networks operate.
The network, called NETtalk, was implemented on a computer as follows:
1) The letters that make up a word were submitted to 203 input neurons.
2) In response, the input neurons generated signals that were sent to 80 hidden neurons.
3) In turn, the hidden neurons generated signals that were transmitted to the 26 output neurons.
4) The signals generated by the output neurons corresponded to the phonetic transcription, i.e. to pronunciation of the word. Alternatively, the output neurons could be connected to a synthesizer that physically converts the signal into sound.
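The four steps above can be sketched as a minimal forward pass through a 203-80-26 network. Only the layer sizes come from the text; the random weights, sigmoid activation, and binary letter encoding are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the NETtalk description; everything else is assumed.
N_INPUT, N_HIDDEN, N_OUTPUT = 203, 80, 26

W_ih = rng.uniform(-0.3, 0.3, size=(N_HIDDEN, N_INPUT))   # input -> hidden weights
W_ho = rng.uniform(-0.3, 0.3, size=(N_OUTPUT, N_HIDDEN))  # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(letters):
    """letters: a 203-dimensional 0/1 encoding of a window of letters."""
    hidden = sigmoid(W_ih @ letters)   # step 2: hidden-neuron signals
    output = sigmoid(W_ho @ hidden)    # step 3: output-neuron signals
    return hidden, output

# A hypothetical input: a handful of active letter units.
x = np.zeros(N_INPUT)
x[[3, 40, 77, 110, 145, 160, 200]] = 1.0
h, y = forward(x)
print(y.shape)  # (26,)
```

The 26 output activations stand in for the phonetic transcription of step 4; in the real experiment they could drive a speech synthesizer.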
To this description of the network we should add a description of the training procedure. For this purpose, a recording was made of a child's speech and translated into symbols of phonetic transcription. A set of 1024 words was used, and the pronunciation of the same word sometimes differed. After that, the following sequence of operations was repeated many times:
‘Feed forward’: written words were presented to the input neurons
The signals generated by the input neurons were passed - reinforced or weakened by the weights wij - to the hidden neurons
The resulting signals from the output neurons were compared with the phonetic transcription of the word, and the magnitude of the deviation was calculated
‘Feed back’: on this basis, all the weights corresponding to connections between the output and the hidden neurons were adjusted according to a standard formula
Based on this, the weights corresponding to connections between the hidden neurons and the input neurons were corrected in a similar way16
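The ‘feed forward / feed back’ cycle above is the standard backpropagation step for sigmoid units. The following is a hedged sketch of one such training pass; the layer sizes come from the text, while the learning rate, weight ranges, and sample input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, lr = 203, 80, 26, 0.5  # lr is an assumed learning rate

W1 = rng.uniform(-0.3, 0.3, (n_hid, n_in))   # input -> hidden weights
W2 = rng.uniform(-0.3, 0.3, (n_out, n_hid))  # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target):
    global W1, W2
    # Feed forward.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Deviation of the output from the phonetic transcription.
    err = target - y
    # Feed back: the 'standard formula' (delta rule) for sigmoid units,
    # applied first to output-hidden weights, then to hidden-input weights.
    delta_out = err * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return float(np.sum(err ** 2))

# Repeating the cycle on one (synthetic) word drives the deviation down.
x = (rng.random(n_in) < 0.05).astype(float)
t = np.zeros(n_out); t[7] = 1.0
errors = [train_step(x, t) for _ in range(50)]
print(errors[-1] < errors[0])
```

Each repetition of the two-phase cycle reduces the measured deviation, which is why the authors could speak of learning curves over passes through the corpus.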
The authors describe the learning outcomes as follows:
The percentage of correct phonemes rose rapidly at first and continued to rise at a slower rate throughout the learning, reaching 95% after 50 passes through the corpus (i.e. all the 1024 words – V.Sh.). Primary and secondary stresses and syllable boundaries were learned very quickly for all words and achieved nearly perfect performance by 5 passes. When the learning curves were plotted as error rates on double logarithmic scales they were approximately straight lines, so that the learning follows a power law, which is characteristic of human skill learning. The distinction between vowels and consonants was made early; however, the network predicted the same vowel for all vowels and the same consonant for all consonants, which resulted in a babbling sound. A second stage occurred when word boundaries were recognized, and the output then resembled pseudowords. After just a few passes through the network many of the words were intelligible, and by 10 passes the text was understandable.
When the network made an error it often substituted phonemes that sounded similar to each other. For example, a common confusion was between the ‘th’ sounds in ‘thesis’ and ‘these’ which differ only in voicing. Few errors in a well-trained network were confusions between vowels and consonants. Some errors were actually corrections to inconsistencies in the original training corpus. Overall, the intelligibility of the speech was quite good.
That is, in many respects the network reproduces human behavior:
NETtalk is an illustration in miniature of many aspects of learning. First, the network starts out with considerable ‘innate’ knowledge in the form of input and output representations that were chosen by the experimenters, but with no knowledge specific to English - the network could have been trained on any language with the same set of letters and phonemes. Second, the network acquired its competence through practice, went through several distinct stages, and reached a significant level of performance.
Where does the network accumulate knowledge?
For the purposes of this book the experiment with NETtalk has additional value because it helps us understand where the network locates the acquired knowledge. As it turned out, the information is not localized but is distributed throughout the network. Here is how The New York Times described these results after an interview with Terence Sejnowski:
He also found that 10 randomly chosen neurons could be used as a ‘seed’ to reproduce the entire coding scheme. In this sense the network is like a hologram. If one of these laser-generated images is cut into halves, quarters, eighths or sixteenths, each piece contains the whole image, though with increasingly poorer resolution.
Using mathematical analysis, he is beginning to uncover this hidden knowledge. ‘It turned out to be very sensible,’ he said. ‘The vowels are represented differently from the consonants. Things that sound similar are clustered together.’ The letter ‘p’ is situated near ‘b,’ while ‘a’ and ‘e’ each have regions.
And here is how the authors themselves describe the results of the research they undertook in order to understand the role played by individual neurons, and the connections between them, during the network’s learning process:
The standard network used for analysis had 7 input groups and 80 hidden units and had been trained to 95% correct on the 1000 dictionary words. The levels of activation of the hidden units were examined for each letter of each word ... On average, about 20% of the hidden units were highly activated for any given input, and most of the remaining hidden units had little or no activation. Thus, the coding scheme could be described neither as a local representation, which would have activated only a few units, nor a ‘holographic’ representation, in which all of the hidden units would have participated to some extent. It was apparent, even without using statistical techniques, that many hidden units were highly activated only for certain letters, or sounds, or letter-to-sound correspondences. A few of the hidden units could be assigned unequivocal characterizations, such as one unit that responded only to vowels, but most of the units participated in more than one regularity.
To test the hypothesis that letter-to-sound correspondences were the primary organizing variable, we computed the average activation level of each hidden unit for each letter-to-sound correspondence in the training corpus. The result was 79 vectors with 80 components each, one vector for each letter-to-sound correspondence. A hierarchical clustering technique was used to arrange the letter-to-sound vectors in groups based on a Euclidean metric in the 80-dimensional space of hidden units. The overall pattern, as shown in Figure 8, was striking: the most important distinction was the complete separation of consonants and vowels. However, within these two groups the clustering had a different pattern. For the vowels, the next most important variable was the letter, whereas consonants were clustered according to a mixed strategy that was based more on the similarity of their sounds. The same clustering procedure was repeated for three networks starting from different random starting states. The patterns of weights were completely different but the clustering analysis revealed the same hierarchies, with some differences in the details, for all three networks.
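The core of this analysis - averaging hidden-unit activations per letter-to-sound correspondence and comparing the resulting vectors by Euclidean distance - can be sketched as follows. The data here is entirely synthetic (no trained network is involved): vowel correspondences are given one shared activation prototype and consonants another, so that the vowel/consonant separation the authors observed emerges by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n_hidden = 80  # dimensionality of the activation vectors, as in the text

# Synthetic prototypes standing in for a trained network's regularities.
vowel_proto = rng.random(n_hidden)
consonant_proto = rng.random(n_hidden)

labels = ['a', 'e', 'i', 'o', 'u', 'b', 'p', 'd', 't', 'k']
vectors = {}
for letter in labels:
    proto = vowel_proto if letter in 'aeiou' else consonant_proto
    # Average activation vector for this correspondence: prototype + noise.
    vectors[letter] = proto + 0.1 * rng.standard_normal(n_hidden)

def dist(a, b):
    """Euclidean distance in the 80-dimensional space of hidden units."""
    return float(np.linalg.norm(vectors[a] - vectors[b]))

# A clustering procedure would group small distances first; here we just
# check that within-group distances fall below across-group distances.
within = dist('a', 'e')   # vowel vs. vowel
across = dist('a', 'b')   # vowel vs. consonant
print(within < across)
```

A hierarchical clustering algorithm applied to such a distance matrix would merge the vowels with each other, and the consonants with each other, before ever merging across the two groups - the ‘complete separation of consonants and vowels’ the authors report.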
This more or less even distribution of information across the network makes it very resistant to damage:
We examined performance of a highly-trained network after making random changes of varying size to the weights. …(R)andom perturbations of the weights uniformly distributed on the interval [-0.5, 0.5] had little effect on the performance of the network, and degradation was gradual with increasing damage. This damage caused the magnitude of each weight to change on average by 0.25; this is the roundoff error that can be tolerated before the performance of the network begins to deteriorate and it can be used to estimate the accuracy with which each weight must be specified. The weights had an average magnitude of 0.8 and almost all had a magnitude of less than 2.
(T)he information was distributed in the network such that no single unit or link was essential. As a consequence, the network was fault tolerant and degraded gracefully with increasing damage. Moreover, the network recovered from damage much more quickly than it took to learn initially.
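The arithmetic behind the quoted figure - that a uniform perturbation on [-0.5, 0.5] changes each weight by 0.25 on average - is easy to verify in a sketch. The weight distribution below is assumed (only its rough range of about ±2 comes from the text); the 0.25 is the expected absolute value of a uniform variable on [-0.5, 0.5].

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed weight population, roughly matching 'almost all below 2 in magnitude'.
weights = rng.uniform(-2.0, 2.0, size=100_000)

# The damage experiment: perturb every weight uniformly on [-0.5, 0.5].
perturbation = rng.uniform(-0.5, 0.5, size=weights.shape)
damaged = weights + perturbation

# Average magnitude of the change, which the authors report as 0.25.
mean_change = float(np.mean(np.abs(damaged - weights)))
print(round(mean_change, 2))  # ≈ 0.25, the expected value of |U(-0.5, 0.5)|
```

Since E|U(-0.5, 0.5)| = 0.25, the measured average change matches the figure in the quote, independently of the underlying weight values.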
In his article17 Hayek claimed that the information contained in the system of prices is sufficient to organize the interaction between distinct market participants. This was one of the basic ideas of the article; however, in my view, it was not sufficiently substantiated. Meanwhile, the results of the experiments with neural networks presented above provide additional support for Hayek’s thesis, because prices in a market economy can be interpreted as playing the same role as the weights which modify signals at the synapses of neural networks.
Chapter 3. Main characteristics of cognitive spaces
Based on the above more or less well-known facts and arguments, we will now try to offer some new ideas and concepts with which to analyze the properties of cognitive spaces. First, we try to combine two ideas - that of the sign system and that of the information network - and draw a more general, embracing model. Next, we turn our attention to the difference between knowledge that can be divided into self-contained ‘pieces’ - that is, localizable and atomizable knowledge - and holistic knowledge. We then discuss issues related to the relationship between ideas and reality: how ideal objects originate, how we can discuss their structure, and how they can be re-coded, i.e. their mode of ‘physical’ existence changed. Finally, we discuss another important feature of cognitive spaces: their capacity for reflection, that is, for self-reference and self-organization.
A general view of cognitive spaces
Suppose a person thinks of an idea. He can then implement it himself, thus turning the idea into reality. Alternatively, the idea can first be socialized by passing it on to others through some act of communication and gaining their approval. We call these three steps - conceiving, implementing, and socializing an idea - the basic triad.
These three actions may occur in different forms and be differently sequenced. Below, we will discuss the liberal social order, in which an individual first makes use of the resources he owns and implements his idea, turning it into a product. After that, the idea is socialized and obtains the approval of others, usually by way of sale and purchase. Outside the liberal order, this sequence may differ. Within a collectivist (‘socialist’) model, the second step is not implementation but socialization of the idea: a person has to convince the fellow members of his collective that his idea is worth implementing. After that, at the third step, it is turned into reality - as a common goal of the entire collective.
Social cognitive mechanism
In order to further discuss the emerging options, let us represent the human cognitive environment in the form of a generalized Turing machine:
Picture 9. Generalized Turing machine (реальность is Russian for reality)
We will interpret the concepts of the machine, its internal states, inputs, outputs, and the external memory tape as follows.
We will assume that the machine itself is a social group, i.e. a number of people organized in some way. If their interaction does not affect the external world (‘the reality’), then we can regard such interaction as changes in the machine’s internal state.
The input and the output of the generalized machine will be interpreted as its relations to ‘the reality’. These relations may be arranged very differently. The input from, and the output to, ‘the reality’ may be exclusively owned by one or several elements of the machine (as was the case, say, with the NETtalk network). Or, on the contrary, all the elements of the machine may have the same opportunity to interact with ‘the reality’ without any restrictions. Finally, a configuration is possible in which each person or group of people in the given society is allocated some ‘territory’ of ‘the reality’ and allowed to interact only with a ‘territory’ of their own.