partial). The fundamental frequency, in this case 200 Hz, is also called the first harmonic, the 400 Hz component (2 fo) is called the second harmonic, the 600 Hz component (3 fo) is called the third harmonic, and so on.
The second panel in Figure 3-11 shows a complex periodic signal with a fundamental period of 10 ms and, consequently, a fundamental frequency of 100 Hz. The harmonic spectrum that is associated with this signal will therefore show energy at 100 Hz, 200 Hz, 300 Hz, 400 Hz, 500 Hz, and so on. The bottom panel of Figure 3-11 shows a complex periodic signal with a fundamental period of 2.5 ms, a fundamental frequency of 400 Hz, and harmonics at 400, 800, 1200, 1600 Hz, and so on. Notice that there are two completely interchangeable ways to define the term fundamental frequency. In the time domain, the fundamental frequency is the number of cycles of the complex pattern that are completed in one second. In the frequency domain, except in the case of certain special signals, the fundamental frequency is the lowest harmonic in the harmonic spectrum. Also, the fundamental frequency defines the harmonic spacing; that is, when the fundamental frequency is 100 Hz, harmonics will be spaced at 100 Hz intervals (i.e., 100, 200, 300 ...), when the fundamental frequency is 125 Hz, harmonics will be spaced at 125 Hz intervals (i.e., 125, 250, 375 ...), and when the fundamental frequency is 200 Hz, harmonics will be spaced at 200 Hz intervals (i.e., 200, 400, 600 ...). (For some special signals this will not be the case.²) So, when fo is low, harmonics will be closely spaced, and when fo is high, harmonics will be widely spaced. This is clearly seen in Figure 3-11: the signal with the lowest fo (100 Hz, the middle signal) shows the narrowest harmonic spacing, while the signal with the highest fo (400 Hz, the bottom signal) shows the widest harmonic spacing.
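Because fo is simply the reciprocal of the fundamental period, the harmonic locations and the harmonic spacing can be computed directly from a period measured in the time domain. The short Python sketch below is only an illustration; the function name `harmonics_from_period` is hypothetical, and the periods used are those of the 200 Hz, 100 Hz, 125 Hz, and 400 Hz examples discussed above.

```python
def harmonics_from_period(period_ms, count=5):
    """Return the first `count` harmonic frequencies (in Hz) of a complex
    periodic signal whose fundamental period is `period_ms` milliseconds."""
    f0 = 1000.0 / period_ms                       # fo = 1/T, with T converted from ms to s
    return [n * f0 for n in range(1, count + 1)]  # fo, 2fo, 3fo, ...

for period_ms in (5.0, 10.0, 8.0, 2.5):
    harmonics = harmonics_from_period(period_ms, count=4)
    spacing = harmonics[1] - harmonics[0]         # harmonic spacing equals fo
    print(f"T = {period_ms} ms -> harmonics {harmonics}, spacing {spacing} Hz")
```

Shorter periods give higher fundamental frequencies and therefore wider harmonic spacing, which is the relationship visible across the three panels of Figure 3-11.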
Figure 3-12. Time and frequency domain representations of three non-transient complex aperiodic signals. Unlike complex periodic signals, complex aperiodic signals show energy that is spread across the spectrum. This type of spectrum is called dense or continuous. These spectra have a very different appearance from the “picket fence” look that is associated with the discrete, harmonic spectra of complex periodic signals.
Figure 3-13. Time and frequency domain representations of three transients. Transients are complex aperiodic signals that are defined by their brief duration. Pops, clicks, and the sound of gunfire are examples of transients. In common with longer-duration complex aperiodic signals, transients show dense or continuous spectra, very unlike the discrete, harmonic spectra associated with complex periodic sounds.
There are certain characteristics of the spectra of complex periodic sounds that can be determined by making simple measurements of the time domain signal, and there are certain other characteristics that require a more complex analysis. For example, simply by examining the signal in the bottom panel of Figure 3-11 we can determine that it is complex periodic (i.e., it is periodic but not sinusoidal) and therefore it will show a harmonic spectrum with energy at whole-number multiples of the fundamental frequency. Further, by measuring the fundamental period (2.5 ms) and converting it into fundamental frequency (400 Hz), we are able to determine that the signal will have energy at 400, 800, 1200, 1600 Hz, etc. But how do we know the amplitude of each of these frequency components? And how do we know the phase of each component? The answer is that we cannot determine harmonic amplitudes or phases simply by inspecting the signal or by making simple measurements of the time domain signal with a ruler. We will see soon that a technique called Fourier analysis is able to determine both the amplitude spectrum and the phase spectrum of any signal. We will also see that the inner ears of humans and many other animals have developed a trick that produces a neural representation comparable in some respects to an amplitude spectrum, but that the ear has no comparable trick for deriving a representation equivalent to a phase spectrum. This explains why the amplitude spectrum is far more important for speech and hearing applications than the phase spectrum. We will return to this point later.
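The point that a period measurement fixes where the harmonics are, but not how strong they are, can be checked numerically. In this Python sketch the helper names, the 16 kHz sampling rate, the sine-phase convention, and both sets of harmonic amplitudes are assumptions made for illustration. Two signals built from harmonics of 400 Hz both repeat every 2.5 ms, yet their waveforms differ because their harmonic amplitudes differ.

```python
import math

FS = 16000   # assumed sampling rate (Hz)

def complex_periodic(amplitudes, f0=400.0, fs=FS, n_samples=400):
    """Sum harmonics of f0 (sine phase assumed): amplitudes[0] scales fo,
    amplitudes[1] scales 2fo, and so on."""
    return [sum(a * math.sin(2 * math.pi * (h + 1) * f0 * i / fs)
                for h, a in enumerate(amplitudes))
            for i in range(n_samples)]

def repeats_every(signal, period_samples, tol=1e-9):
    """True if every sample equals the sample period_samples later."""
    return all(abs(signal[i] - signal[i + period_samples]) < tol
               for i in range(len(signal) - period_samples))

period = round(FS * 2.5 / 1000)                  # 2.5 ms -> 40 samples
x = complex_periodic([1.0, 0.5, 0.25, 0.125])    # hypothetical harmonic amplitudes
y = complex_periodic([0.2, 1.0, 0.1, 0.6])       # different hypothetical amplitudes

print(repeats_every(x, period), repeats_every(y, period))
print(max(abs(a - b) for a, b in zip(x, y)))     # how different the waveforms are
```

Both signals pass the 2.5 ms repetition check, so a period measurement would place their harmonics at the same frequencies; only a spectrum analysis would reveal that the harmonic amplitudes differ.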
To summarize: (1) a complex periodic signal is any periodic signal that is not sinusoidal, (2) complex periodic signals have energy at the fundamental frequency (fo) and at whole number multiples of the fundamental frequency (2 fo, 3 fo, 4 fo ...), and (3) although measuring the fundamental frequency allows us to determine the frequency locations of harmonics, there is no simple measurement that can tell us harmonic amplitudes or phases. For this, Fourier analysis or some other spectrum analysis technique is needed.
Figure 3-14. Illustration of the principle underlying Fourier analysis. The complex periodic signal shown in panel e was derived by point-for-point summation of the sinusoidal signals shown in panels a-d. Point-for-point summation simply means beginning at time zero (i.e., the start of the signal) and adding the instantaneous amplitude of signal a to the instantaneous amplitude of signal b at time zero, then adding that sum to the instantaneous amplitude of signal c, also at time zero, then adding that sum to the instantaneous amplitude of signal d at time zero. The sum of instantaneous amplitudes at time zero of signals a-d is the instantaneous amplitude of the composite signal e at time zero. For example, at time zero the amplitudes of sinusoids a-d are 0, +100, -200, and 0, respectively, producing a sum of -100. This agrees with the instantaneous amplitude at the very beginning of composite signal e. The same summation procedure is followed for all time points.
Aperiodic Sounds
An aperiodic sound is any sound that does not show a repeating pattern in its time domain representation. There are many aperiodic sounds in speech. Examples include the hissy sounds associated with fricatives such as /f/ and /s/, and the various hisses and pops associated with articulatory release for the stop consonants /b,d,g,p,t,k/. Examples of non-speech aperiodic sounds include a drummer's cymbal or snare drum, the hiss produced by a radiator, and the static produced by a poorly tuned radio. There are two types of aperiodic sounds: (1) continuous aperiodic sounds (also known as noise) and (2) transients. Although there is no sharp cutoff, the distinction between continuous aperiodic sounds and transients is based on duration. Transients (also called "pops" and "clicks") are defined by their very brief duration, and continuous aperiodic sounds are of longer duration. Figure 3-12 shows several examples of time domain representations and amplitude spectra for continuous aperiodic sounds. The lack of periodicity in the time domain is quite evident; that is, unlike the periodic sounds we have seen, there is no pattern that repeats itself over time.
Figure 3-15. A signal enters a Fourier analyzer in the time domain and exits in the frequency domain. As outputs, the Fourier analyzer produces two frequency-domain representations: an amplitude spectrum that shows the amplitude of each sinusoidal component that is present in the input signal, and a phase spectrum that shows the phase of each of the sinusoids. The input signal can be reconstructed perfectly by summing sinusoids at the frequencies, amplitudes, and phases that are shown in the Fourier amplitude and phase spectra, using the summing method that is illustrated in Figure 3-14.
All aperiodic sounds, both continuous and transient, are complex in the sense that they always consist of energy at more than one frequency. The characteristic feature of aperiodic sounds in the frequency domain is a dense or continuous spectrum, which stands in contrast to the harmonic spectrum that is associated with complex periodic sounds. In a harmonic spectrum, there is energy at the fundamental frequency, followed by a gap with little or no energy, followed by energy at the second harmonic, followed by another gap, and so on. The spectra of aperiodic sounds do not share this "picket fence" appearance. Instead, energy is smeared more or less continuously across the spectrum. The top panel in Figure 3-12 shows a specific type of continuous aperiodic sound called white noise. By analogy to white light, white noise has a flat amplitude spectrum; that is, approximately equal amplitude at all frequencies. The middle panel in Figure 3-12 shows the sound /s/, and the bottom panel shows the sound /f/. Notice that the spectra for all three sounds are dense; that is, they do not show the "picket fence" look that reveals harmonic structure. As was the case for complex periodic sounds, there is no way to tell how much energy there will be at different frequencies by inspecting the time domain signal or by making any simple measurements with a ruler. Likewise, there is no simple way to determine the phase spectrum. So, after inspecting a time-domain signal and determining that it is aperiodic, all we know for sure is that it will have a dense spectrum rather than a harmonic spectrum.
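The difference between a dense spectrum and a harmonic ("picket fence") spectrum can be illustrated by counting how many analysis frequencies carry appreciable energy. The Python sketch below assumes Gaussian random samples as a stand-in for white noise, a 1% threshold for "appreciable," and a slow, hand-written discrete Fourier transform; none of these choices come from the text, and the helper names are hypothetical.

```python
import cmath
import math
import random

def amplitude_spectrum(signal):
    """Magnitude of the discrete Fourier transform at bins 1 .. N/2 - 1
    (a direct, slow DFT; adequate for short illustrative signals)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal)))
            for k in range(1, n // 2)]

def fraction_of_bins_with_energy(spectrum, rel_threshold=0.01):
    """Fraction of analysis frequencies whose amplitude exceeds
    rel_threshold times the largest amplitude in the spectrum."""
    peak = max(spectrum)
    return sum(a > rel_threshold * peak for a in spectrum) / len(spectrum)

N = 400
rng = random.Random(0)                          # seeded so results are repeatable
white_noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Complex periodic signal: fundamental at DFT bin 4, harmonics at bins 8, 12, 16
periodic = [sum(math.sin(2 * math.pi * h * 4 * i / N) / h for h in range(1, 5))
            for i in range(N)]

print(fraction_of_bins_with_energy(amplitude_spectrum(white_noise)))
print(fraction_of_bins_with_energy(amplitude_spectrum(periodic)))
```

For the noise, nearly every analysis frequency should exceed the threshold (a dense spectrum), whereas for the periodic signal only the four harmonic frequencies do, with empty gaps between them.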
Figure 3-13 shows time domain representations and amplitude spectra for three transients. The transient in the top panel was produced by rapping on a wooden desk, the second is a single clap of the hands, and the third was produced by holding the mouth in position for the vowel /o/, and tapping the cheek with an index finger. Note the brief durations of the signals. Also, as with continuous aperiodic sounds, the spectra associated with transients are dense; that is, there is no evidence of harmonic organization. In speech, transients occur at the instant of articulatory release for stop consonants. There are also some languages, such as the South African languages Zulu, Hottentot, and Xhosa, that contain mouth clicks as part of their phonemic inventory (MacKay, 1986).
Fourier Analysis
Fourier analysis is an extremely powerful tool that has widespread applications in nearly every major branch of physics and engineering. The method was developed by the 19th-century mathematician Joseph Fourier, and although Fourier was studying thermal waves at the time, the technique can be applied to the frequency analysis of any kind of wave. Fourier's great insight was the discovery that all complex waves can be derived by adding sinusoids together, so long as the sinusoids are of the appropriate frequencies, amplitudes, and phases. For example, the complex periodic signal at the bottom of Figure 3-14 can be derived by summing sinusoids at 100, 200, 300, and 400 Hz, with each sinusoidal component having the amplitude and phase that are shown in the figure (see the caption of Figure 3-14 for an explanation of what is meant by summing the sinusoidal components). The assumption that all complex waves can be derived by adding sinusoids together is called Fourier's theorem, and the analysis technique that Fourier developed from this theorem is called Fourier analysis. Fourier analysis is a mathematical technique that takes a time domain signal as its input and determines (1) the amplitude of each sinusoidal component that is present in the input signal, and (2) the phase of each sinusoidal component that is present in the input signal. Another way of stating this is that Fourier analysis takes a time domain signal as its input and produces two frequency domain representations as output: (1) an amplitude spectrum, and (2) a phase spectrum.
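Point-for-point summation, as described in the caption of Figure 3-14, amounts to adding the instantaneous amplitudes of all components at each time point. In this Python sketch only the component frequencies (100, 200, 300, and 400 Hz) come from the text; the amplitudes, phases (sine phase assumed), and sampling rate are hypothetical values chosen so that the four components start at 0, +100, -200, and 0, matching the time-zero example in the caption.

```python
import math

# Hypothetical components: (frequency in Hz, amplitude, phase in degrees).
# Frequencies are from the text; amplitudes and phases are made up so that
# the instantaneous amplitudes at time zero are 0, +100, -200, and 0.
COMPONENTS = [(100, 150, 0), (200, 100, 90), (300, 200, 270), (400, 50, 0)]

def sinusoid(freq, amp, phase_deg, t):
    """Instantaneous amplitude of amp * sin(2*pi*freq*t + phase) at time t (s)."""
    return amp * math.sin(2 * math.pi * freq * t + math.radians(phase_deg))

def point_for_point_sum(components, times):
    """At each time point, add the instantaneous amplitudes of all components."""
    return [sum(sinusoid(f, a, p, t) for f, a, p in components) for t in times]

fs = 10000                                  # assumed sampling rate (Hz)
times = [i / fs for i in range(100)]        # 10 ms: one cycle of the 100 Hz fundamental
composite = point_for_point_sum(COMPONENTS, times)
print(composite[0])                         # 0 + 100 + (-200) + 0 = -100
```

The same loop runs at every time point, so the composite waveform repeats every 10 ms, the period of its 100 Hz fundamental.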
The basic concept is illustrated in Figure 3-15, which shows a time domain signal entering the Fourier analyzer. Emerging at the output of the Fourier analyzer is an amplitude spectrum (a graph showing the amplitude of each sinusoid that is present in the input signal) and a phase spectrum (a graph showing the phase of each sinusoid that is present in the input signal). The amplitude spectrum tells us that the input signal contains (1) a 200 Hz sinusoid with an amplitude of 100 Pa, (2) a 400 Hz sinusoid with an amplitude of 200 Pa, and (3) a 600 Hz sinusoid with an amplitude of 50 Pa. Similarly, the phase spectrum tells us that the 200 Hz sinusoid has a phase of 90°, the 400 Hz sinusoid has a phase of 180°, and the 600 Hz sinusoid has a phase of 270°. If Fourier's theorem is correct, we should be able to reconstruct the input signal by summing sinusoids at 200, 400, and 600 Hz, using the amplitudes and phases that are shown. In fact, summing these three sinusoids in this way would precisely reproduce the original time domain signal; that is, we would get back an exact replica of our original signal, and not just a rough approximation to it.
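The Fourier analyzer of Figure 3-15 can be imitated with a discrete Fourier transform (DFT), the sampled-signal form of Fourier analysis. The Python sketch below builds the three-component signal described above, analyzes it, and then resynthesizes it from the recovered amplitudes and phases. It assumes that phases are measured relative to a cosine (the text does not say whether sine or cosine phase is meant), an 8 kHz sampling rate, and a 0.1 s analysis window so that every component completes a whole number of cycles. The DFT is written out directly for clarity; practical code would call a fast Fourier transform (FFT) library routine instead.

```python
import cmath
import math

FS = 8000   # assumed sampling rate (Hz)
N = 800     # 0.1 s of signal, so DFT analysis frequencies fall every 10 Hz

def synthesize(components, fs=FS, n=N):
    """Point-for-point sum of amp * cos(2*pi*freq*t + phase) over all
    (freq_hz, amp, phase_deg) components (cosine phase is an assumption)."""
    return [sum(a * math.cos(2 * math.pi * f * i / fs + math.radians(p))
                for f, a, p in components)
            for i in range(n)]

def fourier_analyze(signal, fs=FS, threshold=1e-6):
    """Direct DFT returning (freq_hz, amplitude, phase_deg) for every
    analysis frequency whose amplitude exceeds threshold."""
    n = len(signal)
    result = []
    for k in range(1, n // 2):                  # skip the DC and Nyquist bins
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        amp = 2 * abs(coeff) / n                # amplitude spectrum value
        if amp > threshold:
            phase = math.degrees(cmath.phase(coeff)) % 360   # phase spectrum value
            result.append((k * fs / n, amp, phase))
    return result

# Components from the Figure 3-15 example: (Hz, Pa, degrees)
original = synthesize([(200, 100, 90), (400, 200, 180), (600, 50, 270)])
spectrum = fourier_analyze(original)
for freq, amp, phase in spectrum:
    print(f"{freq:.0f} Hz: {amp:.1f} Pa at {phase:.0f} degrees")

rebuilt = synthesize(spectrum)                  # resynthesis from the analysis
print(max(abs(a - b) for a, b in zip(original, rebuilt)))
```

The recovered amplitude and phase spectra should match the values read from Figure 3-15, and the largest difference between the original and rebuilt waveforms should be at the level of floating-point rounding error, in line with the claim that resynthesis yields an exact replica.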
For our purposes it is not important to understand how Fourier analysis works. The most important point about Fourier's idea is that, visual appearances aside, all complex waves consist of sinusoids of varying frequencies, amplitudes, and phases. Fourier analysis applies not only to periodic signals such as the one shown in Figure 3-15, but also to noise and transients; the amplitude spectra of the aperiodic signals shown in Figure 3-13, for example, were calculated using Fourier analysis. In later chapters we will see that the auditory system is able to derive a neural representation that is roughly comparable to a Fourier amplitude spectrum. However, as was mentioned earlier, the auditory system does not derive a representation comparable to a Fourier phase spectrum. As a result, listeners are very sensitive to changes in the amplitude spectrum but are relatively insensitive to changes in phase.
Some Additional Terminology
Overtones vs. Harmonics: The term overtone and the term harmonic refer to the same concept; they are just counted differently. As we have seen, in a harmonic series such as 100, 200, 300, 400, etc., the 100 Hz component can be referred to as either the fundamental frequency or the first harmonic; the 200 Hz component is the second harmonic, the 300 Hz component is the third harmonic, and so on. An alternative set of terminology would refer to the 100 Hz component as the fundamental frequency, the 200 Hz component as the first overtone, the 300 Hz component as the second overtone, and so on. Use of the term overtone tends to be favored by those interested in musical acoustics, while most other acousticians tend to use the term harmonic.
Octaves vs. Harmonics: An octave refers to a doubling of frequency. So, if we begin at 100 Hz, the next octave up would be 200 Hz, the next would be 400 Hz, the next would be 800 Hz, and so on. Note that this is quite different from a harmonic progression. A harmonic progression beginning at 300 Hz would be 300, 600, 900, 1200, 1500, etc., while an octave progression would be 300, 600, 1200, 2400, 4800, etc. There is something auditorily natural about octave spacing, and octaves play a very important role in the organization of musical scales. For example, on a piano keyboard, the A above middle C (A4) is 440 Hz, the next A up (A5) is 880 Hz, A6 is 1,760 Hz, and so on. (See Box 3-2.)
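The contrast between the two progressions comes down to adding the starting frequency at each step versus doubling the previous frequency. A minimal Python sketch (the function names are illustrative only):

```python
def harmonic_progression(start_hz, count):
    """start, 2*start, 3*start, ... (whole-number multiples of the start)."""
    return [n * start_hz for n in range(1, count + 1)]

def octave_progression(start_hz, count):
    """start, 2*start, 4*start, ... (each step doubles the frequency)."""
    return [start_hz * 2 ** n for n in range(count)]

print(harmonic_progression(300, 5))   # [300, 600, 900, 1200, 1500]
print(octave_progression(300, 5))     # [300, 600, 1200, 2400, 4800]
print(octave_progression(440, 3))     # [440, 880, 1760]
```

The two sequences agree only at the first two values; after that, octave steps grow geometrically while harmonic steps stay a constant distance apart.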
Wavelength: The concept of wavelength is best illustrated with an example given by Small (1973). Small asks us to imagine dipping a finger repeatedly into a puddle of water at a perfectly regular interval. Each time the finger hits the water, a wave is propagated outward, and we would see a pattern formed consisting of a series of concentric circles (see Figure 3-16). Wavelength is simply the distance between adjacent waves. Precisely the same concept can be applied to sound waves: wavelength is simply the distance between one compression wave and the next (or one rarefaction wave and the next or, more generally, the distance between any two corresponding points in adjacent waves). For our purposes, the most important point to be made about wavelength is that there is a simple relationship between frequency and wavelength. Using the puddle example, imagine that we begin by dipping our finger into the puddle at a very slow rate; that is, with a low "dipping frequency." Since the waves have a long period of time to travel from one dip to the next, the wavelength will be large. By the same reasoning, the wavelength becomes smaller as the "dipping frequency" is increased; that is, the time allowed for the wave to travel at a high "dipping frequency" is short, so the wavelength is small. Wavelength is a measure of distance, and the formula for calculating wavelength is a straightforward algebraic rearrangement of the familiar "distance = rate × time" formula from junior high school.
λ = c/f, where:
λ = wavelength
c = the speed of sound
f = frequency
By rearranging the formula, frequency can be calculated if wavelength and the speed of sound are known:
f = c/λ
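The two forms of the wavelength formula translate directly into code. The speed of sound is not given in the text; the Python sketch below assumes about 343 m/s, a commonly used value for air at roughly 20 °C, and the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0   # m/s; assumed value for air at roughly 20 degrees C

def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters: lambda = c / f."""
    return c / frequency_hz

def frequency(wavelength_m, c=SPEED_OF_SOUND):
    """Frequency in Hz: f = c / lambda."""
    return c / wavelength_m

for f in (100, 1000, 10000):
    print(f"{f} Hz -> wavelength {wavelength(f):.4f} m")
```

As the puddle analogy predicts, wavelength shrinks as frequency rises: at the assumed speed, 100 Hz corresponds to about 3.4 m and 10,000 Hz to about 3.4 cm.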
Figure 3-16. Wavelength is a measure of the distance between the crest of one cycle of a wave and the crest of the next cycle (or trough to trough or, in fact, the distance between any two corresponding points in the wave). Wavelength and frequency are related to one another. Because the wave has only a short time to travel from one cycle to the next, high frequencies produce short wavelengths. Conversely, because of the longer travel times, low frequencies produce long wavelengths.
Spectrum Envelope: The term spectrum envelope refers to an imaginary smooth line drawn to enclose an amplitude spectrum. Figure 3-17 shows several examples. This is a rather simple concept that will play a very important role in understanding certain aspects of auditory perception. For example, we will see that our perception of a perceptual attribute called timbre (also called sound quality) is controlled primarily by the shape of the spectrum envelope, and not by the fine details of the amplitude spectrum. The examples in Figure 3-17 show how differences in spectrum envelope play a role in signaling differences in one specific example of timbre called vowel quality (i.e., whether a vowel sounds like /i/ vs. /ɑ/ vs. /u/, etc.). For example, panels a and b in Figure 3-17 show the vowel /ɑ/ produced at two different fundamental frequencies. (We know that the fundamental frequencies are different because one spectrum shows wide harmonic spacing and the other shows narrow harmonic spacing.) The fact that the two vowels are heard as /ɑ/ despite the difference in fundamental frequency can be attributed to the fact that these two signals have similar spectrum envelopes. Panels c and d in Figure 3-17 show the spectra of two signals with different spectrum envelopes but the same fundamental frequency (i.e., with the same harmonic spacing). As we will see in the chapter on auditory perception, differences in fundamental frequency are perceived as differences in pitch. So, for signals (a) and (b) in Figure 3-17, the listener will hear the same vowel produced at two different pitches. Conversely, for signals (c) and (d) in Figure 3-17, the listener will hear two different vowels produced at the same pitch. We will return to the concept of spectrum envelope in the chapter on auditory perception.
Figure 3-17. A spectrum envelope is an imaginary smooth line drawn to enclose an amplitude spectrum. Panels a and b show the spectra of two signals (the vowel /ɑ/) with different fundamental frequencies (note the differences in harmonic spacing) but very similar spectrum envelopes. Panels c and d show the spectra of two signals with different spectrum envelopes (the vowels /i/ and /u/ in this case) but the same fundamental frequencies (i.e., the same harmonic spacing).
Amplitude Envelope: The term amplitude envelope refers to an imaginary smooth line that is drawn on top of a time domain signal. Figure 3-18 shows sinusoids that are identical except for their amplitude envelopes. It can be seen that the different amplitude envelopes reflect differences in the way the sounds are turned on and off. For example, panel a shows a signal that is turned on abruptly and turned off abruptly; panel b shows a signal that is turned on gradually and turned off abruptly; and so on. Differences in amplitude envelope have an important effect on the quality of a sound. As we will see in the chapter on auditory perception, amplitude envelope, along with spectrum envelope discussed above, is another physical parameter that affects timbre or sound quality. For example, piano players know that a given note will sound different depending on whether or not the damping pedal is used. Similarly, notes played on a stringed instrument such as a violin or cello will sound different depending on whether the note is plucked or bowed. In both cases, the underlying acoustic difference is amplitude envelope.
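The four amplitude envelopes described in the caption of Figure 3-18 can be imposed on a sinusoid by multiplying it, sample by sample, with a gain that ramps up at onset and down at offset. This Python sketch assumes linear ramps, a 500 Hz tone, a 10 kHz sampling rate, and 50 ms ramp durations; other ramp shapes (for example, raised-cosine ramps) are also common, and none of these values come from the text. The function name `apply_envelope` is hypothetical.

```python
import math

def apply_envelope(signal, fs, rise_ms=0.0, fall_ms=0.0):
    """Multiply a signal by a linear onset ramp lasting rise_ms and a linear
    offset ramp lasting fall_ms; a duration of 0 ms means an abrupt on or off."""
    n = len(signal)
    rise = int(round(fs * rise_ms / 1000))    # onset ramp length in samples
    fall = int(round(fs * fall_ms / 1000))    # offset ramp length in samples
    shaped = []
    for i, x in enumerate(signal):
        gain = 1.0
        if rise and i < rise:
            gain = min(gain, i / rise)               # gradual turn-on
        if fall and i >= n - fall:
            gain = min(gain, (n - 1 - i) / fall)     # gradual turn-off
        shaped.append(x * gain)
    return shaped

FS = 10000                                                              # assumed sampling rate (Hz)
tone = [math.sin(2 * math.pi * 500 * i / FS) for i in range(FS // 10)]  # 100 ms, 500 Hz

panel_a = apply_envelope(tone, FS)                            # abrupt on, abrupt off
panel_b = apply_envelope(tone, FS, rise_ms=50)                # gradual on, abrupt off
panel_c = apply_envelope(tone, FS, fall_ms=50)                # abrupt on, gradual off
panel_d = apply_envelope(tone, FS, rise_ms=50, fall_ms=50)    # gradual on and off
```

All four versions share the same 500 Hz sinusoid; only the envelope, and therefore the way the sound is turned on and off, differs.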
Figure 3-18. Amplitude envelope is an imaginary smooth line drawn to enclose a time-domain signal. This feature describes how a sound is turned on and turned off; for example, whether the sound is turned on abruptly and turned off abruptly (panel a), turned on gradually and turned off abruptly (panel b), turned on abruptly and turned off gradually (panel c), or turned on and off gradually (panel d).
Acoustic Filters
As will be seen in subsequent chapters, acoustic filtering plays a central role in the processing of sound by the inner ear. The human vocal tract also serves as an acoustic filter that modifies and shapes the sounds that are created by the larynx and other articulators. For this reason, it is quite important to understand how acoustic filters work. In the most general sense, the term filter refers to a device or system that is selective about the kinds of things that are allowed to pass through versus the kinds of things that are blocked. An oil filter, for example, is designed to allow oil to pass through while blocking particles of dirt. Of special interest to speech and hearing science are frequency selective filters. These are devices that allow some frequencies to pass through while blocking or attenuating other frequencies. (The term attenuate means to weaken or reduce in amplitude.)
A simple example of a frequency selective filter from the world of optics is a pair of tinted sunglasses. A piece of white paper that is viewed through red tinted sunglasses will appear red. Since the original piece of paper is white, and since we know that white light consists of all of the visible optical frequencies mixed in equal amounts, the reason that the paper appears red through the red tinted glasses is that optical frequencies other than those corresponding to red are being blocked or attenuated by the optical filter. As a result, it is primarily the red light that is being allowed to pass through. (Starting at the lowest optical frequency and going to the highest, light will appear red, orange, yellow, green, blue, indigo, and violet.)
A graph called a frequency response curve is used to describe how a frequency selective filter will behave. A frequency response curve is a graph showing how energy at different frequencies will be affected by the filter. Specifically, a frequency response curve plots a variable called "gain" as a function of the frequency of the input signal. Gain is the amount of amplification provided by the filter at different signal frequencies. Gains are interpreted as amplitude multipliers; for example, suppose that the gain of a filter at 100 Hz is 1.3. If a 100 Hz sinusoid enters the filter measuring 10 Pa, the amplitude at the output of the filter at 100 Hz will measure 13 Pa (10 Pa × 1.3 = 13 Pa). The only catch in this scheme is that gains can be, and very frequently are, less than 1, meaning that the effect of the filter will be to attenuate the signal. For example, if the gain at 100 Hz is 0.5, a 10 Pa input signal at 100 Hz will measure 5 Pa at the output of the filter. When the filter gain is 1.0, the signal is unaffected by the filter; i.e., a 10 Pa input signal will measure 10 Pa at the output of the filter.
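Because gains are amplitude multipliers, passing a signal with known components through a filter amounts to multiplying each component's amplitude by the gain at that component's frequency. The Python sketch below samples a frequency response curve at only three frequencies; the gains are the example values from the text (1.3, 1.0, and 0.5), but assigning them to 100, 200, and 300 Hz, and the function name `apply_filter`, are hypothetical choices made for illustration.

```python
def apply_filter(input_spectrum, gains):
    """Return the output amplitude spectrum of a filter.

    input_spectrum maps frequency (Hz) -> input amplitude (Pa);
    gains maps frequency (Hz) -> filter gain (an amplitude multiplier).
    """
    return {freq: amp * gains[freq] for freq, amp in input_spectrum.items()}

# Hypothetical frequency response, sampled at three frequencies
gains = {100: 1.3, 200: 1.0, 300: 0.5}
input_spectrum = {100: 10.0, 200: 10.0, 300: 10.0}   # 10 Pa at each frequency

output_spectrum = apply_filter(input_spectrum, gains)
for freq in sorted(output_spectrum):
    g = gains[freq]
    effect = "amplified" if g > 1 else "attenuated" if g < 1 else "unchanged"
    print(f"{freq} Hz: {input_spectrum[freq]} Pa -> {output_spectrum[freq]:.1f} Pa ({effect})")
```

A real frequency response curve is continuous; evaluating it only at the frequencies present in the input is enough here because the input spectrum contains energy at those frequencies alone.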