Pitch
Pitch is that quality of sound which allows us to play musical melodies. Naive introductory neuroscience texts tend to equate pitch with frequency. However, most real world sounds contain many frequency components, but have either only one clear pitch, or no pitch at all. Chapter 3 of "Auditory Neuroscience" describes the characteristics of sound that determine pitch, outlines how musical pitch is treated in Western classical music, and describes how the brain is thought to extract physical cues to the periodicity of a sound to create the subjective percept of pitch. The following collection of web pages provides supplementary materials to accompany that chapter.
Melodies and Timbre
Many different sounds have the same pitch - you can play the same melody with a flute or with a clarinet or with a horn. Here, you can hear the same melody played on three computer-generated instruments.
The melodies in these examples share the same sequence of pitches, and have about the same sound level. The property of sounds that is different between them is called 'timbre' - the timbre of the flute is different from that of the clarinet and from that of the horn.
Here are the spectra of the three versions:
Flute
Oboe
Horn
Note the large differences between the spectra, as well as the similarities. The sequence of pitches is to a large extent apparent in the line of the fundamental (the lowest red band). However, the number and relative strengths of the higher harmonics vary substantially between the three instruments, resulting in their unique timbre.
How many repetitions are required to produce pitch?
The main determinant of pitch is sound periodicity. A sound is periodic when it is composed of consecutive repetitions of a single short segment (the 'period'). The following figure (fig. 3.1 "Auditory Neuroscience") shows examples of periodic sounds:
Only a small number of repetitions of a period are required to generate the perception of pitch. The following sounds are all composed of the same period, which repeats itself a different number of times in each example. Each such sound is repeated a few times.
A single repeat results in no pitch sensation at all - there is nothing periodic:
With eight repeats, a clear pitch is heard:
Find out yourself what is the minimal number of repeats that is necessary in order to hear pitch!
2 Repeats:
3 Repeats:
4 Repeats:
6 Repeats:
8 Repeats:
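As a rough illustration of how such stimuli could be built, the sketch below concatenates a chosen number of copies of one period. The sampling rate, the period length and the shape of the period are assumptions made for this example only; they are not the values used for the sounds on this page.

```python
import math

FS = 44100          # assumed sampling rate in Hz
PERIOD_LEN = 220    # assumed period length in samples (about 5 ms, about 200 Hz)

def make_period(n):
    """One arbitrary period: a decaying sinusoidal burst (hypothetical shape)."""
    return [math.exp(-5.0 * i / n) * math.sin(2 * math.pi * 6 * i / n)
            for i in range(n)]

def repeat_period(period, n_repeats):
    """Build a sound by concatenating n_repeats copies of the same period."""
    return list(period) * n_repeats

def repeats_exactly(signal, period_len):
    """True if every sample equals the sample one period later."""
    return all(signal[i] == signal[i + period_len]
               for i in range(len(signal) - period_len))

period = make_period(PERIOD_LEN)
# One sound per repeat count listed above.
sounds = {n: repeat_period(period, n) for n in (1, 2, 3, 4, 6, 8)}
```

With these assumed values the repetition rate is FS / PERIOD_LEN, a little over 200 Hz, and the eight-repeat sound lasts about 40 ms.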
Pitch matching
Pitch is defined by its perceptual qualities, and therefore has to be determined by the judgment of human listeners. By convention, we use the pitch evoked by pure tones as a yardstick against which we judge the pitch evoked by other sounds. In practical terms, this is done with matching experiments: a periodic sound whose pitch we want to measure is presented in alternation with a pure tone. Listeners are asked to change the frequency of the pure tone until it evokes the same pitch as the periodic sound. The frequency of the matching pure tone then serves as a quantitative measure of the pitch of the tested periodic sound. In such experiments, subjects most often set the pure tone so that its period is equal to the period of the test sound. In the demonstrations below, the sound to be tested is played four times, alternating with a pure tone. The test sound is the same at all repetitions, but the period of the pure tone changes from repetition to repetition: it is the same as the period of the test sound in the first repetition, shorter (higher pitch) in the second, longer (lower pitch) in the third, and again the same as that of the test sound in the last repetition.
Here is the demonstration, with the test sound being a cosine-phase harmonic complex at 400 Hz:
Here is the demonstration, with the test sound being an Iterated Repeated Noise (IRN) with 4 iterations, again at 400 Hz:
Now you are ready to try pitch matching by yourself. In the following gadget, you can select the type of pitch-evoking sound and try to match it by moving the slider.
Click "Start" to hear a pure tone and a complex sound, played in alternation. Adjust the horizontal slider to change the frequency of the tone until it matches that of the complex sound.
The range of periods that evoke pitch
Regular click trains at a rate of less than about 40 Hz sound like individual regular events, perhaps a bit like machine-gun fire. Click trains with rates faster than about 40 Hz merge into a continuous "buzz", where the pitch of the buzz depends on the click rate: the faster the rate, the higher the pitch.
The two sound examples below illustrate this. The first example consists of a click train, where the rate of the clicks doubles every 3 seconds. During the first 3 seconds, the rate of the clicks is about 10.7 Hz, then it increases to 21.4 Hz, 43 Hz, 86 Hz, 172 Hz, 344 Hz, 689 Hz, 1378 Hz, 2756 Hz, and 5512 Hz. Clear pitch emerges between 43 Hz and 86 Hz, although at 86 Hz there is still 'flutter', and full smoothness of the resulting percept occurs only at rates of a few hundred Hz.
The second example consists of clicks which are presented initially at a rate of about 10 Hz. The rate then continuously speeds up, to a rate of 10,000 Hz, and then slows down again. At the beginning and the end you will hear a rapid train of individual clicks, but in the middle you hear a buzz of rapidly rising, then falling, pitch. The sequence is illustrated in the figure below.
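The rates listed for the first example are close to what one obtains by starting with one click every 4096 samples at a 44.1 kHz sampling rate and halving the spacing at each step. That reading is an inference from the numbers, not something stated in the text, so the sampling rate and the spacing in the sketch below are assumptions.

```python
FS = 44100  # assumed sampling rate in Hz (not stated in the text)

def click_train(rate_hz, duration_s, fs=FS):
    """Return a list of samples with a unit click at (approximately) rate_hz."""
    spacing = round(fs / rate_hz)      # samples between successive clicks
    n = int(duration_s * fs)
    return [1.0 if i % spacing == 0 else 0.0 for i in range(n)]

# Ten rates, each double the previous one, starting from one click per 4096 samples.
rates = [FS / 4096 * 2 ** k for k in range(10)]

# The first example could then be assembled as ten consecutive 3-second segments.
def doubling_sequence(segment_s=3.0):
    samples = []
    for rate in rates:
        samples.extend(click_train(rate, segment_s))
    return samples
```

Printing `rates` gives values from roughly 10.8 Hz up to 5512.5 Hz, which agree with the listed figures to within rounding.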
Vowels are not strictly periodic
One of the most important classes of sounds that have pitch in the natural environment are voiced speech sounds. However, like many other naturally-produced sounds, these sounds are not strictly periodic. In spite of this, they produce a strong sense of pitch. Sounds that are not strictly periodic but that do evoke pitch are in fact the rule, rather than the exception.
Here is a naturally-produced human vowel:
Elliot and Theunissen addressed this question by calculating the "modulation spectra" of speech as shown here:
The spectrogram of the vowel is presented on the left below, while on the right three consecutive periods are shown superimposed. Each of the periods has a length of about 6.75 ms, so the pitch here is about 150 Hz. The periods have been extracted from around time 50 ms in the spectrogram. Note the small, but continuously present, changes in the sound, which can be observed both by following it period by period (right) and at the longer time scale of hundreds of ms, which can be followed in the spectrogram.
These micromodulations are crucial for the natural quality of the sound. When the micromodulations are removed, making the sound strictly periodic, it becomes buzzy and artificial, losing much of its speech quality:
Additional information about modulations in speech can be found in 'speech as a "modulated signal"'.
Missing fundamentals. Periodicity and Pitch
As explained in the section on modes of vibration, most natural sound sources will not emit pure tones, but sounds composed of many, often harmonically related frequencies. Now, some texts on hearing will tell you that the pitch of a sound is "related to the sound's frequency", but if a sound contains many (possibly harmonically related) frequencies then it may not be at all obvious which of the sound's frequencies determines the pitch.
Take the following example. Here we have a simple melody played in pure tones.
Pure tones
And here we have the same melody played using "complex" tones containing sine waves with the same fundamental frequency as well as 9 additional "higher harmonics" (multiples of the fundamental).
Harmonics 1-10
Spectrograms of these sounds are shown in the picture below.
Now it should be obvious that the melodies (and hence the pitches) are the same, even if the overall frequency content, and therefore the "timbre", of these sounds is different. These two sounds do however share the same fundamental frequency: the second sound is literally the sum of the first one plus the contributions of the higher harmonics. So it may be tempting to think that it is the presence of the fundamental frequency that determines the pitch.
The curious thing is that you can take this fundamental frequency out, leaving only the higher harmonics, and the pitch still remains the same. This is what we have done in the following sound (in fact, for good measure, we took out not just the fundamental but the next two harmonics too, so that only harmonics 4 to 10 are left; when listening, turn down the volume to reduce the harmonic distortions produced by the speakers, which would reintroduce the lower harmonics!).
Harmonics 4-10
The first melody and the third melody have no frequency components in common. You can verify that on the spectrograms shown below to the left. Nevertheless they are 'the same' in that they have the same sequence of pitches.
"Missing fundamental" stimuli, like the third melody, argue strongly against a simple frequency place code in the cochlea as the main cue for perceived pitch. A more reliable cue is the sound's periodicity. The panels on the right in the figure above zoom in on a 10 ms long sound snippet during the first note for each of the melodies here. This note has a fundamental of 300 Hz, so there are three whole cycles in 10 ms. This 300 Hz periodicity (identical wave form patterns repeating themselves at a rate of 300 such patterns per second) is something that the first note in all three of the examples here clearly have in common, even though the corresponding sound in the third melody does not contain a 300 Hz (Fourier) frequency component. It has the right periodicity because all its frequency components have a 300 Hz periodicity, but they do not share any longer period.
Why Missing Fundamental Stimuli are Counterintuitive
The fact that tone complexes with missing fundamentals can be perceived to have a pitch that is below their lowest frequency component can have counterintuitive consequences.
Consider the tone sequence shown in the spectrogram here:
A 440 Hz pure tone (labelled "A") alternates with a tone complex containing frequencies 500, 750, 1000, 1250 and 1500 Hz (labelled "B"). All frequencies in B are above the frequency of A, but nevertheless when you listen to these stimuli you may find that B sounds lower, because the pitch of B is perceived to be that of a missing 250 Hz fundamental.
Note also that most people find it quite hard to judge what the relative pitches of the A and B sounds really are because their timbre is so different. That just highlights that pitch is complicated, and that it would be naive to think that it is a simple mapping of tonotopic, cochlear place coding to perception.
The panel on the right shows the waveforms of the two sounds, with the "A" sound in blue and the "B" sound in red. Note that the A sound has a pattern that repeats every 1000/440 = 2.27 ms, as you might expect, but the B sound has a pattern that repeats only every 4 ms because its harmonics beat at the fundamental. Thus, the A sound "cycles faster" than the B sound, and "timing theories of pitch" would predict that it should therefore have a higher pitch.
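The two repetition periods follow from the greatest common divisor of the component frequencies. Here is a small sketch of that arithmetic, assuming integer frequencies in Hz and taking the components of B to be spaced 250 Hz apart (500, 750, 1000, 1250 and 1500 Hz), as the 250 Hz missing fundamental implies. The function name is made up for this example.

```python
from functools import reduce
from math import gcd

def repetition_period_ms(freqs_hz):
    """Return (common fundamental in Hz, repetition period in ms)
    for a sum of sine waves with integer frequencies."""
    f0 = reduce(gcd, freqs_hz)
    return f0, 1000.0 / f0

tone_a = repetition_period_ms([440])
tone_b = repetition_period_ms([500, 750, 1000, 1250, 1500])

print(tone_a)  # fundamental 440 Hz, period about 2.27 ms
print(tone_b)  # fundamental 250 Hz, period 4.0 ms
```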
Or here is another example:
The sounds in this spectrogram are harmonic complexes with a fundamental frequency that rises from 110 to 220 Hz in semitone steps. So the pitch should be rising from the musical note A2 to A3. But the harmonic complexes have been bandpass filtered to be 3.5 octaves wide, with the lower edge of their passband falling from 880 Hz to 440 Hz. So the frequency components become increasingly lower, but the pitch should get higher, at least insofar as harmonic structure is the dominant pitch cue. Do the pitches sound falling or rising to you?
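To make the opposing trends concrete, the sketch below lists which harmonics fall inside the passband at each step. It assumes that the lower band edge falls by one semitone per step (the text only gives its start and end values) and treats the filter as an ideal brick-wall bandpass, which the real stimuli need not be.

```python
import math

def components_in_band(step):
    """Fundamental and surviving component frequencies at a given semitone step (0..12)."""
    f0 = 110.0 * 2 ** (step / 12)        # fundamental rises from 110 Hz to 220 Hz
    lo = 880.0 * 2 ** (-step / 12)       # assumed: lower edge falls one semitone per step
    hi = lo * 2 ** 3.5                   # passband is 3.5 octaves wide
    k_min = math.ceil(lo / f0 - 1e-9)    # small tolerance guards against rounding error
    k_max = math.floor(hi / f0 + 1e-9)
    return f0, [k * f0 for k in range(k_min, k_max + 1)]

for step in (0, 6, 12):
    f0, comps = components_in_band(step)
    print(f"step {step:2d}: f0 = {f0:6.1f} Hz, lowest component = {comps[0]:6.1f} Hz, "
          f"{len(comps)} components")
```

Under these assumptions the lowest component moves from 880 Hz (harmonic 8 of 110 Hz) down to 440 Hz (harmonic 2 of 220 Hz) while the fundamental doubles.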
Non-Periodic Sounds That Evoke Pitch
Here are examples of three sounds that evoke pitch without being strictly periodic. A detailed discussion of these sounds can be found in the pitch chapter of the book.
Each of these sounds is approximately periodic, and their spectra have an approximately periodic structure, reminiscent of the strictly harmonic structure of periodic sounds. In each case, there is a clear period. If the sound is shifted by that period, it best resembles its unshifted version.
Harmonic complex with noise
Iterated repeated noise (IRN)
AABB noise
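The statement that a sound "best resembles" itself when shifted by its period can be expressed as a peak in the autocorrelation. The following sketch builds an IRN by repeatedly adding a delayed copy of a noise to itself, which is one common way to construct such stimuli and not necessarily the one used for the sound above, and then looks for the shift with the highest correlation. The sampling rate, delay, duration and random seed are assumptions made for the example.

```python
import random

FS = 16000      # assumed sampling rate in Hz
DELAY = 40      # delay in samples; FS / DELAY = 400 Hz

def irn(n_samples, delay, iterations, seed=0):
    """Gaussian noise passed through `iterations` delay-and-add stages."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(iterations):
        x = [x[i] + (x[i - delay] if i >= delay else 0.0) for i in range(n_samples)]
    return x

def autocorr(x, lag):
    """Normalized correlation between x and a copy of x shifted by `lag` samples."""
    num = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
    den = sum(v * v for v in x)
    return num / den

sound = irn(n_samples=4000, delay=DELAY, iterations=4)
best_lag = max(range(1, 121), key=lambda lag: autocorr(sound, lag))
print(best_lag, FS / best_lag)   # lag of the strongest self-similarity, and the matching rate in Hz
```

The correlation at the best lag stays below 1, reflecting the fact that the sound is only approximately periodic.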
Periodic Sounds Must Have Harmonic Structure
Periodic sounds (sounds with waveforms that have a repeated "motif", as in the blue trace shown above) have Fourier spectra that must consist solely of "harmonics" of the sound. Harmonics are sine waves whose frequencies are integer multiples of some fundamental frequency, so that a whole number of their cycles fits exactly into one fundamental period. The red lines above are "cosine phase" harmonics of the blue line. When thinking about Fourier spectra, we want to imagine the blue line being made up of a sum of lots of sine waves like the red and green lines, where we might adjust the phase and amplitude of the sine waves as required. The important thing to note here is that, no matter how we adjust the phase and amplitude of the green line, it could never be part of the mixture needed to make up the blue line. The reason is this: compare the values that the waveforms have at identical points in the period, for example by comparing the points marked by the stippled gray lines. The red lines will always contribute the same values at each cycle (here, for example, they are always maximal at the points marked out by the gray lines). In contrast, the green line does not "fit" an integer number of cycles into the fundamental period, and the contribution it would make to each cycle of the wave would therefore be different, which would destroy the periodicity of the wave. The green line can therefore not be a Fourier component of the periodic blue sound wave. Nor can any other sine wave whose frequency is not an integer multiple of the fundamental frequency of the sound.
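The argument about the gray lines can be reproduced with a few numbers. The sketch below evaluates what a cosine component contributes at the start of each successive fundamental period. The frequency ratio of 2.5 for the "green" component is a hypothetical value picked for illustration; the ratio used in the figure is not given in the text.

```python
import math

def values_at_period_starts(freq_ratio, n_periods=4, phase=0.0):
    """Value a cosine contributes at the start of each fundamental period.
    freq_ratio is the component's frequency divided by the fundamental frequency."""
    return [math.cos(2 * math.pi * freq_ratio * p + phase) for p in range(n_periods)]

harmonic = values_at_period_starts(3)        # a true harmonic: same value every period
non_harmonic = values_at_period_starts(2.5)  # hypothetical non-harmonic component

print([round(v, 6) for v in harmonic])       # identical in every period
print([round(v, 6) for v in non_harmonic])   # changes from period to period
```

Changing `phase` shifts both lists, but the harmonic's values remain identical across periods while the non-harmonic's values do not.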
Pitch of 3-component Harmonic Complexes
Harmonic complexes composed of 3 consecutive harmonics are among the simplest periodic sounds. Their periodicity is determined by the spacing between the harmonics. Here is such a complex, composed of harmonics 1 (the fundamental), 2 and 3 of 100 Hz. The top panel shows the spectrum of this sound, and the bottom panel shows a 30 ms long segment of the waveform, consisting of three periods (100 Hz corresponds to a period of 10 ms). The pitch of this sound is very obvious:
These complexes, when built of harmonics of very high harmonic number, still have the same periodicity. However, their pitch is not at their periodicity anymore. Here is a complex composed of harmonics 21, 22 and 23 of 100 Hz (note: you should lower the volume of the computer loudspeakers to avoid generating harmonic distortions that would regenerate the fundamental!):
How high can the harmonic numbers of the components be for a periodicity pitch to appear? Note that this demonstration stretches the capabilities of poor-quality computer speakers. These have a hard time reproducing sounds with frequencies below a few hundred Hz, but generate serious harmonic distortions at frequencies of a few thousand Hz. Thus, to hear the examples with low harmonic numbers you will need to use a high sound volume, while to avoid regenerating the fundamental by harmonic distortions in the examples with high harmonic numbers you will need to use a low sound volume.
Harmonics 1 to 3
Harmonics 2 to 4
Harmonics 3 to 5
Harmonics 4 to 6
Harmonics 5 to 7
Harmonics 7 to 9
Harmonics 9 to 11
Harmonics 11 to 13
Harmonics 13 to 15
Harmonics 17 to 19
Harmonics 21 to 23
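For readers who want to generate comparable stimuli, here is a minimal sketch that builds each three-component complex listed above and checks that all of them share the same 10 ms waveform period. The 48 kHz sampling rate, the cosine phase and the equal amplitudes are assumptions for this example and may differ from the sounds on this page.

```python
import math

FS = 48000   # assumed sampling rate in Hz (100 Hz is then exactly 480 samples per period)
F0 = 100.0   # fundamental, as in the text

def three_harmonics(lowest, n_periods=3, f0=F0, fs=FS):
    """Frequencies, waveform and period length for harmonics lowest..lowest+2 of f0."""
    period = round(fs / f0)
    ks = (lowest, lowest + 1, lowest + 2)
    wave = [sum(math.cos(2 * math.pi * k * f0 * i / fs) for k in ks)
            for i in range(period * n_periods)]
    return [k * f0 for k in ks], wave, period

def max_mismatch(wave, shift):
    """Largest difference between the waveform and itself shifted by `shift` samples."""
    return max(abs(wave[i] - wave[i + shift]) for i in range(len(wave) - shift))

for lowest in (1, 2, 3, 4, 5, 7, 9, 11, 13, 17, 21):
    freqs, wave, period = three_harmonics(lowest)
    print(freqs, "repeats every", 1000 * period / FS, "ms:",
          max_mismatch(wave, period) < 1e-9)
```

The component frequencies climb from 100 to 300 Hz up to 2100 to 2300 Hz, yet the spacing between components, and therefore the waveform period, stays fixed.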
Periodicity of Sounds and of Envelopes
Pitch is determined in most cases by the periodicity of the sound waveform. However, some sounds have other, more subtle periodicities. In some cases, these periodicities may determine the pitch, but in other cases they don't. Here such subtle periodicity is illustrated - the periodicity of the envelope.
Look at the figure below. This sound (blue) was generated as a sum of many harmonics in 'cosine-phase' - this is a complicated way to say that all harmonics peak together at the beginning of the pitch period. The resulting waveform is very peaky, but has fast fluctuations which are smallest at the midpoint between two peaks and which increase in size around the peaks. One could imagine an 'envelope' - a positive waveform that would measure the overall energy of the sound waveform at each moment in time. The envelope (here computed using the 'Hilbert transform' - this is a side issue here) is plotted in gray.
Sound
Envelope
Now, observe the figure below. This sound was generated with the same harmonics, except that only every other harmonic is in cosine-phase. The other half are in 'sine-phase' - this means that instead of peaking at the beginning of the pitch period, they have an upward zero crossing there (remember, harmonics are sine waves!). In contrast with the previous sound, this one has two 'events', with similar shape but opposite polarity, during the pitch period. Its periodicity is nevertheless exactly that of the previous sound, and the evoked pitch is the same. However, the envelope, which measures the overall energy at each moment in time, now has two peaks within each pitch period, and in fact, when computed with the same method, has half the period of the waveform. Thus, the envelope taken on its own would suggest a pitch that is twice as high (200 Hz instead of 100 Hz in this case).
Sound
Envelope
This is important when we study the responses of neurons to periodic sounds. Neurons may well respond to the envelope, rather than to the sound itself. Such neurons cannot encode pitch, even when they are sensitive to periodicity, because they do not give the right answer for alternate-phase harmonic complexes.
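The difference between the two envelopes can be sketched without any signal-processing library. For a sum of sinusoids, the Hilbert envelope equals the magnitude of the corresponding sum of complex exponentials, so it can be evaluated directly at any time point. In the sketch below the number of harmonics (20), their equal amplitudes, and the choice of putting the even-numbered harmonics in sine phase are assumptions; the figures above may use different settings.

```python
import cmath
import math

F0 = 100.0          # fundamental in Hz, as in the text
T = 1.0 / F0        # pitch period in seconds
N_HARMONICS = 20    # assumed number of harmonics

def envelope(t, alternating, f0=F0, n_harmonics=N_HARMONICS):
    """Hilbert envelope at time t of a sum of unit-amplitude harmonics.
    alternating=False: all harmonics in cosine phase.
    alternating=True: odd harmonics in cosine phase, even harmonics in sine phase (assumed)."""
    z = 0j
    for k in range(1, n_harmonics + 1):
        # A sine is a cosine delayed by a quarter cycle, i.e. a factor of -1j.
        factor = -1j if (alternating and k % 2 == 0) else 1
        z += factor * cmath.exp(2j * math.pi * k * f0 * t)
    return abs(z)

for label, alt in (("cosine phase", False), ("alternating phase", True)):
    at_start = envelope(0.0, alt)
    at_middle = envelope(T / 2, alt)
    print(f"{label}: envelope at period start {at_start:.2f}, at mid-period {at_middle:.2f}")
```

With these settings the cosine-phase envelope is large at the start of each period and essentially zero at mid-period, while the alternating-phase envelope has peaks of equal height at both points, that is, two envelope peaks per waveform period.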
Single Formant Vowel with Changing Pitch
This is one of the sounds used in the study of Cariani and Delgutte (1996) on the coding of pitch in auditory nerve fibers. It is a so-called single-formant vowel, since its spectral envelope has a single peak in frequency (natural vowels have multiple such 'formants' - see Chapter 4). See Fig. 3-9 in the book.
Here are two consecutive periods, one pair taken from the beginning (green) and one pair from the middle (orange) of the sound:
The blue arrows span one period of the sound (you can see how similar the two consecutive periods are). In the middle of the sound, the periods are about half as long as at the beginning of the sound.
The gray bars indicate one cycle of the 'fine structure' of each period, which is determined by the formant frequency. In contrast with the pitch periods, they are essentially equal to each other.
Thus, the sound has a pitch that changes over about one octave (it is twice as high in the middle as in the beginning and end of the sound), but its formant frequency remains fixed. In consequence, its timbre remains the same throughout its duration.
Fundamental frequencies of Notes in Western Music
Chapter 3 of Auditory Neuroscience discusses the pitch intervals used in Western music in great detail. For convenience, a table of fundamental frequencies for the equal-tempered scale is copied below from http://www.phy.mtu.edu/~suits/notefreqs.html.
By convention A4 = 440 Hz
Notes are separated by "semitone" intervals. There are 12 semitones in each octave, and fundamental frequencies are logarithmically spaced, so that each note's fundamental frequency is 2^(1/12) ≈ 1.0595 times that of the previous note.
The wavelength values assume a speed of sound = 345 m/s
("Middle C" is C4 )
Note | Frequency (Hz) | Wavelength (cm) |
---|---|---|
C0 | 16.35 | 2109.89 |
C#0/Db0 | 17.32 | 1991.47 |
D0 | 18.35 | 1879.69 |
D#0/Eb0 | 19.45 | 1770. |
E0 | 20.60 | 1670. |
F0 | 21.83 | 1580. |
F#0/Gb0 | 23.12 | 1490. |
G0 | 24.50 | 1400. |
G#0/Ab0 | 25.96 | 1320. |
A0 | 27.50 | 1250. |
A#0/Bb0 | 29.14 | 1180. |
B0 | 30.87 | 1110. |
C1 | 32.70 | 1050. |
C#1/Db1 | 34.65 | 996. |
D1 | 36.71 | 940. |
D#1/Eb1 | 38.89 | 887. |
E1 | 41.20 | 837. |
F1 | 43.65 | 790. |
F#1/Gb1 | 46.25 | 746. |
G1 | 49.00 | 704. |
G#1/Ab1 | 51.91 | 665. |
A1 | 55.00 | 627. |
A#1/Bb1 | 58.27 | 592. |
B1 | 61.74 | 559. |
C2 | 65.41 | 527. |
C#2/Db2 | 69.30 | 498. |
D2 | 73.42 | 470. |
D#2/Eb2 | 77.78 | 444. |
E2 | 82.41 | 419. |
F2 | 87.31 | 395. |
F#2/Gb2 | 92.50 | 373. |
G2 | 98.00 | 352. |
G#2/Ab2 | 103.83 | 332. |
A2 | 110.00 | 314. |
A#2/Bb2 | 116.54 | 296. |
B2 | 123.47 | 279. |
C3 | 130.81 | 264. |
C#3/Db3 | 138.59 | 249. |
D3 | 146.83 | 235. |
D#3/Eb3 | 155.56 | 222. |
E3 | 164.81 | 209. |
F3 | 174.61 | 198. |
F#3/Gb3 | 185.00 | 186. |
G3 | 196.00 | 176. |
G#3/Ab3 | 207.65 | 166. |
A3 | 220.00 | 157. |
A#3/Bb3 | 233.08 | 148. |
B3 | 246.94 | 140. |
C4 | 261.63 | 132. |
C#4/Db4 | 277.18 | 124. |
D4 | 293.66 | 117. |
D#4/Eb4 | 311.13 | 111. |
E4 | 329.63 | 105. |
F4 | 349.23 | 98.8 |
F#4/Gb4 | 369.99 | 93.2 |
G4 | 392.00 | 88.0 |
G#4/Ab4 | 415.30 | 83.1 |
A4 | 440.00 | 78.4 |
A#4/Bb4 | 466.16 | 74.0 |
B4 | 493.88 | 69.9 |
C5 | 523.25 | 65.9 |
C#5/Db5 | 554.37 | 62.2 |
D5 | 587.33 | 58.7 |
D#5/Eb5 | 622.25 | 55.4 |
E5 | 659.26 | 52.3 |
F5 | 698.46 | 49.4 |
F#5/Gb5 | 739.99 | 46.6 |
G5 | 783.99 | 44.0 |
G#5/Ab5 | 830.61 | 41.5 |
A5 | 880.00 | 39.2 |
A#5/Bb5 | 932.33 | 37.0 |
B5 | 987.77 | 34.9 |
C6 | 1046.50 | 33.0 |
C#6/Db6 | 1108.73 | 31.1 |
D6 | 1174.66 | 29.4 |
D#6/Eb6 | 1244.51 | 27.7 |
E6 | 1318.51 | 26.2 |
F6 | 1396.91 | 24.7 |
F#6/Gb6 | 1479.98 | 23.3 |
G6 | 1567.98 | 22.0 |
G#6/Ab6 | 1661.22 | 20.8 |
A6 | 1760.00 | 19.6 |
A#6/Bb6 | 1864.66 | 18.5 |
B6 | 1975.53 | 17.5 |
C7 | 2093.00 | 16.5 |
C#7/Db7 | 2217.46 | 15.6 |
D7 | 2349.32 | 14.7 |
D#7/Eb7 | 2489.02 | 13.9 |
E7 | 2637.02 | 13.1 |
F7 | 2793.83 | 12.3 |
F#7/Gb7 | 2959.96 | 11.7 |
G7 | 3135.96 | 11.0 |
G#7/Ab7 | 3322.44 | 10.4 |
A7 | 3520.00 | 9.8 |
A#7/Bb7 | 3729.31 | 9.3 |
B7 | 3951.07 | 8.7 |
C8 | 4186.01 | 8.2 |
C#8/Db8 | 4434.92 | 7.8 |
D8 | 4698.64 | 7.3 |
D#8/Eb8 | 4978.03 | 6.9 |
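The table entries follow directly from the conventions stated above: A4 = 440 Hz, a ratio of 2^(1/12) per semitone, and a speed of sound of 345 m/s. The short sketch below recomputes them; the function names and the indexing scheme (note 0 is C0) are choices made for this example.

```python
NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
         "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

A4_HZ = 440.0            # tuning convention stated above
SPEED_OF_SOUND = 345.0   # m/s, as assumed for the wavelength column
A4_INDEX = 4 * 12 + 9    # A4 is 57 semitones above C0

def note_label(index):
    """Label such as 'C4' or 'C#4/Db4' for the note `index` semitones above C0."""
    octave = index // 12
    return "/".join(part + str(octave) for part in NAMES[index % 12].split("/"))

def note_frequency(index):
    """Equal-tempered fundamental frequency in Hz."""
    return A4_HZ * 2 ** ((index - A4_INDEX) / 12)

def wavelength_cm(freq_hz):
    """Wavelength in air in cm."""
    return 100.0 * SPEED_OF_SOUND / freq_hz

for index in (0, 48, 57, 99):   # C0, C4 (middle C), A4, D#8/Eb8
    f = note_frequency(index)
    print(f"{note_label(index)} | {f:.2f} | {wavelength_cm(f):.1f} |")
```

Looping over `range(100)` instead regenerates every row from C0 to D#8/Eb8, up to the rounding used in the wavelength column.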
Context Dependence of Pitch: Shepard Tone Hysteresis
"Shepard" tones are superpositions of pure tone components, in which the Nth tone has a frequency one octave above the (N-1)th tone (unlike you typical complex tone or harmonic complex, where the Nth tone is (N-1) octaves above a lowest common "fundamental" frequency.)
Shepard tones are interesting in that their pitch can be somewhat ambiguous, and this ambiguity can lead to curious perceptual phenomena. One phenomenon is that Shepard tones can be used to create continuous sounds known as "Shepard-Risset glissandos" which appear to be either continuously rising or falling in pitch. Another phenomenon is that the perceived pitch of Shepard tones can depend on the context in which they are presented. Here we present one demonstration of this context dependent pitch ambiguity that was studied in this paper by the group of Daniel Pressnitzer.
First consider this pair of Shepard tones. The components of the second complex are exactly halfway between the components of the first complex. One could therefore perceive the two-tone sequence either as a rising or a falling pitch step. There is no single "correct" answer. You may hear this as "clearly rising", while others may hear it as "falling" in pitch, and others may just be uncertain about the direction of the pitch change.
Listen to the tone pair a few times and make up your mind about what you think it sounds like before you move on:
Now consider two sequences, each consisting of six Shepard tone pairs. One feature that distinguishes the sequence labelled "Falling" from the one labelled "Rising" is that the first pair of the Falling sequence could be seen as either a small down step or a very large up step, and the reverse is true for the first pair of the Rising sequence. It appears that our brains will go with the "smaller step" interpretation, and pretty much all listeners will hear the first pair in the Falling sequence as being a small down step in pitch. The Falling sequence then continues with more pairs in which the downward pitch step becomes gradually larger. Similarly, the Rising sequence comprises increasingly larger up steps in pitch.
Now here comes the punchline: both sequences end in a pair in which the pitch step is exactly half an octave, and which is therefore intrinsically ambiguous. The last (6th) pair in both sequences is identical, and it is exactly the same tone pair that you have already encountered in isolation above. However, depending on whether this pair is presented in a context of either a falling or a rising sequence of pitch steps, the brain hears this sound pair very differently, as either "clearly falling" or "clearly rising" in pitch.
Falling:
Rising:
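Why the half-octave pair is intrinsically ambiguous can be seen from the component frequencies alone. The sketch below is a simplification: it lists octave-spaced components only, with an arbitrary base frequency and number of components, and ignores the fixed spectral envelope that Shepard tones are usually given to fade out the lowest and highest components. It shows that stepping up by six semitones and stepping down by six semitones land on the same positions within the octave.

```python
import math

BASE_HZ = 27.5   # assumed frequency of the lowest component of the first tone

def shepard_components(shift_semitones=0.0, n_components=8, base_hz=BASE_HZ):
    """Octave-spaced component frequencies, shifted by a number of semitones."""
    lowest = base_hz * 2 ** (shift_semitones / 12)
    return [lowest * 2 ** k for k in range(n_components)]

def position_in_octave(freq_hz, base_hz=BASE_HZ):
    """Distance in semitones above the nearest lower octave of base_hz (0 up to 12)."""
    return (12 * math.log2(freq_hz / base_hz)) % 12

up = [position_in_octave(f) for f in shepard_components(+6)]
down = [position_in_octave(f) for f in shepard_components(-6)]

print([round(p, 6) for p in up])     # every component sits 6 semitones into the octave
print([round(p, 6) for p in down])   # and so does every component of the downward step
```

For any other step size the upward and downward versions land on different positions, so one direction corresponds to a smaller step than the other, which is what the two sequences exploit.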
The idea that the same sounds can evoke very different pitches depending on the context in which they are encountered is challenging for people who, perhaps through musical training or otherwise, have been brought up to think that the perceived pitch of a sound should conform in an "objective" manner to some physical properties of the sound or the sound source, such as a fundamental resonance frequency. In reality, the pitch of a sound, perhaps not unlike the color of an object, is a subjective perceptual quality which is created by our brains, and while it is informed by the physical attributes of a given sound, it is not wholly determined by them.
Lecture: Cortical Representations of Complex Sounds
This video clip shows a presentation on the Cortical Representation of Complex Sounds given by Jan Schnupp at a symposium of the British Neuroscience Association meeting in Harrogate on April 18th 2011.
Video file: http://howyourbrainworks.net/jan/JanBNApitchTalk.flv