This short document describes a simple model of the vocal tract and of how voiced speech is produced in some sustained phonemes, especially the vowels. It also includes some brief notes about helium speech.
Note that the detail in the spectrum is easier to see if F0 is low, e.g. for a low-pitched man's voice (diagram at left), than it is for a child's or woman's voice (diagram at right).
The lowest resonance is determined to a considerable extent by the end effect of your mouth: if you lower your jaw, R1 rises. R2 is affected by the jaw position too, but it is primarily affected by the position of the constriction inside your mouth. Moving your tongue forwards and backwards changes R2 (and also R1, but to a lesser extent). A map of (R1,R2) for Australian English is given on our speech research page.
Nearly all information in speech is in the range 200 Hz-8 kHz. (The telephone carries only 300 Hz - 3 kHz but speech is reasonably intelligible and the telephone company's hold music still sounds okay.) The pitch is determined by the spacing of harmonics as much as or more than by the fundamental. Thus you can tell the pitch of a man's voice on the phone even though the fundamental of that signal is not present. Note that the size of the vocal tract (~170 mm long) gives resonances at and above about 500 Hz. In fact a tube of this length, closed at one end (the vocal folds) and open at the other (the mouth), is a functional approximation of the tract for the vowel "er" as in "herd". For this 'neutral' vowel, the first five resonances of the author's vocal tract are indeed at values of about 500, 1500, 2500, 3500 and 4500 Hz.
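The numbers quoted for the neutral vowel can be checked with the textbook formula for an ideal uniform tube closed at one end and open at the other, f_n = (2n - 1) c / 4L. The sketch below is only a rough model: the speed of sound (343 m/s, dry air at about 20 °C) is an assumed value, the function name is made up for this illustration, and a real vocal tract is neither uniform nor at room temperature.

```python
# Assumed values: the text gives only the tract length (~170 mm).
SPEED_OF_SOUND_AIR = 343.0  # m/s, dry air at about 20 °C (assumption)
TRACT_LENGTH = 0.170        # m, from the text


def closed_open_resonances(length_m, speed, count=5):
    """Resonances of an ideal uniform tube closed at one end and open
    at the other: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


resonances = closed_open_resonances(TRACT_LENGTH, SPEED_OF_SOUND_AIR)
for n, f in enumerate(resonances, start=1):
    print(f"R{n} is about {f:.0f} Hz")
```

The results fall close to 500, 1500, 2500, 3500 and 4500 Hz: odd multiples of the lowest resonance, as expected for a tube with one closed end.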
Warnings: He will suffocate you if you breathe it in place of air, and it conducts heat well. After one inhalation of He, breathe air normally for a few minutes. In a gas cylinder, He is under high pressure. Do not inhale directly from a gas cylinder. Fill a toy balloon and inhale from that.
Okay, having read those warnings, you might not want to try. So I've put the recordings of my experiment below.

The first diagram shows a schematic picture of the spectrum (power vs frequency) for the sound of the voice made with a particular configuration of the vocal tract filled with air. The solid line is the spectral envelope; the vertical lines are the harmonics of the vibration of the vocal folds. The second diagram shows the effect of replacing air with helium, but keeping the tract configuration the same (i.e. trying to pronounce the same vowel as before, but with a throat full of helium). The speed of sound is greater, so the resonances occur at higher frequencies, as do the formants they produce: the second formant has now been shifted right off scale in this diagram. The flesh in your vocal folds still vibrates at the same* frequency, however, so the harmonics occur at the same frequency.
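To get a feel for the size of the shift, here is a minimal sketch using the ideal-gas expression for the speed of sound, c = sqrt(γRT/M). It assumes a tract filled with pure helium (an upper limit), treats air as a dry ideal diatomic gas, and takes a temperature of 310 K. The 100 Hz fundamental and the three round-number air resonances are illustrative values, not measurements.

```python
import math

R_GAS = 8.314    # J/(mol K), molar gas constant
T_TRACT = 310.0  # K, assumed gas temperature in the tract


def sound_speed(gamma, molar_mass_kg, temperature_k=T_TRACT):
    """Ideal-gas speed of sound: c = sqrt(gamma * R * T / M)."""
    return math.sqrt(gamma * R_GAS * temperature_k / molar_mass_kg)


c_air = sound_speed(1.40, 0.02897)        # dry air, treated as diatomic
c_he = sound_speed(5.0 / 3.0, 0.0040026)  # pure helium, monatomic
shift = c_he / c_air                      # factor by which resonances scale

# Same tract shape, so each resonance scales with the speed of sound.
air_resonances = [500.0, 1500.0, 2500.0]  # Hz, illustrative values
helium_resonances = [shift * f for f in air_resonances]

# The vocal folds set the harmonics, so these are (nearly) unchanged.
harmonics = [100.0 * k for k in range(1, 6)]  # Hz, hypothetical F0 = 100 Hz

print(f"resonances scale by about {shift:.2f}")
print("air resonances:   ", [round(f) for f in air_resonances])
print("helium resonances:", [round(f) for f in helium_resonances])
print("harmonics (both): ", harmonics)
```

With these assumptions the resonances rise by a factor of nearly three. In practice the helium is mixed with air and water vapour, so the real shift is smaller.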
What does this sound like? Obviously the helium makes a big difference to the sound of the voice.
If you do the experiment with someone who has a bit of experience with singing or music, (and if s/he doesn't laugh too much on hearing helium voice) then the pitch will be the same in the two cases. The pitch is determined by the frequencies of the harmonics and these have not changed*. The speech does however sound 'like Donald Duck'. There is less power at low frequencies so the sound is thin and squeaky. This alteration to the timbre changes vowels in a spectacular way. Although we can understand whole sentences (using contextual clues) we find that individual vowels are very difficult to identify. (By the way, an articulate but otherwise standard duck would have a shorter vocal tract than ours so, even while breathing air, Donald would have resonances at rather higher frequencies than ours.)
* That is, if you keep the muscle tensions the same, the frequencies will not change much. There could be a small change because the less dense He loads the vocal folds a bit less than the air does, but this effect is slight. The effect on the resonances is large, however. Its size depends on how pure the He in your vocal tract is.
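The dependence on purity can be estimated with a simple ideal-gas mixing rule. This is a sketch under stated assumptions, not a measurement: air is treated as a dry diatomic gas (molar heat capacities 7R/2 and 5R/2), He as monatomic (5R/2 and 3R/2), water vapour and CO2 are ignored, and the function name is invented for this example.

```python
import math

M_HE = 4.0026  # g/mol
M_AIR = 28.97  # g/mol, dry air


def mixture_shift(he_fraction):
    """Speed of sound in a He/air mixture divided by that in air at the
    same temperature. he_fraction is the mole fraction of He (0 to 1)."""
    x = he_fraction
    # Molar heat capacities in units of R, weighted by mole fraction.
    cp = x * 2.5 + (1.0 - x) * 3.5
    cv = x * 1.5 + (1.0 - x) * 2.5
    gamma_mix = cp / cv
    m_mix = x * M_HE + (1.0 - x) * M_AIR
    # c is proportional to sqrt(gamma / M) at fixed temperature.
    return math.sqrt((gamma_mix / m_mix) / (1.4 / M_AIR))


for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"He fraction {fraction:.2f}: resonances scale by {mixture_shift(fraction):.2f}")
```

On this model the shift grows slowly at first and steeply as the mixture approaches pure He, so a half-and-half mixture raises the resonances by well under half of the full effect.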
Gear for further investigations:
A microphone and oscilloscope with a sensitive input range (~ mV) or else a pre-amplifier. Appropriate connectors. To start, try 100 ms/div on the time base, then look more closely. If the CRO is digital (or a virtual one running on your PC), the storage mode is very useful.
A PC with a sound card and analysis/edit software is useful. The sampling feature is effectively a storage CRO, and the analysis feature is effectively a spectrum analyser.
You can put your fingers on your throat to determine whether or not the vocal folds are vibrating (i.e. whether the sound is 'voiced' or not).
Joe Wolfe / J.Wolfe@unsw.edu.au / 61-2-9385 4954 (UT + 10, +11 Oct-Mar)
Musical Acoustics Group
School of Physics, UNSW