Vocal resonances and broad band excitation
This site introduces a technique developed in this lab for measuring some important acoustic properties of the vocal tract non-invasively and in
real time, while the owner of the vocal tract is speaking or singing. We use it as a research tool, but we have also demonstrated its use as a speech trainer. You may wish to read Introduction to vocal tract acoustics before continuing.
Existing technologies used in speech pathology and speech trainers to provide visual feedback from the speech sound are inherently limited in precision and practicality. Even the most advanced speech recognition systems still mistake words, which shows how limited they are as precise measures of pronunciation. The basic problem is that the speech signal alone does not contain enough information to allow
us to work out, quickly and precisely, the configuration of the vocal tract. This is not a problem for understanding speech, but it may be a problem in learning precise pronunciation. Our approach is therefore to introduce a signal with more information in the frequency domain.
Our technology is called Real-time Acoustic response by Vocal tract Excitation or RAVE.
In model experiments using the laboratory prototype, we have shown that one or two hours' training using visual feedback of some key features of the acoustical response of a subject's
vocal tract improves the accuracy and intelligibility of pronunciation
of foreign phonemes by monolingual adults.
How it works:
We inject into the vocal tract an acoustic current which is
synthesised to give high resolution frequency information over
the frequency range of interest. We then measure the impedance
of the vocal tract in parallel with the external field using
the response to this excitation signal.
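A broad band excitation of this kind can be sketched as a sum of closely spaced sinusoids. The example below is only an illustration, not the lab's synthesis code: the 5 Hz spacing is the value quoted in the schematic diagram further down this page, while the 200 Hz to 4.5 kHz range, the 16 kHz sample rate, the equal amplitudes and the random phases are all assumptions made for the sketch (in the real device the amplitudes are calibrated).

```python
import math
import random


def comb_frequencies(f_low, f_high, spacing):
    """Multiples of `spacing` (Hz) that lie within [f_low, f_high]."""
    first = math.ceil(f_low / spacing)
    last = math.floor(f_high / spacing)
    return [k * spacing for k in range(first, last + 1)]


def synthesise(freqs, sample_rate, n_samples, seed=0):
    """Sum equal-amplitude sinusoids with random phases, scaled to a peak of 1.

    Random phases (an assumption of this sketch) stop all components
    from adding in phase and producing one very large spike.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        signal.append(sum(math.sin(2.0 * math.pi * f * t + p)
                          for f, p in zip(freqs, phases)))
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]


spacing_hz = 5.0                                  # spacing quoted in the schematic
freqs = comb_frequencies(200.0, 4500.0, spacing_hz)  # assumed range of interest
sample_rate = 16000                               # assumed; above twice 4.5 kHz
period = int(sample_rate / spacing_hz)            # the comb repeats every 0.2 s
excitation = synthesise(freqs, sample_rate, period)
print(len(freqs), "components,", len(excitation), "samples per period")
```

Because every component is a multiple of 5 Hz, the waveform repeats every 0.2 s, so a single period can be looped continuously.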
In this figure, I pronounce the vowel in 'heard'.
The sharp vertical peaks are the harmonics of my voice. The
broad signal shows the response of my vocal tract to the acoustic
current signal being injected at the lips.
For this vowel, my vocal tract behaves rather
like a cylinder about 170 mm long, nearly closed at the vocal
folds and open at the mouth. A cylinder of length L, closed
at one end, has resonances at f0 = v/(4L), at 3f0,
5f0 etc., where v is the speed of sound. (See pipes
and harmonics.) So we see resonances at about 0.5, 1.5,
2.5, 3.5 and 4.5 kHz, which appear as the peaks in the
smooth curve in this figure. When I pronounce the vowel in
"had", I open my mouth wider, so the tract is no longer cylindrical,
but flared at the open end, a bit like the flare and bell
on a brass instrument. One of the effects
of this shape in a brass instrument is to raise the
frequencies of the resonances, especially those of the lower
resonances. (In a related example, conical pipes have resonances
at higher frequencies than do cylindrical ones. See this
link for an explanation.)
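The arithmetic for the cylinder model of the 'heard' vowel can be checked with a few lines of code. This is a minimal sketch of the ideal closed-open pipe formula f0 = v/(4L) only. The speed of sound used (343 m/s, a typical value for room-temperature air) is an assumption, and end corrections are ignored.

```python
def closed_open_resonances(length_m, n_modes=5, speed_of_sound=343.0):
    """Resonances (Hz) of an ideal cylinder closed at one end, open at the other.

    The resonances are the odd multiples f0, 3*f0, 5*f0, ... of f0 = v / (4 L).
    speed_of_sound is an assumed value; no end correction is applied.
    """
    f0 = speed_of_sound / (4.0 * length_m)
    return [(2 * k + 1) * f0 for k in range(n_modes)]


resonances = closed_open_resonances(0.170)   # 170 mm tract
for f in resonances:
    print(f"{f / 1000.0:.2f} kHz")
```

For a 170 mm tube this gives values close to 0.5, 1.5, 2.5, 3.5 and 4.5 kHz, in line with the peaks described above.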
From this response we can readily determine the resonances
of the vocal tract, independently of the speech signal. The
resonant frequencies are interesting for fundamental acoustical
phonetic research but, if we extract them in real time, they
can be used to drive a cursor for speech training. This is
how we do it in the real time version.
Schematic diagram. (a) shows the spectrum of the
speech signal alone. This male voice has harmonic partials
spaced at the pitch frequency 126 Hz. (b) The injected
signal has frequencies spaced at 5 Hz, whose amplitudes are
calibrated (in this case) using the radiation field outside
the speaker's mouth. (c) The sum of the speech signal
and the broad band signal (including the effects of the resonances)
goes from the microphone to the ADC. The speech signal is
used to measure pitch and amplitude; then the harmonic components
below 1 kHz are removed. (d) The resonances are detected
from the remaining interpolated signal. Similarly, the broadband
signals may be removed to leave just the speech harmonics.
In the real-time version of the device used for speech training,
the resonance frequencies are used to position the cursor
on the vowel plane (see below). Notice that the signal-to-noise
ratio in these figures is lower than in the preceding figure.
This is a consequence of making the measurements rapidly.
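Steps (c) and (d) of the schematic can be illustrated with a short sketch on synthetic data. It is not the algorithm used in the device. It assumes a spectrum sampled every 5 Hz, discards every bin within 10 Hz of a multiple of the 126 Hz pitch (the 10 Hz half-width is a hypothetical choice, and harmonics are treated the same way across the whole band), fills the gaps by linear interpolation, and then picks local maxima. The function names and the simple bell-shaped tract response are made up for the example.

```python
def harmonic_mask(freqs, pitch_hz, half_width_hz=10.0):
    """True for bins to keep, False for bins close to a speech harmonic."""
    keep = []
    for f in freqs:
        k = round(f / pitch_hz)                      # nearest harmonic number
        near = k >= 1 and abs(f - k * pitch_hz) <= half_width_hz
        keep.append(not near)
    return keep


def interpolate_gaps(values, keep):
    """Replace discarded bins by linear interpolation between kept neighbours."""
    out = list(values)
    kept = [i for i, k in enumerate(keep) if k]
    for a, b in zip(kept, kept[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1.0 - w) * values[a] + w * values[b]
    return out


def find_peaks(freqs, values, min_fraction=0.5):
    """Frequencies of local maxima above min_fraction of the largest value."""
    floor = min_fraction * max(values)
    return [freqs[i] for i in range(1, len(values) - 1)
            if values[i] > values[i - 1]
            and values[i] >= values[i + 1]
            and values[i] > floor]


# Synthetic data, not measurements: three broad resonances plus speech harmonics.
pitch = 126.0
freqs = [200.0 + 5.0 * i for i in range(561)]        # 200 Hz to 3 kHz, 5 Hz steps
tract = [sum(1.0 / (1.0 + ((f - fr) / 50.0) ** 2)
             for fr in (500.0, 1500.0, 2500.0)) for f in freqs]
keep = harmonic_mask(freqs, pitch)
# A large offset on the contaminated bins stands in for the voice harmonics.
measured = [t if k else t + 5.0 for t, k in zip(tract, keep)]

cleaned = interpolate_gaps(measured, keep)
resonances = find_peaks(freqs, cleaned)
print(resonances)
```

In this synthetic case the detected peaks fall within a couple of bins of 500, 1500 and 2500 Hz. The first one is slightly displaced because a voice harmonic (504 Hz) sits almost on top of the resonance, so the true maximum lies in an interpolated gap.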
How it looks:
This is a screen dump of the feedback display in the current
speech trainer device, set up with targets from Australian
English. The background ellipses are measurements of the vowels
of 33 Australian men, with mean values for each vowel at the
centre of each ellipse. The semi-axes are the standard deviations
in R1 and R2. These or other areas can be used as targets
in speech training. A cursor on the monitor (the cross at
(1190,530)) shows the current configuration of the subject's
own vocal tract. Initially, subjects 'steer' the motion of
the cursor by consciously controlling jaw and tongue position.
Speakers of the language displayed can 'aim' towards one of
the vowels shown. After some practice, however, it becomes
nearly as automatic as using a joystick or a mouse - one
just 'makes it go' where one wants, without thinking of the
muscular details. In other words, a visual feedback loop is
unconsciously used to train articulation.
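The target test behind such a display can be sketched as a point-in-ellipse check. The example below assumes axis-aligned ellipses whose semi-axes are the standard deviations in R1 and R2, as described above. The target values are hypothetical placeholders, not data from the 33 speakers, and reading the cursor position (1190,530) as (R2, R1) in Hz is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class VowelTarget:
    """An axis-aligned target ellipse on the (R1, R2) vowel plane, in Hz."""
    label: str
    r1_mean: float
    r2_mean: float
    r1_sd: float   # semi-axis along R1
    r2_sd: float   # semi-axis along R2

    def normalised_distance(self, r1, r2):
        """0 at the centre of the ellipse, 1 on its boundary."""
        d1 = (r1 - self.r1_mean) / self.r1_sd
        d2 = (r2 - self.r2_mean) / self.r2_sd
        return (d1 * d1 + d2 * d2) ** 0.5

    def contains(self, r1, r2):
        return self.normalised_distance(r1, r2) <= 1.0


# Hypothetical target values, for illustration only.
target = VowelTarget("example vowel", r1_mean=520.0, r2_mean=1250.0,
                     r1_sd=40.0, r2_sd=90.0)

cursor_r2, cursor_r1 = 1190.0, 530.0     # assumed ordering of the cursor position
distance = target.normalised_distance(cursor_r1, cursor_r2)
inside = target.contains(cursor_r1, cursor_r2)
print(f"distance = {distance:.2f}, inside target: {inside}")
```

A normalised distance like this could also be used to give graded feedback (how close the cursor is to the centre of a target) rather than a simple hit or miss.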
Does it work?
For a report of a trial experiment using a prototype system
as a language trainer, see our papers:
- Dowd, A., Smith, J.R. and Wolfe, J. (1998) "Learning
to pronounce vowel sounds in a foreign language using acoustic
measurements of the vocal tract as feedback in real time"
Language and Speech, 41, 1-20.
- Epps, J., Smith, J.R. and Wolfe, J. (1997) "A novel instrument
to measure acoustic resonances of the vocal tract during
speech" Measurement Science and Technology 8,
- Donaldson, T., Wang, D., Smith, J. and Wolfe, J. (2003) "Vocal
tract resonances: a preliminary study of sex differences
for young Australians", Acoustics Australia,
- Epps, J., Dowd, A., Smith, J.R. and Wolfe, J. (1997) "Real
time measurements of the vocal tract resonances during speech",
Eurospeech'97 (G. Kokkinakis, N. Fakotakis &
E. Dermatas, eds.) Rhodes, 721-724.
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Tuning
of vocal tract resonances by sopranos", Nature,
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Vocal
tract resonances in singing: the soprano voice", J.
Acoust. Soc. America, 116, 2434-2439.