EECS20N: Signals and Systems

Parametric Speech Synthesis

The technology of speech processing, which includes speech modeling, synthesis, encoding, and recognition, dates back to the parametric techniques introduced by Homer Dudley in the late 1930s and early 1940s. These methods are "parametric" in the sense that they construct a model of the acoustic properties of the human vocal tract and then analyze speech by determining the values of the parameters of the model. Below is a rendition of the basic model from Dudley's 1940 paper, "The Carrier Nature of Speech," published in The Bell System Technical Journal.

Dudley's vocal tract model
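To make "determining the values of the parameters" concrete, here is a minimal Python sketch under stated assumptions. It assumes a toy model with only two parameters, a voiced/unvoiced flag and a pitch (comparable to the buzz/hiss switch and pitch pedal of the Voder described below), and estimates them from one frame of samples using autocorrelation pitch detection. Autocorrelation is a standard analysis technique swapped in here for illustration; it is not Dudley's method, and the function name `estimate_parameters`, the 60-400 Hz search range, and the 0.5 voicing threshold are hypothetical choices.

```python
import math
import random


def autocorr(x, lag):
    """Unnormalized autocorrelation of frame x at a non-negative integer lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def estimate_parameters(frame, fs, f_min=60.0, f_max=400.0, threshold=0.5):
    """Return (voiced, pitch_hz) for one frame under a hypothetical two-parameter model.

    voiced   -- True if the frame looks periodic (buzz-like), False if noise-like
    pitch_hz -- estimated fundamental frequency in Hz, or None when unvoiced
    """
    energy = autocorr(frame, 0)
    if energy == 0.0:
        return False, None  # silent frame: nothing to estimate
    lag_min = max(1, int(fs / f_max))  # shortest pitch period considered, in samples
    lag_max = min(len(frame) - 1, int(fs / f_min))  # longest pitch period considered
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: autocorr(frame, k))
    strength = autocorr(frame, best_lag) / energy  # near 1.0 for strongly periodic frames
    if strength < threshold:
        return False, None  # no convincing repetition: treat as unvoiced
    return True, fs / best_lag


# Usage on synthetic 0.1 s frames sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(800)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
print(estimate_parameters(tone, fs))   # voiced, pitch near 100 Hz
print(estimate_parameters(noise, fs))  # unvoiced
```

Analysis here runs in the opposite direction from synthesis: instead of choosing parameter values to generate a sound, the code measures a sound to recover parameter values that a model could use to regenerate it.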

At the 1939 World's Fair in New York, Bell Labs demonstrated this principle with a device called the "Voder," shown below in action.

Voder being demonstrated

The Voder was operated by highly trained technicians (who at the time were called "girls"). A technician manipulated a set of analog (continuous) controls to produce speech-like sounds, as in the sentence "greetings everybody":

Audio: the Voder saying "greetings everybody"

The Voder was carefully designed to balance the needs of speech modeling against the limitations of a human operator. Its design is shown in the following schematic:

Voder schematic
Ten "spectrum keys" control the gains of ten bandpass filters, one key per finger, since a human operator can press at most ten keys at once. Together these gains crudely determine the spectral content of the speech signal. A wrist bar switches between a periodic excitation ("buzz-type energy") and a white-noise excitation ("hiss-type energy"). Periodic excitation produces voiced sounds (like "aaaaa"), while white-noise excitation produces unvoiced sounds (like "sssss"). A foot pedal controls the frequency of the periodic excitation, that is, its pitch, which lets the operator control inflection.
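The signal flow in this schematic is a source-filter synthesizer, which the following Python sketch imitates digitally under stated assumptions. One excitation (an impulse-train buzz at pitch `f0`, or Gaussian white-noise hiss) passes through ten second-order bandpass filters, and each filter output is scaled by its key gain before the outputs are summed. The function names `voder`, `buzz`, `hiss`, and `bandpass`, the 8 kHz sampling rate, the log-spaced band centers from 250 Hz to 3500 Hz, the filter Q, and the use of RBJ-cookbook biquads are all hypothetical choices; the real Voder was an analog machine with its own filter design.

```python
import math
import random

FS = 8000  # assumed sampling rate in Hz; the real Voder was analog

# Hypothetical band centers, log-spaced from 250 Hz to 3500 Hz (not the Voder's actual values).
CENTERS = [250.0 * (3500.0 / 250.0) ** (k / 9) for k in range(10)]


def buzz(n_samples, f0, fs=FS):
    """Periodic ("buzz") excitation: unit impulses every round(fs / f0) samples."""
    period = max(1, round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def hiss(n_samples, seed=0):
    """White-noise ("hiss") excitation: independent Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def bandpass(x, fc, q=4.0, fs=FS):
    """Second-order bandpass filter centered at fc Hz with unit gain at fc (RBJ cookbook form)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def voder(keys, voiced, f0, n_samples, fs=FS):
    """Mix ten bandpass-filtered copies of one excitation, weighted by the key gains.

    keys    -- ten gains, one per spectrum key (finger); 0.0 means the key is up
    voiced  -- wrist bar: True selects buzz, False selects hiss
    f0      -- foot pedal: pitch of the buzz in Hz (ignored for hiss)
    """
    if len(keys) != len(CENTERS):
        raise ValueError("expected one gain per band")
    source = buzz(n_samples, f0, fs) if voiced else hiss(n_samples)
    out = [0.0] * n_samples
    for gain, fc in zip(keys, CENTERS):
        if gain == 0.0:
            continue  # a released key contributes nothing
        for n, v in enumerate(bandpass(source, fc, fs=fs)):
            out[n] += gain * v
    return out


# Illustrative settings only: a voiced sound with energy in the lower bands,
# then an unvoiced sound weighted toward the upper bands (half a second each).
voiced_sound = voder([0.2, 1.0, 0.8, 0.3, 0.5, 0.6, 0.2, 0.1, 0.0, 0.0], True, 120.0, FS // 2)
unvoiced_sound = voder([0.0] * 6 + [0.3, 0.6, 1.0, 1.0], False, 0.0, FS // 2)
```

Rounding the pitch period to whole samples keeps `buzz` simple at the cost of slightly coarse pitch steps; a phase accumulator would allow non-integer periods and smoother inflection.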

Listen to the complete Voder demonstration:

Audio: complete Voder demonstration