A vocoder ( / ˈ v oʊ k oʊ d ər / , a portmanteau of vo ice and en coder ) is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression , multiplexing , voice encryption or voice transformation.
62-402: The vocoder was invented in 1938 by Homer Dudley at Bell Labs as a means of synthesizing human speech. This work was developed into the channel vocoder which was used as a voice codec for telecommunications for speech coding to conserve bandwidth in transmission. By encrypting the control signals, voice transmission can be secured against interception. Its primary use in this fashion
124-505: A foot pedal for pitch control of tone. The filters controlled by keys convert the tone and the hiss into vowels , consonants , and inflections . This was a complex machine to operate, but a skilled operator could produce recognizable speech. Dudley's vocoder was used in the SIGSALY system, which was built by Bell Labs engineers in 1943. SIGSALY was used for encrypted voice communications during World War II . The KO-6 voice coder
186-449: A calibrated generator, care must be taken so that the shot noise dominates the thermal noise of the tube's plate resistance and other circuit elements. Long, thin, hot-cathode gas-discharge glass tubes fitted with a normal bayonet light bulb mount for the filament and an anode top cap , were used for SHF frequencies and diagonal insertion into a waveguide . They were filled with a pure inert gas such as neon because mixtures made
248-489: A carrier sound which is shaped into formants by the throat, mouth and sinuses into what we recognize as vowel sounds ("aah", "eeh", "ooh", etc.), which are further shaped by plosives (such as pressing the lips together to create a "p" sound) and glottal stops (such as closing the back of the throat to produce a "guh" sound). Dudley theorized that an intelligible analogue to human speech could be created by breaking sound down into modular blocks which could be assembled into
310-403: A channel vocoder in a secure speech system. Later work in this field has since used digital speech coding . The most widely used speech coding technique is linear predictive coding (LPC). Another speech coding technique, adaptive differential pulse-code modulation (ADPCM), was developed by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973. Even with
372-549: A commercial context, with their 1977 album Out of the Blue . The band extensively uses it on the album, including on the hits " Sweet Talkin' Woman " and " Mr. Blue Sky ". On following albums, the band made sporadic use of it, notably on their hits " The Diary of Horace Wimp " and " Confusion " from their 1979 album Discovery , the tracks "Prologue", "Yours Truly, 2095", and "Epilogue" on their 1981 album Time , and " Calling America " from their 1986 album Balance of Power . In
434-400: A controlled way, creating the wide variety of sounds used in speech. There is another set of sounds, known as the unvoiced and plosive sounds, which are created or modified by the mouth in different fashions. The vocoder examines speech by measuring how its spectral characteristics change over time. This results in a series of signals representing these frequencies at any particular time as
496-738: A data rate in the range of 64 kbit/s, but a good vocoder can provide a reasonably good simulation of voice with as little as 5 kbit/s of data. Toll quality voice coders, such as ITU G.729 , are used in many telephone networks. G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves slightly worse quality at data rates of 5.3 and 6.4 kbit/s. Many voice vocoder systems use lower data rates, but below 5 kbit/s voice quality begins to drop rapidly. Several vocoder systems are used in NSA encryption systems : Modern vocoders that are used in communication equipment and in voice storage devices today are based on
558-461: A desired order, to allow the production and communication with artificial speech. By replacing the natural carrier sound of human speech with a carrier sound at a higher frequency, speech could be reproduced more clearly over long distances and low volumes, since higher frequency sounds are heard more clearly than lower ones. In 1928, Dudley began experimenting with electromechanical devices to produce analogues of human speech . A key to this process
620-412: A digital vocoder. Pink Floyd used a vocoder on three of their albums, first on their 1977 Animals for the songs " Sheep " and " Pigs (Three Different Ones) ", then in 1987 on A Momentary Lapse of Reason on " A New Machine Part 1 " and "A New Machine Part 2", and finally on 1994's The Division Bell , on " Keep Talking ". The Electric Light Orchestra was among the first to use the vocoder in
682-425: A mix of natural and processed human voices, and " Instant Crush " (2013) features Julian Casablancas singing into a vocoder. Ye (Kanye West) used a vocoder on the outro of his song " Runaway " (2010). Producer Zedd , American country singer Maren Morris and American musical duo Grey made a song titled " The Middle " which featured a vocoder and reached the top ten of the charts in 2018. Robot voices became
SECTION 10
#1732783319265744-514: A noise source, when operated as a diode (grid tied to cathode) in a transverse magnetic field. Another possibility is using the collector current in a transistor. Reverse-biased diodes in breakdown can also be used as shot noise sources. Voltage regulator diodes are common, but there are two different breakdown mechanisms, and they have different noise characteristics. The mechanisms are the Zener effect and avalanche breakdown . The Zener effect
806-464: A recurring element in popular music during the 20th century. Apart from vocoders, several other methods of producing variations on this effect include: the Sonovox , Talk box , Auto-Tune , linear prediction vocoders, speech synthesis , ring modulation and comb filter . Vocoders are used in television production , filmmaking and games, usually for robots or talking computers. The robot voices of
868-464: A ten-band device inspired by the vocoder designs of Homer Dudley . It was originally called a spectrum encoder-decoder and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer , and the modulator from a microphone input. The output of the ten-band vocoder was fairly intelligible but relied on specially articulated speech. In 1972, Isao Tomita 's first electronic music album Electric Samurai: Switched on Rock
930-664: A thesis in 1948 on electronic music and speech synthesis from the viewpoint of sound synthesis . Later he was instrumental in the founding of the Studio for Electronic Music of WDR in Cologne, in 1951. One of the first attempts to use a vocoder in creating music was the Siemens Synthesizer at the Siemens Studio for Electronic Music, developed between 1956 and 1959. In 1968, Robert Moog developed one of
992-433: A vocoder ("Pretty young thing/You make me sing"), courtesy of session musician Michael Boddicker . Coldplay have used a vocoder in some of their songs. For example, in " Major Minus " and " Hurts Like Heaven ", both from the album Mylo Xyloto (2011), Chris Martin 's vocals are mostly vocoder-processed. " Midnight ", from Ghost Stories (2014), also features Martin singing through a vocoder. The hidden track "X Marks
1054-480: A woman operator sitting behind the console, phrases resembling human speech could be demonstrated to the audience, although the produced sounds were often difficult to understand. On June 21, 1938 Dudley and Bell Labs were granted a patent (US#2,121,142) for a "System for the artificial production of vocal or other sounds". Dudley worked with Alan Turing on the SIGSALY project for the US Military. SIGSALY
1116-478: Is a circuit that produces electrical noise (i.e., a random signal). Noise generators are used to test signals for measuring noise figure , frequency response, and other parameters. Noise generators are also used for generating random numbers . There are several circuits used for noise generation. For example, temperature-controlled resistors, temperature-limited vacuum diodes, zener diodes, and gas discharge tubes. A source that can be switched on and off ("gated")
1178-521: Is a simple shot noise. For breakdown voltages greater than 7 volts, the semiconductor junction width is thicker and primary breakdown mechanism is an avalanche. The noise output is more complicated. There is excess noise (i.e., noise over and above the simple shot noise) because there is avalanche multiplication. For higher power output noise generators, amplification is needed. For broadband noise generators, that amplification can be difficult to achieve. One method uses avalanche multiplication within
1240-575: Is almost always used in tandem with other methods in high-compression voice coders. Waveform-interpolative (WI) vocoder was developed at AT&T Bell Laboratories around 1995 by W.B. Kleijn, and subsequently, a low- complexity version was developed by AT&T for the DoD secure vocoder competition. Notable enhancements to the WI coder were made at the University of California, Santa Barbara . AT&T holds
1302-399: Is beneficial for some test methods. Noise generators usually rely on a fundamental noise process such as thermal noise or shot noise . Thermal noise can be a fundamental standard. A resistor at a certain temperature has a thermal noise associated with it. A noise generator might have two resistors at different temperatures and switch between the two resistors. The resulting output power
SECTION 20
#17327833192651364-400: Is for secure radio communication. The advantage of this method of encryption is that none of the original signal is sent, only envelopes of the bandpass filters. The receiving unit needs to be set up in the same filter configuration to re-synthesize a version of the original signal spectrum. The vocoder has also been used extensively as an electronic musical instrument . The decoder portion of
1426-408: Is in contrast with vocoders realized using fixed-width filter banks, where the location of spectral peaks is constrained by the available fixed frequency bands. LP filtering also has disadvantages in that signals with a large number of constituent frequencies may exceed the number of frequencies that can be represented by the linear prediction filter. This restriction is the primary reason that LP coding
1488-543: Is low. (For a 1 kΩ resistor at room temperature and a 10 kHz bandwidth, the RMS noise voltage is 400 nV. ) If electrons flow across a barrier, then they have discrete arrival times. Those discrete arrivals exhibit shot noise . The output noise level of a shot noise generator is easily set by the DC bias current. Typically, the barrier in a diode is used. Different noise generator circuits use different methods of setting
1550-506: Is primarily exhibited by reverse-biased diodes and bipolar transistor base-emitter junctions that breakdown below about 7 volts. The breakdown is due to internal field emission, since the junctions are thin, and the electric field is high. Zener-type breakdown is shot noise . The flicker ( 1 f {\displaystyle \ {\tfrac {1}{\ f\ }}\ } ) noise corner can be below 10 Hz. The noise generated by Zener diodes
1612-687: The Cylons in Battlestar Galactica were created with an EMS Vocoder 2000. The 1980 version of the Doctor Who theme, as arranged and recorded by Peter Howell , has a section of the main melody generated by a Roland SVC-350 vocoder. A similar Roland VP-330 vocoder was used to create the voice of Soundwave , a character from the Transformers series. Homer Dudley Homer W. Dudley (14 November 1896– 18 September 1980)
1674-431: The amplitude component and simply ignoring the phase component tends to result in an unclear voice; on methods for rectifying this, see phase vocoder . The development of a vocoder was started in 1928 by Bell Labs engineer Homer Dudley , who was granted patents for it on March 21, 1939, and Nov 16, 1937. To demonstrate the speech synthesis ability of its decoder section, the voder (voice operating demonstrator)
1736-478: The new-age music genre often utilize vocoder in a more comprehensive manner in specific works, such as Jean-Michel Jarre (on Zoolook , 1984) and Mike Oldfield (on QE2 , 1980 and Five Miles Out , 1982). Vocoder module and use by Mike Oldfield can be clearly seen on his Live At Montreux 1981 DVD (Track " Sheba "). There are also some artists who have made vocoders an essential part of their music, overall or during an extended phase. Examples include
1798-525: The British band Emerson, Lake and Palmer used a vocoder on their album Brain Salad Surgery , for the song " Karn Evil 9: 3rd Impression ". The 1975 song " The Raven " from the album Tales of Mystery and Imagination by The Alan Parsons Project features Alan Parsons performing vocals through an EMI vocoder. According to the album's liner notes, "The Raven" was the first rock song to feature
1860-543: The DC bias current. One common noise source was a thermally-limited ( saturated-emission ) hot-cathode vacuum tube diode. These sources could serve as white noise generators from a few kilohertz through UHF and were available in normal radio tube glass envelopes. Flicker ( 1 f {\displaystyle \ {\tfrac {1}{\ f\ }}\ } ) noise limited application at lower frequencies; electron transit time limited application at higher frequencies. The basic design
1922-509: The German synthpop group Kraftwerk , the Japanese new wave group Polysics , Stevie Wonder (" Send One Your Love ", " A Seed's a Star ") and jazz/fusion keyboardist Herbie Hancock during his late 1970s period. In 1982 Neil Young used a Sennheiser Vocoder VSM201 on six of the nine tracks on Trans . The chorus and bridge of Michael Jackson 's " P.Y.T. (Pretty Young Thing) ". features
Vocoder - Misplaced Pages Continue
1984-591: The Spot" from A Head Full of Dreams was also recorded through a vocoder. Noisecore band Atari Teenage Riot have used vocoders in variety of their songs and live performances such as Live at the Brixton Academy (2002) alongside other digital audio technology both old and new. The Red Hot Chili Peppers song " By the Way " uses a vocoder effect on Anthony Kiedis ' vocals. Among the most consistent users of
2046-430: The assistance of fellow engineer Robert Riesz, Dudley created the " VODER " (for "Voice Operation DEmonstratoR"), a console from which an operator could create phrases of speech controlling a VOCODER with a keyboard and foot pedals; it was considered difficult to operate. The VODER was demonstrated at Bell Laboratory exhibits at both the 1939 New York World's Fair and the 1939 Golden Gate International Exposition . With
2108-429: The bandpass filter bank of its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and again at the decoder to re-apply the spectral shape of the target speech signal. One advantage of this type of filtering is that the location of the linear predictor's spectral peaks is entirely determined by the target signal, and can be as precise as allowed by the time period to be filtered. This
2170-451: The core patents related to WI and other institutes hold additional patents. For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter bank, a technique that became popular in the 1970s. Werner Meyer-Eppler , a German scientist with a special interest in electronic voice synthesis, published
2232-401: The corresponding carrier bands. The result is that frequency components of the modulating signal are mapped onto the carrier signal as discrete amplitude changes in each of the frequency bands. Often there is an unvoiced band or sibilance channel. This is for frequencies that are outside the analysis bands for typical speech but are still important in speech. Examples are words that start with
2294-468: The early 1960s. During that time, he invented and refined many of the technologies that became essential for telephony. His development of artificial speech was elaborated upon by others to produce methods of artificial speech for humans unable to use their vocal cords (as with the voice synthesizer used by Stephen Hawking ), and by electronic music pioneers Wendy Carlos , Robert Moog and the German musical group Kraftwerk . One of Dudley's final projects
2356-420: The elements of speech could be filtered into ten specific audio spectrum bands, rendering it more easily transmitted over telephone lines with greater clarity and legibility. The speech could also be compressed down to a very narrow frequency band, to allow multiple transmissions simultaneously on different bands. This enabled many telephone conversations to be transmitted at the same time over one line. With
2418-420: The first solid-state musical vocoders for the electronic music studio of the University at Buffalo . In 1968, Bruce Haack built a prototype vocoder, named Farad after Michael Faraday . It was first featured on "The Electronic Record For Children" released in 1969 and then on his rock album The Electric Lucifer released in 1970. In 1970, Wendy Carlos and Robert Moog built another musical vocoder,
2480-414: The following algorithms: Vocoders are also currently used in psychophysics , linguistics , computational neuroscience and cochlear implant research. Since the late 1970s, most non-musical vocoders have been implemented using linear prediction , whereby the target signal's spectral envelope (formant) is estimated by an all-pole IIR filter . In linear prediction coding, the all-pole filter replaces
2542-426: The frequency content based on the originally recorded series of numbers. Specifically, in the encoder, the input is passed through a multiband filter , then the output of each band is measured using an envelope follower , and the signals from the envelope followers are transmitted to the decoder. The decoder applies these as control signals to corresponding amplifiers of the output filter channels. Information about
Vocoder - Misplaced Pages Continue
2604-407: The instantaneous frequency of the original voice signal (as distinct from its spectral characteristic) is discarded; it was not important to preserve this for the vocoder's original use as an encryption aid. It is this dehumanizing aspect of the vocoding process that has made it useful in creating special voice effects in popular music and audio entertainment. Instead of a point-by-point recreation of
2666-558: The late 1970s, French duo Space Art used a vocoder during the recording of their second album, Trip in the Centre Head . Phil Collins used a vocoder to provide a vocal effect for his 1981 international hit single " In the Air Tonight ". Vocoders have appeared on pop recordings from time to time, most often simply as a special effect rather than a featured aspect of the work. However, many experimental electronic artists of
2728-439: The letters s , f , ch or any other sibilant sound. Using this band produces recognizable speech, although somewhat mechanical sounding. Vocoders often include a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency . This is mixed with the carrier output to increase clarity. In the channel vocoder algorithm, among the two components of an analytic signal , considering only
2790-481: The need to record several frequencies, and additional unvoiced sounds, the compression of vocoder systems is impressive. Standard speech-recording systems capture frequencies from about 500 to 3,400 Hz, where most of the frequencies used in speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate ). The sampling resolution is typically 8 or more bits per sample resolution, for
2852-424: The output temperature-dependent. Their burning voltage was under 200 V, but they needed optical priming (pre-ionizing) by a 2 Watt incandescent lamp prior to ignition by an anode voltage spike in the 5 kV range. For lower frequency noise bands glow lamps filled with neon have been used. The circuit was similar to the one for spike / needle pulses . One miniature thyratron found an additional use as
2914-645: The same barrier that generates the noise. In an avalanche, one carrier collides with other atoms and knocks free new carriers. The result is that for each carrier that starts across a barrier, several carriers synchronously arrive. The result is a wide-bandwidth high-power source. Conventional diodes can be used in breakdown. The avalanche breakdown also has multistate noise. The noise output power randomly switches among several output levels. Multistate noise looks somewhat like flicker ( 1 f {\displaystyle \ {\tfrac {1}{\ f\ }}\ } ) noise. The effect
2976-434: The signal into multiple tuned frequency bands or ranges. To reconstruct the signal, a carrier signal is sent through a series of these tuned bandpass filters . In the example of a typical robot voice the carrier is noise or a sawtooth waveform . There are usually between 8 and 20 bands. The amplitude of the modulator for each of the individual analysis bands generates a voltage that is used to control amplifiers for each of
3038-409: The user speaks. In simple terms, the signal is split into a number of frequency bands (the larger this number, the more accurate the analysis) and the level of signal present at each frequency band gives the instantaneous representation of the spectral energy content. To recreate speech, the vocoder simply reverses the process, processing a broadband noise source by passing it through a stage that filters
3100-460: The vocoder in emulating the human voice are Daft Punk , who have used this instrument from their first album Homework (1997) to their latest work Random Access Memories (2013) and consider the convergence of technological and human voice "the identity of their musical project". For instance, the lyrics of " Around the World " (1997) are integrally vocoder-processed, " Get Lucky " (2013) features
3162-410: The vocoder, called a voder , can be used independently for speech synthesis. The human voice consists of sounds generated by the opening and closing of the glottis by the vocal cords , which produces a periodic waveform with many harmonics . This basic sound is then filtered by the nose and throat (a complicated resonant piping system) to produce differences in harmonic content ( formants ) in
SECTION 50
#17327833192653224-435: The waveform, the vocoder process sends only the parameters of the vocal model over the communication link. Since the parameters change slowly compared to the original speech waveform, the bandwidth required to transmit speech can be reduced. This allows more speech channels to utilize a given communication channel , such as a radio channel or a submarine cable . Analog vocoders typically analyze an incoming signal by splitting
3286-401: Was a diode vacuum tube with a heated filament. The temperature of the cathode (filament) sets the anode (plate) current that determines the shot noise; see Richardson equation . The anode voltage is set large enough to collect all the electrons emitted by the filament . If the plate voltage were too low, then there would be space charge near the filament that would affect the noise output. For
3348-489: Was a method of transmitting speech in a secure manner, rendering it unable to be understood by unauthorized listeners. It utilized technology developed in the VOCODER and VODER projects, and added a random noise source as a method of encrypting speech. SIGSALY was successfully used by US military intelligence during World War II for transmitting the highest level of classified messages. Dudley stayed with Bell Labs through
3410-608: Was a preacher, and his parents also gave lessons to students, in classical and religious subjects. Dudley trained to be a grade school and high school teacher . He found it difficult to keep discipline in the classroom and soon gave up teaching. Intending a change in career, he enrolled in Pennsylvania State University , where he developed an interest in the nascent science of electronic engineering. After taking some college courses in electronic engineering, Dudley found employment with Bell Laboratories, which
3472-537: Was an American pioneering electronic and acoustic engineer who created the first electronic voice synthesizer for Bell Labs in the 1930s and led the development of a method of sending secure voice transmissions during World War Two. His awards include the Franklin Institute's Stuart Ballantine Medal (1965). Born in Virginia , Dudley's family moved to Pennsylvania when he was a schoolboy. His father
3534-439: Was an early attempt at applying speech synthesis technique through a vocoder to electronic rock . The album featured electronic renditions of contemporary rock and pop songs, while utilizing synthesized voices in place of human voices. In 1974, he utilized synthesized voices in his popular classical music album Snowflakes are Dancing , which became a worldwide success and helped to popularize electronic music. In 1973,
3596-502: Was at that time a division of Western Electric Company . His career with Bell Labs spanned 40 years, most of it in the Telephone Transmission Division. Dudley's primary area of exploration was in the idea of human speech being fundamentally the use of a carrier —a more or less continuous sound that is modulated and shaped by the mouth, throat and sinuses into recognizable speech. The vocal cords create
3658-475: Was introduced to the public at the AT&T building at the 1939–1940 New York World's Fair. The voder consisted of an electronic oscillator – a sound source of pitched tone – and noise generator for hiss , a 10-band resonator filters with variable-gain amplifiers as a vocal tract , and the manual controllers including a set of pressure-sensitive keys for filter control, and
3720-521: Was released in 1949 in limited quantities; it was a close approximation to the SIGSALY at 1,200 bit/s. In 1953, KY-9 THESEUS 1650 bit/s voice coder used solid-state logic to reduce the weight to 565 pounds (256 kg) from SIGSALY's 55 short tons (50,000 kg), and in 1961 the HY-2 voice coder, a 16-channel 2400 bit/s system, weighed 100 pounds (45 kg) and was the last implementation of
3782-500: Was the design of an electronic kit distributed by Bell Labs for home hobbyists and students, called "Speech Synthesis: an Experiment in Electronic Speech Production". The kit contained the components with which to create an electronic circuit that could produce three different speech formants. The kit entered production in 1963 and was produced until the late 1960s. Noise generator A noise generator
SECTION 60
#17327833192653844-482: Was the development of a parallel band-pass filter , which allowed sounds to be filtered down to a fairly specific portion of the audio spectrum by attenuating the sounds that fall above or below a certain band. This led to the patent for the " Vocoder " (a portmanteau of "voice" and "encoder"), a method of reproducing speech through electronic means, and allowing it to be transmitted over distances, as through telephone lines. By reproducing human speech electronically,
#264735