S100 Analog Speech Synthesizer

A simple inexpensive voice synthesizer for the S100 bus. 1977

Ever since I heard a speech synthesizer at Bell Labs when I was a teenager, I have been fascinated by the prospect of computer generated voice. Several racks of expensive equipment were needed to generate and control voice sounds, far beyond the resources of an amateur experimenter. Today, it is possible for anyone with a minimum investment to obtain the necessary equipment.

It seems as though all of the speech synthesizers on the market today are both expensive and secret. Large blocks of epoxy guard the inner workings of arcane circuits. As a dedicated do-it-yourselfer, I considered buying blocks of unknown epoxy a sin, so I designed my own circuitry. Also, as a card-carrying tightwad, I made certain that only simple, cheap and available components were used.

Before I could start designing, I had to arm myself with some knowledge about voice generation, so I went straight to the old masters at Bell Labs. They publish a book called "The Speech Chain" which covers the basic physics and biology of spoken language. For further enlightenment I consulted "Speech, Analysis and Synthesis" by J. L. Flanagan (also of Bell Labs). Now well armed, I began my design.


Figure 1. Diagram of the vocal tract and Tube model.

The human vocal tract can be modeled by a series of tubes of varying cross section, acoustically driven by a set of vibrating bands called vocal cords. Such a tube exhibits a set of resonances called formants, which can be seen on an audio spectrum analyser as peaks in the spectral output of the voice. As we speak we vary the position and cross section of our acoustical tube with movements of the tongue, lips, cheek and soft palate. It is the resonances and their changing patterns that provide much of the information our brains decode as speech.

Fig.2a. Vocal Cord Waveform

Fig.2b. Vocal Cord Spectrum.

Fig.2c. Voice Waveform.

Fig.2d. Voice Spectrum.

Fig.2. Time and Frequency representation of speech waveforms.

The ORACLE 100 is termed a terminal analog type of synthesizer. That is, it makes no attempt to model the acoustic tube and other measurements of the vocal tract, but simply tries to duplicate the waveforms that can be seen on an oscilloscope connected to a microphone.

A simple system that can duplicate most vowels is shown in Fig 3.

Fig 3. Block diagram of vowel generator

The important characteristics of the vocal cords, amplitude and frequency, are variables that modulate the output of a pulse generator. The pulses are then fed into a series of filters which have variable peak frequencies. These filters reproduce the peaks in frequency response (formants) in the speech spectrum. It is generally agreed that the first three formants are sufficient to represent most vowels.
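The vowel generator just described can be tried out in software before any hardware is built. What follows is only a minimal sketch, not the ORACLE 100 circuit: it swaps each analog formant filter for a two-pole digital resonator with unity gain at DC, and the sample rate, the 100 Hz bandwidth and all function names are assumptions chosen for illustration. The formant values are the ones the code tables give for the vowel in "Father".

```python
import math

def pulse_train(pitch_hz, amplitude, duration_s, sample_rate):
    """Stand-in for the vocal cord pulse generator: one pulse per pitch period."""
    period = int(round(sample_rate / pitch_hz))
    count = int(round(duration_s * sample_rate))
    return [amplitude if i % period == 0 else 0.0 for i in range(count)]

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator (an assumed substitute for one analog formant filter)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    b2 = -r * r
    a0 = 1.0 - b1 - b2            # gives unity gain at DC, like a resonant low pass
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a0 * x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vowel(pitch_hz, formants_hz, amplitude=1.0, duration_s=0.2,
          sample_rate=10000, bandwidth_hz=100.0):
    """Feed the pulse train through the formant filters connected in series."""
    signal = pulse_train(pitch_hz, amplitude, duration_s, sample_rate)
    for freq in formants_hz:
        signal = resonator(signal, freq, bandwidth_hz, sample_rate)
    return signal

# "Father": formants of 750, 1050 and 3000 Hz at a 100 Hz pitch.
samples = vowel(100, (750, 1050, 3000))
```

Changing the three formant frequencies while the pitch stays fixed is enough to move from one vowel to another, which is the whole point of the block diagram.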

Now all speech isn't [ah] [oh] [ee], so we must make provision for consonant sounds. First we attack the fricatives, so named for the frying sound (white noise) we perceive in [s] [sh] [f] and [th]. All of these sounds are made from air passing through a constriction in the mouth, lips or tongue. Spectral analysis shows them to be white noise with some accentuation in frequency response. The [sh] sound has the lowest frequency, followed by the [s], [th] and [f] sounds.

Fig.4 shows the fricative generation system.

Now for some fine points. The H and whisper are produced when a small amount of fricative noise gets into the vocal tract. This is simulated by injecting noise into the vowel filters. The nasal sounds [m], [n] and [ng] are produced when the soft palate is open and the mouth is closed by the lips or tongue. Most of the sound energy escapes through the nose and through the throat and cheeks. After studying the output of my voice with a spectrum analyser, I determined that lowering the Q of the first two formant filters would best approximate these sounds. Next, [b], [d], and hard [g] are produced with both the mouth and nose closed. By drastically lowering the resonant frequency of the first formant filter I reproduced these sounds.

In designing the ORACLE 100, I tried to reduce to a minimum the number of bits needed to control each function. This accomplishes two goals: first, to reduce the cost of the circuitry needed, and second, to reduce the amount of memory needed to store a reasonable vocabulary. The bits used were derived from studies made by researchers at Bell Labs and elsewhere.

Fig. 5. Block diagram of the ORACLE 100 showing bits needed

Actual circuits.

REF Oracle100 Schematic 1
Oracle100 Schematic 2
Oracle100 Schematic 3
Oracle100 Schematic 4

Realizing that not everybody is an electronic engineer, I will divulge my circuit diagram. (While researching this article I read some 10-year-old entries in "Proceedings of the IEEE in Acoustics" that had the same circuit ideas I had just dreamed up in 1977.)

First, the ORACLE 100 is configured as an I/O device with a single I/O address on the I/O-port-starved original S100 (only 256 ports). Jumpers allow any one port to be selected. If the proper I/O address is on lines A00 through A07, board select is activated. A control data byte is stored by ANDing SOUT, /WRITE, and board select.

The byte is stored in a pair of 74LS75 latches, then converted to CMOS (12 volt) levels by 7416s and 4.7K pullup resistors. The control data is then routed to four F4724 addressable latches by a strobe generated by a 74121. Bits 5, 6, and 7 of the control byte determine the address at which bits 1, 2, 3, and 4 are latched. Bits 1, 2, 3, and 4 are the 4-bit data nibble which controls the operating parameters of the analog circuits of the synthesizer. Once latched, each nibble remains stored until changed or until a reset occurs. This system creates a powerful changed-value coding scheme in which speech parameters that do not change during a particular time interval are not coded, saving the user approximately 30% of the memory otherwise needed. See Fig. 6 for the ORACLE 100 coding scheme.
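The bit layout and the changed-value idea can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the policy of marking the last byte of each interval with the 10 ms delay bit (or sending delay code 21 hex when nothing changed) is my assumption about how a word list might be built, not something the board requires.

```python
def encode_control_byte(address, nibble, delay=False):
    """Pack a 3-bit address (bits 7-5), a 4-bit nibble (bits 4-1) and the delay bit (bit 0)."""
    if not 0 <= address <= 7 or not 0 <= nibble <= 15:
        raise ValueError("address must be 0-7 and nibble 0-15")
    return (address << 5) | (nibble << 1) | int(delay)

def decode_control_byte(byte):
    """Split a control byte back into its three fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("control byte must fit in 8 bits")
    return {"address": (byte >> 5) & 0b111,
            "nibble": (byte >> 1) & 0b1111,
            "delay": bool(byte & 1)}

def changed_value_stream(frames):
    """Emit bytes only for parameters whose nibble differs from the latched value.

    Each frame is a dict {address: nibble} describing one time interval.
    Assumption: the last byte of a frame carries the 10 ms delay bit, and a
    frame with no changes is coded as the 10 ms time delay code 0x21.
    """
    latched = {}
    out = []
    for frame in frames:
        changed = [(a, n) for a, n in sorted(frame.items()) if latched.get(a) != n]
        if not changed:
            out.append(0x21)
            continue
        for index, (address, nibble) in enumerate(changed):
            latched[address] = nibble
            is_last = index == len(changed) - 1
            out.append(encode_control_byte(address, nibble, delay=is_last))
    return out

frames = [{2: 5, 3: 0, 6: 7}, {2: 5, 3: 1, 6: 7}, {2: 5, 3: 1, 6: 7}]
stream = changed_value_stream(frames)   # 5 bytes instead of 9
```

In this small example only the first interval needs all three parameters; the second sends just the one formant that moved, and the third sends a bare delay.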

Address 000 is decoded as a mode control. Because of the nature of the vocal apparatus, it is not necessary for every mode of operation to be available at the same time. For instance, there are normally no nasal and sibilant combinations. Code 00000000 (00) is reserved for end of message (EOM), which tells the processor the word is finished.

Address 001 is decoded as a time delay parameter. For a minimum delay of 10 milliseconds bit 0 of the control byte is set. This creates a pause before further information is sent to the synthesizer. For delays of up to 150 ms a code with the 001 address can be sent.

The computer looks at the status of the delay by doing an input from the board address and watching DI7. When DI7 is set the computer should delay before dumping more data into the synthesizer.

Address 010 sets the fundamental frequency of the pulse generator (1/2 NE556) which is the source for all voiced sounds.

Addresses 011, 100, and 101 set the formant frequencies. Each formant filter is basically a high Q low pass filter. Each filter is made from 3 operational amplifiers connected in the state variable or bi-quadratic form. Two resistors are varied to change the center (or cutoff) frequency. This is done by using resistors in series with an analog switch (CD4066). The effective resistance of this circuit is changed by pulse width modulating the CD4066 at an ultrasonic rate. A triangle wave oscillator made from an LM339 comparator and one section of a CD4070 provides the modulating frequency of approximately 25.6 kHz. The formant nibbles control a set of four resistors weighted in a 1-2-4-8 fashion. The voltage produced at the junction of the resistors is compared with the triangle wave by LM339 comparators, with the resultant waveform controlling the CD4066s on each filter.
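The pulse width trick can be checked numerically. A minimal sketch follows, under stated assumptions: the triangle is normalized to swing from 0 to 1, the switch is taken to be on while the triangle is below the control voltage (the real polarity may be the reverse), and the frequency formula is the textbook one for a state variable filter with equal integrator resistors and capacitors. The function names are hypothetical and no component values from the schematic are implied.

```python
import math

def triangle(phase):
    """Unit triangle wave: 0 at phase 0, rising to 1 at phase 0.5, back to 0 at phase 1."""
    return 2.0 * phase if phase < 0.5 else 2.0 - 2.0 * phase

def duty_cycle(control, steps=1000):
    """Fraction of one triangle period for which the comparator holds the switch on.

    Assumes the switch conducts while the triangle is below the control level (0 to 1).
    """
    on = sum(1 for i in range(steps) if triangle(i / steps) < control)
    return on / steps

def effective_resistance(r_ohms, duty):
    """A resistor switched in for a fraction `duty` of the time averages to R / duty."""
    if duty <= 0.0:
        return math.inf
    return r_ohms / duty

def center_frequency(r_ohms, c_farads, duty):
    """f0 = 1 / (2 pi R_eff C), so the frequency rises in proportion to the duty cycle."""
    return duty / (2.0 * math.pi * r_ohms * c_farads)
```

Because the frequency is proportional to the duty cycle, equal steps in control voltage give equal steps in formant frequency, which matches the evenly spaced entries in the code tables.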

The triangle oscillator provides a clock which feeds a CD4020 counter and a CD4006 shift register. The shift register operates with a CD4070 exclusive-or chip to form a pseudo-random sequence generator (PRG). The output of the PRG constitutes the noise source for the fricative sounds. The output of the CD4020 is the 10 ms delay clock.
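A shift register with exclusive-or feedback is easy to model in software. The sketch below is an assumption-laden illustration: the text does not say which stages feed the exclusive-or gate, so stages 18 and 11 are used here only because that pair is a well-known maximal-length choice for 18 stages. The function names are hypothetical.

```python
def lfsr_step(state, taps=(18, 11), length=18):
    """Advance the shift register one clock; the XOR of the tapped stages is shifted in."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << length) - 1)

def lfsr_bits(count, state=1, taps=(18, 11), length=18):
    """Return `count` noise bits taken from the last stage of the register."""
    if state & ((1 << length) - 1) == 0:
        raise ValueError("an all-zero register never leaves the zero state")
    bits = []
    for _ in range(count):
        state = lfsr_step(state, taps, length)
        bits.append((state >> (length - 1)) & 1)
    return bits

def lfsr_period(state=1, taps=(18, 11), length=18):
    """Count clocks until the register returns to its starting state."""
    start = state
    clocks = 0
    while True:
        state = lfsr_step(state, taps, length)
        clocks += 1
        if state == start:
            return clocks

noise = lfsr_bits(64)
```

With maximal-length taps the register runs through 2^18 - 1 = 262,143 states before repeating, which at a 25.6 kHz clock is a little over 10 seconds, far too long for the ear to notice any pattern.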

Address 110 is the amplitude parameter. A set of resistors in a 1K DIP pack and a 2K SIP pack are connected to form a set of 3 dB voltage steps (a division by 1.414). A CD4051 analog multiplexer makes contact to the appropriate voltage for each amplitude step. This voltage is modulated by either the voice pulse or the noise sequence. The amplitude data also modulates the width of the voice pulse. Lower amplitude voice is associated with a wider glottal pulse.
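The relation between a division by 1.414 and a 3 dB step is quick to verify. This is a small arithmetic sketch with hypothetical function names; it only assumes that each step divides the voltage by the square root of 2.

```python
import math

STEP_RATIO = 1.0 / math.sqrt(2.0)   # one step: division by about 1.414

def step_voltage_ratio(steps):
    """Voltage ratio after the given number of steps down the divider."""
    return STEP_RATIO ** steps

def ratio_to_db(ratio):
    """Convert a voltage ratio to decibels."""
    return 20.0 * math.log10(ratio)

# The eight multiplexer positions span about 21 dB in all.
ladder_db = [round(ratio_to_db(step_voltage_ratio(n)), 1) for n in range(8)]
```

Two steps halve the voltage, so the eight positions cover a voltage range of a little more than 11 to 1.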

Address 111 has only one function as yet: to set or reset the interrupt mode.


To create understandable words, the data controlling the ORACLE 100 must be highly structured. Several types of software structures can be implemented. The most straightforward system is to have the data for each word in a separate list. The starting address for each word is found in a dictionary, and a simple subroutine reads the code and passes parameters to the synthesizer. This is an example of the drive subroutine.



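;Enter with HL pointing at the word's code list.
;SYNTH is the port address selected by the board jumpers.
SPEAK:	MOV A,M		;Get byte from memory
	OUT SYNTH	;Output to synthesizer
	ANI 0FFH	;Set flags to check for EOM character (00)
	RZ		;Return if EOM found
CKST:	IN SYNTH	;Get time status from synthesizer
	ANI 80H		;Test DI7: set while a delay is in progress
	JNZ CKST	;If not ready keep checking
	INX H		;Increment pointer for next byte
	JMP SPEAK	;Go back and send the next byte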
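For readers who want to try the driver logic without an S100 machine, the same loop can be walked through in a high level language. The sketch below is hypothetical throughout: FakeSynth is an invented stand-in for the board's port, and the rule it uses to raise the busy flag (any byte with bit 0 set, or any time delay code) is an assumption made only so that the polling loop has something to wait for.

```python
class FakeSynth:
    """Hypothetical stand-in for the synthesizer port; not a model of the real board."""

    def __init__(self, busy_polls=2):
        self.written = []
        self.busy_polls = busy_polls
        self._busy = 0

    def out(self, byte):
        """Record the byte; assume a delay starts on bit 0 or on a time delay code."""
        self.written.append(byte)
        if byte & 1 or (byte >> 5) == 0b001:
            self._busy = self.busy_polls

    def inp(self):
        """Return status with bit 7 (DI7) set while the assumed delay is running."""
        if self._busy > 0:
            self._busy -= 1
            return 0x80
        return 0x00

def speak(memory, pointer, synth):
    """Send bytes starting at `pointer` until the EOM code (00) has been sent."""
    while True:
        byte = memory[pointer]
        synth.out(byte)
        if byte == 0x00:            # EOM found: the word is finished
            return pointer
        while synth.inp() & 0x80:   # wait while the delay status is set
            pass
        pointer += 1

word = [0x4A, 0x60, 0xCF, 0x21, 0x00, 0x99]
synth = FakeSynth()
end = speak(word, 0, synth)
```

The byte after the EOM code is never touched, so word lists can be packed one after another in memory.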

The word list produces the best fidelity speech, but requires the most memory. About 30 to 150 bytes per word are needed, depending on length and number of syllables. A full set of ASCII characters requires about 3K bytes. An alternate way of driving the synthesizer is to break words into components which are called phonemes. Phoneticists have selected 43 phonemes for standard American English. Each phoneme is assigned an ASCII character to represent it. Combinations of phonemes are operated on by a "Synthesis by Rule" program which calculates the spectral trajectories of the formants. Such a program must be quite complicated in order to produce decent output.


Fig. 6. ORACLE 100 coding scheme. (TD = 10 ms time delay bit)

Type          Data            10 ms delay
(bits 7-5)    (bits 4-1)      (bit 0)
0 0 0         Mode            TD
0 0 1         Time Delay
0 1 0         Fund. Freq.     TD
0 1 1         Formant 1       TD
1 0 0         Formant 2       TD
1 0 1         Formant 3       TD
1 1 0         Amplitude       TD
1 1 1         Interrupt       TD

Mode            Time Delay   Fund. Freq.   Formant 1   Formant 2   Formant 3   Amplitude
                (ms)         (Hz)          (Hz)        (Hz)        (Hz)        (dB)
00 EOM          20   0       40  75        60  250     80  600     A0 1500     C0  0
02 Silent       21  10       42  80        62  300     82  750     A2 1625     C2  3
04 i            22  20       44  85        64  350     84  900     A4 1750     C4  6
06 Asp.         23  30       46  90        66  400     86 1050     A6 1875     C6  9
08 Normal       24  40       48  95        68  450     88 1200     A8 2000     C8 12
0A Nasal        25  50       4A 100        6A  500     8A 1350     AA 2125     CA 15
0C Voice Bar    26  60       4C 105        6C  550     8C 1500     AC 2250     CC 18
0E --           27  70       4E 110        6E  600     8E 1650     AE 2375     CE 21
10 SH           28  80       50 115        70  650     90 1800     B0 2500
12 S            29  90       52 120        72  700     92 1950     B2 2625
14 F            2A 100       54 125        74  750     94 2100     B4 2750
16 TH           2B 110       56 130        76  800     96 2250     B6 2875
18 J            2C 120       58 135        78  850     98 2400     B8 3000
1A Z            2D 130       5A 140        7A  900     9A 2550     BA 3125
1C V            2E 140       5C 145        7C  950     9C 2700     BC 3250
1E TH           2F 150       5E 150        7E 1000     9E 2850     BE 3375
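Since every numeric column in the code table rises in equal steps, the table can be regenerated from a base value and a step size. The sketch below is a reading aid with hypothetical names, not part of the ORACLE 100 software. It assumes the fundamental frequency column keeps its 5 Hz step all the way from 75 to 150 Hz, and that the amplitude uses only the eight codes C0 through CE.

```python
# address: (name, base value, step per code, unit, highest nibble used)
PARAMETERS = {
    0b010: ("fundamental", 75, 5, "Hz", 15),
    0b011: ("formant 1", 250, 50, "Hz", 15),
    0b100: ("formant 2", 600, 150, "Hz", 15),
    0b101: ("formant 3", 1500, 125, "Hz", 15),
    0b110: ("amplitude", 0, 3, "dB", 7),
}

def describe(byte):
    """Translate a control byte into (parameter name, value, unit)."""
    address = (byte >> 5) & 0b111
    if address == 0b001:
        # Time delay codes 20-2F count 10 ms units in the low four bits.
        return ("time delay", 10 * (byte & 0x0F), "ms")
    if address not in PARAMETERS:
        raise ValueError("mode and interrupt codes are not numeric parameters")
    name, base, step, unit, highest = PARAMETERS[address]
    nibble = (byte >> 1) & 0x0F
    if nibble > highest:
        raise ValueError("code is outside the range listed in the table")
    return (name, base + step * nibble, unit)

# Vowel in "Father": formant codes 74, 86 and B8.
father = [describe(code) for code in (0x74, 0x86, 0xB8)]
```

Working backwards is just as easy: subtract the base, divide by the step, and shift the result left one bit under the address.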

Vowel codes
Beet   Bid    Bed    Man    Father Haw    Hood   Moon   Hut    Her    L      M      N      NG
60     66     6C     70     74     6C     68     62     70     6A     62     60     60     60
96     90     90     90     86     84     86     84     88     8A     86     86     8C     90
62     68     6E     78     78     6E     68     64     74     6A     64     62     62     62
9C     98     96     94     88     84     88     84     8A     8E     88     88     8E     92
BC     B8     B8     B8     B8     B4     B0     B4     B4     A8     AE     B0     B0     B4
ee     i      e      ae     ah     aw     u      oo     n      er

Partial Parts list

Integrated Circuits
Part      Qty   Description
LM556     1     Dual timer
MC3403    4     Quad op-amp
LM339     1     Quad comparator
LM7805    1     +5 volt regulator
LM7812    1     +12 volt regulator
CD4001    1     CMOS quad NOR
CD4011    1     CMOS quad NAND
CD4006    1     CMOS 18 stage shift register
CD4013    1     CMOS dual D latch
CD4020    1     CMOS 14 stage binary counter
CD4029    1     CMOS up/down loadable counter
CD4066    1     CMOS quad analog switch
CD4724    2     CMOS 8 bit addressable latch
CD4051    2     CMOS 8 input analog multiplexer
CD4073    1     CMOS triple 3 input AND
CD4081    1     CMOS quad 2 input AND
74LS04    2     TTL hex inverter
74LS16    2     TTL hex inverting open collector buffer
74LS30    1     TTL 8 input NAND
74LS175   2     TTL quad latch
74LS25    1     TTL dual 4 input NOR
74LS121   1     TTL monostable
74LS125   1     TTL quad tristate buffer

Other semiconductors
1N914     8     GP signal diode
2N3904    3     NPN GP transistor