There are several ways to compress sound. Most people can't hear frequencies
above 10 kHz, and voices are still very recognizable when you remove the high
and low frequencies. Analog telephone systems only passed the frequencies from
200 Hz to 3400 Hz. To transmit a certain frequency you have to sample at at
least twice that frequency, so ISDN at 8000 samples per second can transmit
frequencies up to 4 kHz. The signals that used to be transmitted over ISDN
lines in the early days were only human voice and low-speed modem signals of
up to, say, 1200 bit/s. A characteristic of these signals is that they never
contain two unrelated frequencies at the same time, and the noise may be
removed. Based on this knowledge Adaptive Differential Pulse Code Modulation
(ADPCM*) was developed, which can compress an ISDN voice channel by a factor
of 2, to 4 bits * 8000 samples/s = 32 kbit/s. However, it may not compress
DTMF tones, music or faster modem signals well. My experience is that some
kinds of music can be compressed very well and others, for example those with
important drum (= noise) parts, cannot.
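To show the differential idea behind ADPCM, here is a heavily simplified sketch in C. It is not the real IMA or G.726 algorithm; the state layout, step limits and adaptation rule are made up for illustration. Only the quantized difference between each sample and a running prediction is stored, and the step size adapts to how large the differences are.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy ADPCM-like encoder: store only a 4-bit code per 16-bit sample.
 * This is a simplified illustration, not the real IMA/G.726 algorithm. */

typedef struct { int predicted; int step; } adpcm_state;

static int clamp16(int v) { return v > 32767 ? 32767 : (v < -32768 ? -32768 : v); }

/* Encode one sample into a 4-bit code (sign bit + 3-bit magnitude). */
static uint8_t adpcm_encode(adpcm_state *st, int16_t sample)
{
    int diff = sample - st->predicted;
    int sign = diff < 0 ? 8 : 0;
    int mag  = abs(diff) / st->step;
    if (mag > 7) mag = 7;

    /* Reconstruct the value the decoder will see and keep it as the
     * next prediction, so encoder and decoder stay in sync. */
    int delta = mag * st->step;
    st->predicted = clamp16(st->predicted + (sign ? -delta : delta));

    /* Adapt the step size: big differences -> larger steps, and vice versa. */
    if (mag >= 6)      st->step *= 2;
    else if (mag <= 1) st->step /= 2;
    if (st->step < 1)    st->step = 1;
    if (st->step > 8192) st->step = 8192;

    return (uint8_t)(sign | mag);
}

int main(void)
{
    adpcm_state st = { 0, 16 };
    int16_t wave[] = { 0, 500, 1500, 3000, 2500, 1000, -500, -2000 };
    for (size_t i = 0; i < sizeof wave / sizeof wave[0]; i++) {
        uint8_t code = adpcm_encode(&st, wave[i]);
        printf("sample %6d -> code 0x%X (predictor now %d)\n",
               wave[i], (unsigned)code, st.predicted);
    }
    return 0;
}
```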
By the way, the standard coding of ISDN, PCM*, already uses a form of
logarithmic compression (A-law or µ-law companding). The whole 16-bit sound
range is mapped logarithmically into an 8-bit range, whereby sounds with a
lower volume get relatively more quantization steps. This is in accordance
with how the human ear perceives sound.
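As a rough illustration of this kind of logarithmic companding, here is a sketch in C of a simplified µ-law-style curve. It is not the exact ITU-T G.711 table; the scaling and function names are my own. A 16-bit sample is squeezed into 8 bits and expanded again, keeping relatively more resolution for quiet samples.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified mu-law style companding: map a 16-bit linear sample
 * into 8 bits logarithmically and back. Quiet samples keep
 * relatively more resolution than loud ones. */
#define MU 255.0

static uint8_t compress_sample(int16_t s)
{
    double x = s / 32768.0;                       /* -1.0 .. 1.0 */
    double y = copysign(log(1.0 + MU * fabs(x)) / log(1.0 + MU), x);
    return (uint8_t)lround((y + 1.0) * 127.5);    /* 0 .. 255 */
}

static int16_t expand_sample(uint8_t c)
{
    double y = c / 127.5 - 1.0;                   /* -1.0 .. 1.0 */
    double x = copysign((pow(1.0 + MU, fabs(y)) - 1.0) / MU, y);
    return (int16_t)lround(x * 32767.0);
}

int main(void)
{
    int16_t samples[] = { 100, 1000, 10000, 30000, -100, -10000 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint8_t c = compress_sample(samples[i]);
        printf("%6d -> %3d -> %6d\n", samples[i], c, expand_sample(c));
    }
    return 0;
}
```

Note how small sample values come back almost exactly while large ones lose some precision, which is exactly the point of the logarithmic curve.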
Depending on the kind of signal it is of course also possible to reduce the
sample rate or the sample size. Almost all current sound cards in PCs,
together with Windows, can sample and replay via all of the methods mentioned
above, so you can experiment on your PC.
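A crude sketch of reducing both the sample rate and the sample size (the pair-averaging and the simple bit shift are simplifications; real converters use proper anti-aliasing filters and dithering):

```c
#include <stdint.h>
#include <stdio.h>

/* Halve the sample rate by averaging pairs of samples (a very simple
 * low-pass filter) and drop 16-bit samples down to 8 bits. */

static size_t halve_rate(const int16_t *in, size_t n, int16_t *out)
{
    size_t m = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        out[m++] = (int16_t)(((int)in[i] + in[i + 1]) / 2);
    return m;
}

static void to_8bit(const int16_t *in, size_t n, int8_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int8_t)(in[i] >> 8);   /* keep only the top 8 bits */
}

int main(void)
{
    int16_t src[] = { 1000, 1200, -300, -500, 20000, 21000, -8000, -7500 };
    int16_t half[4];
    int8_t  small[4];
    size_t  m = halve_rate(src, 8, half);
    to_8bit(half, m, small);
    for (size_t i = 0; i < m; i++)
        printf("%6d -> %4d\n", half[i], small[i]);
    return 0;
}
```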
A method to save on ADC and DAC hardware, used in earlier cheap sound
hardware, was called Frequency Modulation (FM*), but it has little to do with
the FM* you know from radio broadcasting. It consists of a pure bitstream at
a certain rate: at a 0 the signal goes down and at a 1 the signal goes up.
An output for such a signal can be made with a logic output port (0 and 5 V),
a resistor and a capacitor. The voltage across the capacitor will be
equivalent to the original sound level, so you can send it directly to
the amplifier. (Probably a little filtering is required.)
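This scheme is essentially what is elsewhere called delta modulation, and it is easy to simulate. The sketch below (step size and signal values are arbitrary) compares the input with the voltage the RC network would roughly reach, sends a 1 to push it up or a 0 to let it fall, and the same integrator reconstructs the signal.

```c
#include <stdio.h>

/* Toy delta-modulation encoder/decoder: one bit per sample.
 * A '1' drives the integrator (the RC capacitor) up by one step,
 * a '0' drives it down. The decoder is just the same integrator. */
#define STEP 0.05   /* arbitrary step size, a fraction of full scale */

int main(void)
{
    double input[] = { 0.0, 0.10, 0.20, 0.25, 0.20, 0.05, -0.10, -0.20 };
    double tracker = 0.0;    /* the capacitor voltage, roughly */

    for (int i = 0; i < 8; i++) {
        int bit = input[i] > tracker;       /* go up or down? */
        tracker += bit ? STEP : -STEP;
        printf("in=%6.2f  bit=%d  reconstructed=%6.2f\n",
               input[i], bit, tracker);
    }
    return 0;
}
```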
In some cases it is possible to reduce the bit stream by filtering out quiet
moments. Although real pauses are usually rare in a telephone conversation,
people tend to speak in turns, so about half of each direction's signal could
be filtered out. With modern packet-based telephony methods it is of course
better to just compress the back channel much harder when it is almost silent;
there may be background sounds that still have to be transmitted, for example.
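A minimal sketch of detecting those quiet moments (the frame size and threshold are arbitrary assumptions): compute the RMS level of each frame and flag frames below a threshold, so the transmitter can compress them much harder or send only a comfort-noise marker.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Simple energy-based silence detector: if the RMS level of a frame
 * is below a threshold, mark it as "quiet" so it can be compressed
 * much harder (or replaced by a comfort-noise flag). */
#define FRAME 160                 /* 20 ms at 8000 samples/s */
#define QUIET_THRESHOLD 200.0     /* tune to the actual noise floor */

static int frame_is_quiet(const int16_t *frame, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)frame[i] * frame[i];
    return sqrt(sum / n) < QUIET_THRESHOLD;   /* RMS level */
}

int main(void)
{
    int16_t loud[FRAME], quiet[FRAME];
    for (int i = 0; i < FRAME; i++) {
        loud[i]  = (int16_t)(3000 * sin(i * 0.3));   /* fake speech */
        quiet[i] = (int16_t)(50 * sin(i * 0.3));     /* faint background */
    }
    printf("loud frame quiet?  %d\n", frame_is_quiet(loud, FRAME));
    printf("faint frame quiet? %d\n", frame_is_quiet(quiet, FRAME));
    return 0;
}
```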
More modern ways of compressing sound are the methods used by MP3 and by
streaming media players such as RealPlayer from Real Networks and Windows
Media Player from Microsoft.
When you're building small, cheap systems that only have to generate
prerecorded sound, please consider that it may be economical to spend
a lot of money, time, sophisticated equipment and software on recording and
compressing the sounds (and even adapting them to strange hardware), so that
you can make the replay systems as cheap as possible. For example, if you use
a 4-bit resistor network as a cheap DAC, it may be possible to recalculate the
samples to compensate for the non-linearity of that cheap DAC.
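A sketch of that recalculation, assuming you have measured the 16 actual output voltages of the 4-bit resistor network once (the voltages below are invented for the example): each ideal sample level is simply mapped to whichever DAC code produces the closest measured voltage.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Pre-correcting samples for a non-linear 4-bit DAC: measure the 16
 * real output voltages of the resistor network once, then map each
 * ideal sample level to the code whose measured voltage is closest.
 * The measured values below are made up for the example. */
static const double measured_volts[16] = {
    0.00, 0.28, 0.61, 0.95, 1.22, 1.58, 1.86, 2.20,
    2.47, 2.83, 3.10, 3.41, 3.68, 4.05, 4.36, 4.70
};

/* Find the DAC code whose measured output is nearest to 'target'. */
static uint8_t best_code(double target)
{
    uint8_t best = 0;
    double best_err = fabs(measured_volts[0] - target);
    for (uint8_t c = 1; c < 16; c++) {
        double err = fabs(measured_volts[c] - target);
        if (err < best_err) { best_err = err; best = c; }
    }
    return best;
}

int main(void)
{
    /* Convert 16-bit samples to pre-corrected 4-bit codes,
       assuming full scale corresponds to 0..4.70 V. */
    int16_t samples[] = { -32768, -16000, 0, 12000, 32767 };
    for (size_t i = 0; i < 5; i++) {
        double target = (samples[i] + 32768.0) / 65535.0 * measured_volts[15];
        printf("sample %6d -> code %2u\n", samples[i],
               (unsigned)best_code(target));
    }
    return 0;
}
```

The table lookup can of course be precomputed once for all 65536 (or fewer) input values, so the replay system itself only needs the corrected 4-bit data.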
By the way, there is a fundamental difference between synthesizers (as
integrated in every modern PC) and electric pianos: in an electric piano
every tone is generated all the time (using a couple of frequency-divider
chips), and the keys the player presses determine which tones are sent to the
amplifier. In a synthesizer or PC, the tone is generated after the key has
been pressed. Early synthesizers were limited to only one tone at a time,
but after a while multichannel synthesizers appeared. The number of channels
determines how many notes can be played at the same time. For a piano played
by one person with 10 fingers, always playing one key per finger, a maximum
of 10 channels is needed. In the early synthesizer days these channels had to
be built in hardware and were therefore expensive, but nowadays it's usually
done in software, so it's cheap and you can synthesize complete symphonies
with a single machine.
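As a small illustration of how cheap software channels have become, here is a sketch that mixes a few sine-wave channels into one buffer, the way a PC now does in software what early synthesizers needed one hardware channel per note for. The sample rate, frequencies and scaling are arbitrary choices.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal software "multi-channel synthesizer": mix a few sine-wave
 * channels into one 16-bit buffer and write it as raw audio. */
#define PI       3.14159265358979323846
#define RATE     8000      /* samples per second */
#define CHANNELS 3
#define SAMPLES  8000      /* one second */

int main(void)
{
    const double freq[CHANNELS] = { 261.63, 329.63, 392.00 };  /* C-E-G chord */
    static int16_t buf[SAMPLES];

    for (int i = 0; i < SAMPLES; i++) {
        double mix = 0.0;
        for (int c = 0; c < CHANNELS; c++)
            mix += sin(2.0 * PI * freq[c] * i / RATE);
        buf[i] = (int16_t)(mix / CHANNELS * 30000);   /* scale to 16 bits */
    }

    /* Raw signed 16-bit mono samples on stdout; play with any raw-PCM tool. */
    fwrite(buf, sizeof buf[0], SAMPLES, stdout);
    return 0;
}
```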
The synthesizer chips on PC sound cards are usually designed by Yamaha, which
is also one of the main synthesizer makers.
Famous sound/effect generators are the AY-3-8910 and AY-3-8912, which were
often used in pinball machines and video arcade games in the 1980s.
I don't know much about pure voice synthesis.
I have heard about chips that try to emulate the complete human vocal system.
Most of what we think is voice synthesis is of course just voice replay at
the level of single words or sound fragments (phonemes or allophones).
Already around 1982 Radio Shack sold an optional box for its TRS-80,
with the famous SP0256-AL2 voice synthesizer chip.
voicegen.htm | Lots more information.
voicercg.htm | Lots more information.
www.asiansources.com/honsitak.co | Consumer ICs: UMC, REALTEK, PTC, HMC, MOSEL, HOLTEK, WINBOND, API and other brands available. Melody IC* series, sound effect ICs, voice/speech ICs.
index.htm | Index page for sound chips
melody.htm | About melody generating chips |
speechge.htm | About speech/voice generation/synthesis |
speechrc.htm | About speech recording |
voice.htm | About voice/speech generation/synthesis |
../../oth/voicerec.txt | FAQ about voice recognition processors |