Best or fastest 8-bit audio compression scheme?

Jim Leonard

2004-04-09 15:14:36 UTC

Hi, I'm working on a project that requires compression of 8-bit audio PCM data
by at least 2:1, and I'm trying to gather information on various methods so
that I can implement the best one for this project. I'm having a hard time
coming up with information and was hoping I could get some advice.

The project I'm working on is getting an original IBM PC/XT (a 4.77MHz 8088
with 640K of RAM) displaying multimedia. The video portion I already have
worked out, but in order to deal with the incredibly slow hard drive I need to
compress the audio by at least 2:1 as well. The audio data to be compressed
is unsigned 8-bit PCM mono at either 22050 or 44100Hz.

For reference, I have already come up with a low-cycle routine; it's an
adaptive DPCM scheme where the differentials are represented by signed 4-bit
indexes pointing to a value in a table. I use varying frame sizes and lookup
tables per frame to minimize error; since the indexes are 4 bits and the
tables are only 16 bytes per frame, it achieves close to 2:1 compression on
8-bit data. I based the routine around using the 8088 opcode XLAT for speed;
XLAT replaces a source value with one from a table in memory in only six
cycles.
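
A minimal Python sketch of how a decoder for a table-driven DPCM scheme like this could look. The frame layout is an assumption, not the actual format: indexes are treated as unsigned 0..15, packed two per byte with the high nibble first, and the output is clamped to 0..255. The name `decode_frame` is hypothetical.

```python
def decode_frame(table, packed, prev):
    """Decode one frame of table-driven DPCM.

    table  -- 16 signed deltas for this frame (assumed layout)
    packed -- bytes holding two 4-bit indexes each, high nibble first (assumed)
    prev   -- last output sample of the previous frame, 0..255
    """
    out = []
    for byte in packed:
        for index in (byte >> 4, byte & 0x0F):
            # One table lookup per sample; this is the job XLAT does on the 8088.
            prev = min(255, max(0, prev + table[index]))
            out.append(prev)
    return out, prev


# Example with a made-up delta table.
deltas = [0, 1, 2, 4, 8, 16, 32, 64, -1, -2, -4, -8, -16, -32, -64, -128]
samples, last = decode_frame(deltas, bytes([0x17, 0x9F]), 128)
print(samples)
```

Under this assumed layout a frame of N samples occupies 16 + N/2 bytes, so the ratio approaches 2:1 as frames get longer but never quite reaches it.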

But I can't help thinking there is a better algorithm, either for smaller
sizes or for better quality at the same size. Because of the low CPU decode
requirements, there aren't very many methods to choose from. Here's a list of
what I've researched; I would very much appreciate additional suggestions or
comments on the below:

- Apple ACE/MACE: I've tried to get information on Apple's ACE/MACE
algorithms (ACE achieves 3:1 and MACE achieves 6:1 or 8:1) but all I found is
decompression code that does not hint at the compression method.

- IMA ADPCM: This only outputs 16-bit signed data, so I'd have to shift the
output down to 8-bit ranges and invert the sign. I don't think that would
hurt the quality *too* much, but I haven't implemented it yet to find out.
Also, decompression speed on an 8088 is unknown. (And on a similar note,
are there any 8-bit implementations of IMA ADPCM?)

- 1-bit "halftoning" system as referenced in Patent #5,095,509: I can't grok
the legalese in the patent sufficiently to understand the process; it
appears to work by oversampling to 8x the sampling rate and dithering to
1-bit, like halftoning in printing, but I can't quite wrap my head around
this concept... any explanations out there?

- Creative's own 4/3/2-bit ADPCM: Sound Blaster cards can play back 4/3/2-bit
compressed audio in hardware, but other than their own VOC Editor, I can't
find any software that can create these files, nor anything that explains
the algorithm involved.

- John Ratcliff's ACOMP system: In an old Dr. Dobb's article, John Ratcliff
outlined a system similar to my own, except that he uses multiple bit depth
indexes and variable multipliers applied to the deltas referred to by the
indexes. He claims it achieves between 1.5:1 and 3:1 for music data, but I
haven't implemented it yet to verify the quality at those ratios. Also, I
think a mistake he made in his implementation is using linear tables, when
logarithmic tables would have been a better choice. Has anyone implemented
the ACOMP scheme for themselves?

Any advice as to the best one? Are there additional schemes with simple
decompression that I may be overlooking?

Logan Shaw

2004-04-09 17:34:21 UTC

Post by Jim Leonard
The project I'm working on is getting an original IBM PC/XT (a 4.77MHz 8088
with 640K of RAM) displaying multimedia. The video portion I already have
worked out, but in order to deal with the incredibly slow hard drive I need to
compress the audio by at least 2:1 as well. The audio data to be compressed
is unsigned 8-bit PCM mono at either 22050 or 44100Hz.

If you are doing 8-bit samples, it's going to sound pretty cruddy
no matter what you do. Therefore, have you considered just
dropping the sample rate to 11025Hz? You will only lose 1 octave
off the upper end of the response by doing so, and your data will
be half as large.

- Logan

Jim Leonard

2004-04-11 01:43:15 UTC

Post by Logan Shaw
If you are doing 8-bit samples, it's going to sound pretty cruddy
no matter what you do. Therefore, have you considered just
dropping the sample rate to 11025Hz? You will only lose 1 octave
off the upper end of the response by doing so, and your data will
be half as large.

It sounds like crap when I do that :-). I would rather have
distortion on the high end than no high end at all. If this were a
simple education game or something I probably would, but the project
is a proof-of-concept and needs to have maximum impact -- which is
probably going to be 44.1KHz.

As for 8-bit samples sounding terrible, yes I agree that the maximum
you could ever hope for is 48dB s/n, but the human brain is a very odd
machine and it turns out that, when interpreting audio, sampling
frequency is usually more important than amplitude. For example, I
would much rather hear a 16KHz 1-bit sample of someone talking than a
1KHz 16-bit sample. The 1KHz 16-bit sample is completely
unintelligible, while the 16KHz 1-bit sample, while harsh, is
understandable. So that is behind the reasoning for 44.1 for this
project.
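
For reference, the 48dB figure is the ratio between full scale and one quantization step. A short sketch of the arithmetic (the function name is hypothetical):

```python
import math


def quantization_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)


print(round(quantization_range_db(8), 1))   # 8-bit samples
print(round(quantization_range_db(16), 1))  # 16-bit samples
```

Each extra bit adds about 6dB, which is where the usual "6dB per bit" rule of thumb comes from.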

I wouldn't be bothering with audio compression at all, given the slow
nature of the playback host, but the hard drive on the host tops out
at a whopping 128KB/s (not kidding, this is an old 10MB MFM drive) and
I can only afford 22050B/s for audio. Yes, it can be 22KHz audio, but
if I can get "free" 44.1KHz audio out of the deal, then hey, why not?
:-)
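
The byte budget works out as follows; a small sketch of the arithmetic, using only the figures given in the post (the names are hypothetical):

```python
def bytes_per_second(sample_rate_hz, bits_per_sample):
    """Raw data rate of a mono stream."""
    return sample_rate_hz * bits_per_sample / 8


BUDGET = 22050  # bytes per second available for audio, from the post

for rate in (22050, 44100):
    for bits in (8, 4):
        need = bytes_per_second(rate, bits)
        verdict = "fits" if need <= BUDGET else "too big"
        print(rate, "Hz,", bits, "bits:", need, "B/s,", verdict)
```

Uncompressed 8-bit audio at 44100Hz needs twice the budget, while 4 bits per sample lands exactly on it, which is why a 2:1 scheme makes 44.1KHz "free".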

David Kopf

2004-04-11 18:52:27 UTC

Post by Jim Leonard

Check out the audio compression method from http://dakx.com. It's basically
delta coding with a block size of 1. Sounds like you've already done
efficient assembly code for bit shifting, but you might get some more ideas
from the 680x0 and PPC examples there. My decoder implementations required
about 2 MIPS for 44KHz stereo, often with lossless 2:1 compression for 8
bit music. If necessary you can drop to 7 or 6 bits to keep below a maximum
bit rate.

I think you CAN get greater than 48 dB S/N from 8 bit audio by going through
fourier space and reconstructing with more precision. If your music
passages are only a few seconds long, and you have a fast enough FFT,
compressing the entire chunk in fourier space gives pretty good music at
around 10 kbits/sec. Unfortunately stringing the chunks together for
continuous music gives terrible phase clicks, and smoothing out the phase
noise consumes more and more CPU cycles until you ultimately end up with
something like mp3.

Forget about Apple's 3:1 or 6:1 MACE, IMO it is terrible for music and
barely adequate for speech.

Jim Leonard

2004-04-12 07:30:13 UTC

Post by David Kopf
My decoder implementations required
about 2 MIPS for 44KHz stereo, often with lossless 2:1 compression for 8
bit music. If necessary you can drop to 7 or 6 bits to keep below a maximum
bit rate.

I'll definitely check it out, thanks. But I noticed the patent -- do
I have permission to implement your method?

Unfortunately, 2 MIPS is more CPU time than I can spare (a 4.77MHz 8088
is about 0.3 MIPS). But I will still check it out, as I'm curious.

Post by David Kopf
I think you CAN get greater than 48 dB S/N from 8 bit audio by going through
fourier space and reconstructing with more precision.

Yes, but then that wouldn't technically be 8-bit sound any more would
it? ;-)

Post by David Kopf
If your music
passages are only a few seconds long, and you have a fast enough FFT,

There's no such thing as an FFT on an 8088; I don't think it has the
speed. Also, this project is essentially streaming video+audio, so
audible boundaries between frames aren't acceptable.

Post by David Kopf
Forget about Apple's 3:1 or 6:1 MACE, IMO it is terrible for music and
barely adequate for speech.

Thanks for confirming my suspicions (I was never able to obtain any
samples).

Phil Frisbie, Jr.

2004-04-09 17:46:43 UTC

Post by Jim Leonard
Hi, I'm working on a project that requires compression of 8-bit audio PCM data
by at least 2:1, and I'm trying to gather information on various methods so
that I can implement the best one for this project. I'm having a hard time
coming up with information and was hoping I could get some advice.
The project I'm working on is getting an original IBM PC/XT (a 4.77MHz 8088
with 640K of RAM) displaying multimedia. The video portion I already have
worked out, but in order to deal with the incredibly slow hard drive I need to
compress the audio by at least 2:1 as well. The audio data to be compressed
is unsigned 8-bit PCM mono at either 22050 or 44100Hz.

At that high a sample rate CVSD might work fine and will reduce the bit rate to
1/4 or 1/8 depending on the algorithm you use.

--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com

Jim Leonard

2004-04-11 02:14:50 UTC

Post by Phil Frisbie, Jr.
At that high a sample rate CVSD might work fine and will reduce the bit rate to
1/4 or 1/8 depending on the algorithm you use.

Is CVSD the algorithm where each bit denotes a rise or fall in the
waveform? Are there any (complex) predictors involved? Finally, is
it useful for non-voice data? (The data being compressed is all
music)

Any good/favorite resources that describe how to implement CVSD? I've
searched google for the last 15 minutes and come up with several
mentions of CVSD but no actual algorithms or implementations (that
make sense)...

Phil Frisbie, Jr.

2004-04-12 17:07:03 UTC

Post by Jim Leonard

Post by Phil Frisbie, Jr.
At that high a sample rate CVSD might work fine and will reduce the bit rate to
1/4 or 1/8 depending on the algorithm you use.

Is CVSD the algorithm where each bit denotes a rise or fall in the
waveform?

That is one common version.

Post by Jim Leonard
Are there any (complex) predictors involved?

No.

Post by Jim Leonard
Finally, is
it useful for non-voice data? (The data being compressed is all
music)

It is a waveform encoder, not a voice encoder, so it does not matter what the
source is.

Post by Jim Leonard
Any good/favorite resources that describe how to implement CVSD? I've
searched google for the last 15 minutes and come up with several
mentions of CVSD but no actual algorithms or implementations (that
make sense)...

Yes, I remember how hard it was to find good resources a couple of years ago
when I was researching CVSD.

Here is a tutorial that is interesting:
http://www.stecint.co.kr/CML/products/applications/cvsd_1.pdf

And here is some C code that simulates a hardware CVSD encoder and decoder:
http://cvs.mess.org:6502/cgi-bin/viewcvs.cgi/src/sound/hc55516.c?rev=1.2

--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com

Jim Leonard

2004-04-13 05:20:33 UTC

Post by Phil Frisbie, Jr.
Yes, I remember how hard it was to find good resources a couple of years ago
when I was researching CVSD.

Mainly because everything that comes up is for CVS! :-)

Post by Phil Frisbie, Jr.
http://www.stecint.co.kr/CML/products/applications/cvsd_1.pdf
http://cvs.mess.org:6502/cgi-bin/viewcvs.cgi/src/sound/hc55516.c?rev=1.2

Thank you, that's very helpful!

Errol Smith

2004-04-12 13:40:37 UTC

...

Post by Jim Leonard
Any advice as to the best one? Are there additional schemes with simple
decompression that I may be overlooking?

Some thoughts - if you want something absolutely trivial, you could
use a non-linear mapping of 8 to 4bits - like u-law/a-law but with a
lower samplesize. Decompression is a simple table look-up, and it
would sound a lot better than straight 4 bit audio. You could use
error-diffusion (dithering) in the encoder to improve something like
this with no cost to the decoder.
Also you don't mention interpolating - playing a 22khz wav at 44khz
with something as simple as linear interpolation would improve the
"apparent" quality. (linear interpolation is just averaging
consecutive samples to get the one inbetween, which is just add &
shift). Interpolation helps to remove the "metallic" sound of lower
sample rates.
But ADPCM is fairly trivial, you should be able to re-implement one
of the various ADPCM's to 8 bit without too much trouble. Most ADPCM's
are targeted at 16bit audio to 4 bit compressed, you should be able to
get 8 to 3 bits with pretty much equal results (at least as far as the
top 8 bits goes).
BTW, try www.wotsit.org for lots of file formats/compression methods.
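
A minimal Python sketch of the 8-to-4-bit non-linear mapping idea. The curve is an assumption: it borrows the shape of the mu-law expansion formula with a hypothetical parameter `mu=15`, centred on the unsigned midpoint 128. It is not any standard codec, and the function names are made up for illustration.

```python
import math


def build_decode_table(mu=15.0):
    """16 output levels (0..255), spaced mu-law style around the midpoint 128."""
    table = []
    for code in range(16):
        # Map code 0..15 to a companded value y in [-1, 1].
        y = (code - 7.5) / 7.5
        # Inverse mu-law expansion: x = sign(y) * ((1 + mu)^|y| - 1) / mu
        x = math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
        table.append(min(255, max(0, round(128 + 127 * x))))
    return table


def encode(samples, table):
    """For each 8-bit sample, pick the code whose level is nearest."""
    return [min(range(16), key=lambda c: abs(table[c] - s)) for s in samples]


def decode(codes, table):
    """Decoding is a single table lookup per sample."""
    return [table[c] for c in codes]


table = build_decode_table()
print(table)
print(decode(encode([128, 130, 200, 10], table), table))
```

The levels are packed closely near 128 and spread out towards the extremes, so quiet passages keep more resolution than they would with straight 4-bit linear samples. Error diffusion would go in `encode` only; `decode` stays a plain lookup.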

Errol Smith
errol <at> ros (dot) com [period] au

Jim Leonard

2004-04-13 05:38:35 UTC

Post by Errol Smith
Some thoughts - if you want something absolutely trivial, you could
use a non-linear mapping of 8 to 4bits - like u-law/a-law but with a
lower samplesize. Decompression is a simple table look-up, and it
would sound a lot better than straight 4 bit audio. You could use
error-diffusion (dithering) in the encoder to improve something like
this with no cost to the decoder.

I tried companding using u-law and scaling the results down to fit
4-bit samples but it completely mangled music (speech was so-so). In
fact, after that first test, I came to the conclusion that companding
was only useful for higher-resolution targets (like what u-law really
is, 8-bit target holding 14-bit data).

Post by Errol Smith
Also you don't mention interpolating - playing a 22khz wav at 44khz
with something as simple as linear interpolation would improve the
"apparent" quality. (linear interpolation is just averaging
consecutive samples to get the one inbetween, which is just add &
shift). Interpolation helps to remove the "metallic" sound of lower
sample rates.

This is true; I may implement it in the decoder for some "free"
massaging of the audio. However, a compression scheme it is not ;-)
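
A minimal sketch of the add-and-shift interpolation described above, in Python. Repeating the final sample at the end is an assumption, since there is no following sample to average with; the function name is hypothetical.

```python
def upsample_2x(samples):
    """Double the sample rate by inserting the average of each adjacent pair."""
    out = []
    for cur, nxt in zip(samples, samples[1:]):
        out.append(cur)
        out.append((cur + nxt) >> 1)  # add and shift
    if samples:
        out.append(samples[-1])
        out.append(samples[-1])  # no next sample; repeat the last one (assumed)
    return out


print(upsample_2x([0, 10, 20]))
```

Note that the sum of two 8-bit samples needs nine bits, so an 8-bit implementation has to keep the carry before shifting.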

Post by Errol Smith
But ADPCM is fairly trivial, you should be able to re-implement one
of the various ADPCM's to 8 bit without too much trouble. Most ADPCM's
are targeted at 16bit audio to 4 bit compressed, you should be able to
get 8 to 3 bits with pretty much equal results (at least as far as the
top 8 bits goes).

Funny you should write that; not 30 minutes ago someone sent me an ACE
3:1 example, but the quality was only good for voice, and
unfortunately this project is all music.

Luckily I found ADPCM encoder/decoder source adapted from Intel/DVI
ADPCM (the method is identical to regular 16-bit ADPCM except the
delta tables are adjusted). For future reference, the code was here:
http://members.aol.com/MJMahon/adpcm.zip
I can post the code as text (about 200 lines of 6502 assembly) if
anyone thinks that is a better method of reference...

Post by Errol Smith
BTW, try www.wotsit.org for lots of file formats/compression methods.

Thanks again, and thanks to everyone who offered suggestions. I think
I have enough to code implementations of the following for testing:

- 8-bit ADPCM (4-bit indexes, 2:1 compression)
- CVSD (1-bit deltas, 8:1 compression, probably voice only)
- My own scheme of variable frame sizes and tables (1.77-1.93:1)

All of these schemes should be implementable in less than 50 lines of
8088 assembly, which fit my needs. I have a sneaking suspicion that
the 8-bit ADPCM implementation will produce the best results, but I'll
have fun implementing them all to see. :)
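
A rough Python sketch of what an 8-bit ADPCM with 4-bit codes could look like. The index-adjust table has the same shape as the one in IMA ADPCM, but the 16-entry step table `STEPS` is a hypothetical one chosen for illustration; it is not taken from the linked 6502 code or from the IMA tables.

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # same shape as IMA ADPCM's table

# Hypothetical step table for 8-bit data: 16 roughly geometric steps.
STEPS = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 64]


def _step(code, predicted, index):
    """Apply one 4-bit code to the decoder state; shared by encoder and decoder."""
    step = STEPS[index]
    magnitude = code & 7
    # delta = (magnitude + 0.5) * step / 4, in integer arithmetic
    delta = ((2 * magnitude + 1) * step) >> 3
    if code & 8:  # bit 3 is the sign
        delta = -delta
    predicted = min(255, max(0, predicted + delta))
    index = min(len(STEPS) - 1, max(0, index + INDEX_ADJUST[magnitude]))
    return predicted, index


def encode(samples):
    """Turn unsigned 8-bit samples into 4-bit codes (one code per sample)."""
    predicted, index, codes = 128, 0, []
    for s in samples:
        diff = s - predicted
        sign = 8 if diff < 0 else 0
        magnitude = min(7, (abs(diff) * 4) // STEPS[index])
        code = sign | magnitude
        codes.append(code)
        # Track exactly what the decoder will reconstruct.
        predicted, index = _step(code, predicted, index)
    return codes


def decode(codes):
    """Rebuild unsigned 8-bit samples from 4-bit codes."""
    predicted, index, out = 128, 0, []
    for code in codes:
        predicted, index = _step(code, predicted, index)
        out.append(predicted)
    return out


signal = [128] * 10 + [200] * 20
print(decode(encode(signal)))
```

The decoder needs only a table lookup, a multiply-free delta reconstruction, an add and two clamps per sample. The encoder runs the same `_step` so that both sides stay in sync.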

Errol Smith

2004-04-13 12:00:21 UTC

Post by Jim Leonard
I tried companding using u-law and scaling the results down to fit
4-bit samples but it completely mangled music (speech was so-so). In
fact, after that first test, I came to the conclusion that companding
was only useful for higher-resolution targets (like what u-law really
is, 8-bit target holding 14-bit data).

I still have some C64 stuff using 4 bit linear samples and you could
make out the music OK :) (it was critical to low pass filter any input
to nyquist, if you wanted "decent" sound).

Post by Jim Leonard

Post by Errol Smith
shift). Interpolation helps to remove the "metallic" sound of lower
sample rates.

This is true; I may implement it in the decoder for some "free"
massaging of the audio. However, a compression scheme it is not ;-)

Obviously :) And (obviously) it would only be useful if your source
was at a lower sample rate than you can playback at.

Post by Jim Leonard

Post by Errol Smith
But ADPCM is fairly trivial, you should be able to re-implement one
of the various ADPCM's to 8 bit without too much trouble. Most ADPCM's

Luckily I found ADPCM encoder/decode source adapted from Intel/DVI
ADPCM (the method is identical to regular 16-bit ADPCM except the
http://members.aol.com/MJMahon/adpcm.zip
I can post the code as text (about 200 lines of 6502 assembly) if
anyone thinks that is a better method of reference...

Speaking of c64, I didn't have much trouble with that. I would have
been fussier and not wasted 1/16 of my possible deltas, but that's
just me :)

Post by Jim Leonard
Thanks again, and thanks to everyone who offered suggestions. I think
- 8-bit ADPCM (4-bit indexes, 2:1 compression)
- CVSD (1-bit deltas, 8:1 compression, probably voice only)
- My own scheme of variable frame sizes and tables (1.77-1.93:1)
All of these schemes should be implementable in less than 50 lines of
8088 assembly, which fit my needs. I have a sneaking suspicion that
the 8-bit ADPCM implementation will produce the best results, but I'll
have fun implementing them all to see. :)

I would tend to agree that ADPCM would probably be the best
combination of size/complexity.
I don't know a lot about CVSD, but I get the impression that the
sample rate of CVSD (clock rate) is not always the same as your audio
sample rate, in fact it -should- be quite higher, which means you
effectively have several bits per original sample.
"The clock frequency used should be minimally 9600 Hz and
ideally 64 kHz for voice applications designed for a typical
analog input frequency of 1000 Hz."
(in a PDF linked from
http://www.gamearchive.com/General/Data_Sheets/cvsd_speech_info/)

Considering you are working in the 8 bit domain, you might also want
to consider compression (in the _audio_ sense) because of the limited
dynamic range of 8 bit audio vs the original 16bit. You may find you
get a lot more oomph in the music if it is compressed rather than just
using the original 16bit samples with the bottom 8 bits cut off. (any
decent audio editor should have a compression function - eg the
free/open source "Audacity").

Errol Smith
errol <at> ros (dot) com [period] au

Jim Leonard

2004-04-14 16:03:27 UTC

Post by Errol Smith
I still have some C64 stuff using 4 bit linear samples and you could
make out the music OK :) (it was critical to low pass filter any input
to nyquist, if you wanted "decent" sound).

Digitized sound on a C64 has always fascinated me. What was the
maximum output rate and bit depth? Was it a standard-at-the-time
method of pulsing the speaker faster than it could respond (IBM PC,
Apple) or did it use the SID creatively? (I am assuming SID in some
way, because if memory serves the C64 game Turbo Outrun had synth
music and digitized instrument hits at the same time during the title
screen tune.)

Post by Errol Smith
I don't know a lot about CVSD, but I get the impression that the
sample rate of CVSD (clock rate) is not always the same as your audio
sample rate, in fact it -should- be quite higher, which means you
effectively have several bits per original sample.

The simplest form of CVSD -- and probably the simplest form of DPCM
that I have ever heard of -- is that each bit represents a rise or
fall of the waveform:

0 means (next sample = previous sample - 1)
1 means (next sample = previous sample + 1)

In reality this leads to triangular waves and drift, so if I
understand it correctly, you monitor the previous two or three samples
and increase or decrease the modifier (the -1 or +1 mentioned above)
if they are the same value. With each bit representing a rise or
fall, 44100Hz 8-bit speech could be represented as (44100/8) or ~5513
bytes per second. I can't believe this will work well with music, but
for speech I can see how it would work just fine, so I will probably
implement it as an alternate codec for speech only.
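
A simplified Python sketch of the rise/fall scheme with an adaptive step, as described above. This is an assumption-laden toy, not a real CVSD implementation: the doubling and halving rule, the step limits, the three-bit history and all names are hypothetical, and hardware CVSD chips use a smoother decay instead.

```python
def _advance(state, bit, min_step=1, max_step=32, run=3):
    """Update (value, step, history) for one bit; shared by both directions."""
    value, step, history = state
    history = (history + (bit,))[-run:]
    if len(history) == run and len(set(history)) == 1:
        step = min(max_step, step * 2)   # same bit 'run' times: grow the step
    else:
        step = max(min_step, step // 2)  # otherwise let the step shrink
    value = min(255, max(0, value + (step if bit else -step)))
    return value, step, history


def encode(samples):
    """One bit per sample: 1 if the input is above the running estimate."""
    state, bits = (128, 1, ()), []
    for s in samples:
        bit = 1 if s > state[0] else 0
        bits.append(bit)
        state = _advance(state, bit)
    return bits


def decode(bits):
    """Replay the bits through the same state machine."""
    state, out = (128, 1, ()), []
    for bit in bits:
        state = _advance(state, bit)
        out.append(state[0])
    return out


signal = [128] * 4 + [160] * 12
print(decode(encode(signal)))
```

On a steady input the output hunts around the true value by one step, and on a sudden jump it overshoots before settling, which is the triangular behaviour mentioned above.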

Post by Errol Smith
Considering you are working in the 8 bit domain, you might also want
to consider compression (in the _audio_ sense) because of the limited
dynamic range of 8 bit audio vs the original 16bit. You may find you
get a lot more oomph in the music if it is compressed rather than just
using the original 16bit samples with the bottom 8 bits cut off.

That is a very good suggestion, especially because I think I'm going
to implement regular 16-bit ADPCM and just convert to 8-bit realtime
via shifting. (Eventually this scheme is going to be upgraded, so I
might as well implement regular 16-bit ADPCM now, assuming the shifting
to 8-bit isn't going to mangle the sound terribly.)
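
A minimal sketch of the shift-and-flip conversion from signed 16-bit decoder output to unsigned 8-bit, in Python. It simply truncates the low byte with no dithering or rounding, which is an assumption about how the conversion would be done; the function name is hypothetical.

```python
def s16_to_u8(sample):
    """Convert one signed 16-bit sample (-32768..32767) to unsigned 8-bit."""
    # Keep the top 8 bits (arithmetic shift), then flip the sign bit so that
    # -128..127 maps onto 0..255.
    return ((sample >> 8) & 0xFF) ^ 0x80


print([s16_to_u8(s) for s in (-32768, -1, 0, 32767)])
```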

Trivia for those who remember Access' RealSound process for digitized
sound on the C64 and IBM PC platforms of the 1980s: I learned that
compression in the *audio* sense was a required part of the patented
process during an interview with the patent holder in 1997. He said
that when they first walked into the studio to do recordings and sound
effects for the first game (I think it was Echelon), he held up a PC
speaker ripped out of an IBM PC and said to the audio engineer, "This
is the output device we're mastering for." So there was definitely
major massaging in the dynamic range area :-)

Willem

2004-04-14 20:41:29 UTC

Jim wrote:
) Digitized sound on a C64 has always fascinated me. What was the
) maximum output rate and bit depth? Was it a standard-at-the-time
) method of pulsing the speaker faster than it could respond (IBM PC,
) Apple) or did it use the SID creatively? (I am assuming SID in some
) way, because if memory serves the C64 game Turbo Outrun had synth
) music and digitized instrument hits at the same time during the title
) screen tune.)

There are (at least) two methods to do digital music on a C64.

The first method is to setup a pulse waveform of the highest frequency,
with a low-pass filter, and write sample values to the pulse width
register, which was 12 bits wide, effectively giving you 12-bit digital
sound, but that wasn't used much.

The second, and much wider-used, method is to write sample values to the
volume register, which works directly. I think it's because the waves in
a SID were unsigned-based, so changing the volume didn't only scale the
waves, it also moved them up or down (so to speak).

SaSW, Willem

--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT