Discussion:
uncompressable files
(too old to reply)
chrome
2004-12-01 17:33:26 UTC
Permalink
There's a problem I've been struggling with for some time. Let's
assume you are trying to compress a 1KB file. Doesn't matter which
particular compression program you are using. For any given 1KB file
the compression program may either compress the file, fail to have
any effect whatsoever, or actually increase the file size. (For
instance, try zipping a zip-file). There are 2^8000 possible 1KB
combinations. Of these, only a certain number are compressable even
by the best compression programs. What I've been trying to figure out
is, what percentage of them are? What would a pie chart for all
possible 1KB files look like if you divided it up into compressable
and uncompressable files? Would the percentage be similar for other
file-sizes as well, such as 10KB files, 50MB files, etc? How would
one go about solving this problem mathematically and does anyone here
have any educated guesses on the percentages?

*-----------------------*
Posted at:
www.GroupSrv.com
*-----------------------*
Ingbert Grimpe
2004-12-01 19:21:11 UTC
Permalink
On 1 Dec 2004 11:33:26 -0600, chrome
<***@techie-dot-com.no-spam.invalid> wrote:

The basic problem is, that you will rarely have files with true random
values inside, i.e. your 2^8000 possible combinations simply never occur.
On a PC each byte stream follows a pattern, depending on the app that
created the stream. Streams that why generated by some kind of compression
algo (mp3, mpeg, zip et al) will hardly compress more. Streams generated
by other apps usually compress well (because they tend to be very regular).
If you really want to know how large the percentage of uncompressable
streams (not files, see above) is, you would have to do the research for
each compression algo.

IMHO, but I never was a mathemagican ;)

Ingbert
Phil Carmody
2004-12-02 02:15:50 UTC
Permalink
Post by chrome
There's a problem I've been struggling with for some time. Let's
assume you are trying to compress a 1KB file. Doesn't matter which
particular compression program you are using. For any given 1KB file
the compression program may either compress the file, fail to have
any effect whatsoever, or actually increase the file size. (For
instance, try zipping a zip-file). There are 2^8000 possible 1KB
combinations. Of these, only a certain number are compressable even
by the best compression programs. What I've been trying to figure out
is, what percentage of them are? What would a pie chart for all
possible 1KB files look like if you divided it up into compressable
and uncompressable files? Would the percentage be similar for other
file-sizes as well, such as 10KB files, 50MB files, etc? How would
one go about solving this problem mathematically and does anyone here
have any educated guesses on the percentages?
When measured in bytes, less than 1/256 of the possible files can
be compressed by one or more bytes. It's possible to attain almost
exactly this proportion,as long as you're prepared for all other
non-compressible files to balloon dramatically.

The proportion of real-world compressors that actually positively
compress is a pragamtic matter, and I'd guess that, as they have
limited worst-case behaviour, that the ratio is far short of 1/256.

The Kraft inequality explains why the ratios are as they are.

Phil
--
I used to have an interest in writing viral code and lost interest
quickly when Win95 came out. Hell how could any of us in the scene
write a more invasive program than Win95. It made us all obsolete.
-- Screaming Radish [NuKE] on alt.comp.virus.source.code
Matt Mahoney
2004-12-03 02:09:49 UTC
Permalink
Post by Phil Carmody
Post by chrome
There's a problem I've been struggling with for some time. Let's
assume you are trying to compress a 1KB file. Doesn't matter which
particular compression program you are using. For any given 1KB file
the compression program may either compress the file, fail to have
any effect whatsoever, or actually increase the file size. (For
instance, try zipping a zip-file). There are 2^8000 possible 1KB
combinations. Of these, only a certain number are compressable even
by the best compression programs. What I've been trying to figure out
is, what percentage of them are? What would a pie chart for all
possible 1KB files look like if you divided it up into compressable
and uncompressable files? Would the percentage be similar for other
file-sizes as well, such as 10KB files, 50MB files, etc? How would
one go about solving this problem mathematically and does anyone here
have any educated guesses on the percentages?
When measured in bytes, less than 1/256 of the possible files can
be compressed by one or more bytes. It's possible to attain almost
exactly this proportion,as long as you're prepared for all other
non-compressible files to balloon dramatically.
Actually you can get pretty close without ever expanding any files by more
than one byte. The following code will compress 255/65536 (about 1/257) of
all files:

If the first 2 bytes are equal and not 0, then remove the first byte.
Else prepend a 0 byte.

In general, the fraction of bit strings that can be compressed from n bits
to m bits cannot exceed 2^(m-n). I believe as you approach this limit that
the maximum size of your expanded files grows logarithmically.

-- Matt Mahoney
Phil Carmody
2004-12-03 18:29:17 UTC
Permalink
Post by Matt Mahoney
Actually you can get pretty close without ever expanding any files by more
than one byte. The following code will compress 255/65536 (about 1/257) of
If the first 2 bytes are equal and not 0, then remove the first byte.
Else prepend a 0 byte.
In general, the fraction of bit strings that can be compressed from n bits
to m bits cannot exceed 2^(m-n). I believe as you approach this limit that
the maximum size of your expanded files grows logarithmically.
Good call. Thanks Matt.

Phil
--
I used to have an interest in writing viral code and lost interest
quickly when Win95 came out. Hell how could any of us in the scene
write a more invasive program than Win95. It made us all obsolete.
-- Screaming Radish [NuKE] on alt.comp.virus.source.code
Simon Jackson, BEng.
2004-12-02 03:01:31 UTC
Permalink
as you only ever have 00 01 10 11 as possible follow on states then
only two possible mappings from the generator number are possible. if
you have the modulated generator and you extract the Xor content from
the mutual state, then you map back to the previous generator number
ad infinitum decompressing.

a one to two generator state mapping will exist always, but only with
the right modulation scheme will you be able to recover the mutual
information between the bits.

OK?
James K.
2004-12-04 04:02:47 UTC
Permalink
Post by chrome
For any given 1KB file
the compression program may either compress the file, fail to have
any effect whatsoever, or actually increase the file size. (For
instance, try zipping a zip-file).
*-----------------------*
www.GroupSrv.com
*-----------------------*
If and only if the probability of each symbol in a file is all the
same, and the conditional probability of the sym bole is equal to the
probability of the symbol (without the condition of earlier symbols),
the file will be incompressible.
BR,
------
James K. (***@hotmail.com)
[Home] http://home.naver.com/txdiversity

Loading...