Algorithm that makes maximum compression of completly diffused data.

C

Chris Angelico

There is a way to apparently get around these limits: store data
externally, perhaps inside the compression application itself. Then, if
you just look at the compressed file (the "data.zip" equivalent, although
I stress that zip compression is *not* like this), you might think it has
shrunk quite a lot. But when you include the hidden data, the compression
is not quite so impressive...

Storing externally is, of course, a very useful thing - it's just not
compression. For instance, a git repository (and quite likely a
Mercurial one too) keeps track of files as blobs of data referenced by
their cryptographic hashes. The current state of a directory can be
described by listing file names with their object hashes, which is a
very compact notation; but it doesn't have _all_ the information, and
the "decompression" process involves fetching file contents from the
library. It's a tool in the arsenal, but it's not compression in the
same way that PK-ZIP is.

With real compression algorithms, there's usually an "out" clause that
detects that the algorithm's doing a bad job. The file format for
PK-ZIP and, I expect, every other archiving+compression file format,
has allowances for some of the files to be stored uncompressed. That
way, there's no need for any file to be enlarged by more than some
signature - say, a one-byte "compression type" marker, or even a
one-bit mark somewhere. But enlarged they must be, for the reasons
Steven explained.

ChrisA
 
E

Ethan Furman

Congratulations Jonas. My kill file for this list used to have only one
name, but now has 2.

You have more patience than I! Jonas just made mine seven. :)
 
M

Mark Janssen

Congratulations Jonas. My kill file for this list used to have only one
You have more patience than I! Jonas just made mine seven. :)

Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
but computer scientists. It's an easy mistake to make.
 
G

Gene Heskett

You have more patience than I! Jonas just made mine seven. :)

Yeah, well there are a couple others in the mugwump category here yet. I
lurk here to try and learn, and baseless arguments are just noise. To be
filtered. And its working!

But it may be that this old dog has learned his last "new" trick in the
language arena too, too many "irons in the fire", and fine tuning machinery
to run the GCode I write to carve metal or wood is the primary interest
ATM. At 79yo, the short term memory needs help. I'm smart enough to
understand that, but it doesn't mean I like it. Its a right PIMA TBE.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

All the evidence concerning the universe has not yet been collected,
so there's still hope.
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
law-abiding citizens.
 
M

Michael Torrie

Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
but computer scientists. It's an easy mistake to make.

I don't think he's being plonked for not understanding computational
theory. He's being plonked for resorting to name calling on his second
post! If he was a smart computer scientist type, then engaging in a
discussion about the theoretical aspects of his algorithm would have
been welcomed by him, because that's what science is all about. But he
failed that early on.

Thanks to everyone in this part of the thread for turning this
ridiculous farce into a really educational discussion on the theory of
information compression. Too bad the OP has tuned out a long time ago.
 
J

Joshua Landau

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

But your second reason, better known as the pigeonhole principle,
demonstrates that for any lossless compression method, there must be data
sets that actually expand the data. It doesn't matter how cleverly you
compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
speak. Suppose your compression algorithm compresses a single byte into a
nybble (four bits). There are 256 different input data sets (0x00,
0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
more pigeons in at least one hole. Ergo, if the compression algorithm is
lossless, *some* data must be expanded rather than compressed.

You have to be careful to make this totally rigorous, too.

Note that I *can* make a "compression" algorithm that takes any
length-n sequence and compresses all but one sequence by at least one
bit, and does not ever expand the data.

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data,
but sometimes you have to store that anyway ;). So if I have a list of
N length-Y lists containing only 1s or 0s, I can genuinely compress
the whole structure by N log2 Y items.
 
M

Mark Janssen

Note that I *can* make a "compression" algorithm that takes any
length-n sequence and compresses all but one sequence by at least one
bit, and does not ever expand the data.

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data,
but sometimes you have to store that anyway ;).

And how many bits will you use to store the length?
So if I have a list of
N length-Y lists containing only 1s or 0s, I can genuinely compress
the whole structure by N log2 Y items.

But you cheated by using a piece of information from "outside the
system": length. A generic compression algorithm doesn't have this
information beforehand.
 
T

Tim Chase

But you cheated by using a piece of information from "outside the
system": length. A generic compression algorithm doesn't have this
information beforehand.

By cheating with outside information, you can perfectly compress any
one data-set down to 1 bit. Just store the data in the program, then
store 1 bit of "is this file the data we have stored in the
program?". Granted, in modern OSes, you waste 7 extra bits since
they require you to write an entire byte at a time. :)

And with that, you could even have an empty file and test for a file
extension. ;-)

-tkc
 
J

jonas.thornvall

Den lördagen den 2:e november 2013 kl. 22:31:09 UTC+1 skrev Tim Roberts:
Well, then, why don't you share it?



Let me try to get you to understand WHY what you say is impossible. Let's

say you do have a function f(x) that can produce a compressed output y for

any given x, such that y is always smaller than x. If that were true, then

I could call f() recursively:

f(f(...f(f(f(f(f(x)))))...))

and eventually the result get down to a single bit. I hope it is clear

that there's no way to restore a single bit back into different source

texts.



Here's another way to look at it. If f(x) is smaller than x for every x,

that means there MUST me multiple values of x that produce the same f(x).

Do you see? If x is three bits and f(x) is two bits, that means there are

8 possible values for x but only 4 values for f(x). So, given an f(x), you

cannot tell which value of x it came from. You have lost information.

--

Tim Roberts, (e-mail address removed)

Providenza & Boekelheide, Inc.

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536=256^2=16^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.
 
J

jonas.thornvall

Den måndagen den 4:e november 2013 kl. 14:53:28 UTC+1 skrev (e-mail address removed):
Den lördagen den 2:e november 2013 kl. 22:31:09 UTC+1 skrev Tim Roberts:




Well let me try to explain why it is working and i have implemented one.

I only need to refresh my memory it was almost 15 years ago.

This is not the solution but this is why it is working.

65536=256^2=16^4=***4^8***=2^16



Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

You see if the program behave intelligent some small operations can performwonder on a number.
 
D

Dave Angel

Re: Algorithm that makes maximum compression of completly diffused
data.


Den lördagen den 2:e november 2013 kl. 22:31:09 UTC+1 skrev Tim Roberts:
every x,



same f(x).



there are



f(x), y=



information.




Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536=256^2=16^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid =
anyway.

And e ^ (I * pi) == -1
So what. ?

Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.
 
R

rusi

And e ^ (I * pi) == -1
So what. ?
Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.

Now I am too much of a dim-wit to comprehend all this compressified
profundification but I think I have a (dim) clue how such a patent
will be obtained:

Just make a submission in such an unreadable double (then triple,
quadruple...) spaced format that the patent office will grant it in
exasperation.
 
J

jonas.thornvall

Den måndagen den 4:e november 2013 kl. 15:27:19 UTC+1 skrev Dave Angel:
every x,










same f(x).










there are










f(x), y=











is valid =




And e ^ (I * pi) == -1

So what. ?

e is an approximation... and your idea is not general for any n.
 
D

Dave Angel

Re: Algorithm that makes maximum compression of completly diffused
data.


e is an approximation... and your idea is not general for any n.

e is certainly not an approximation, and I never mentioned n.
 
S

Steven D'Aprano

Den måndagen den 4:e november 2013 kl. 15:27:19 UTC+1 skrev Dave Angel:
[...]
This is not the solution but this is why it is working.
65536=256^2=16^4=***4^8***=2^16

"this" being Jonas' alleged lossless compression method capable of
compressing random data.


I must say, I cannot see the connection between the fact that 256**2 ==
2**16 and compression of random data. I might as well state that I have
squared the circle, and offer as proof that 3+4 == 5+2.


e is an approximation... and your idea is not general for any n.

e is not an approximation, it is a symbolic name for an exact
transcendental number which cannot be specified exactly by any finite
number of decimal places.

Your comment about "n" is irrelevant, since Euler's Identity e**(i*pi)=-1
has nothing to do with "n". But in case you actually meant "i", again you
are mistaken. i is a symbolic name for an exact number.
 
S

Steven D'Aprano

Den måndagen den 4:e november 2013 kl. 15:27:19 UTC+1 skrev Dave Angel:
[...]
This is not the solution but this is why it is working.

65536=256^2=16^4=***4^8***=2^16

"this" being Jonas' alleged lossless compression method capable of
compressing random data.

/facepalm

I meant "it", not "this", as in "why it (the compression method) is
working". Sorry for any confusion.
 
T

Tim Roberts

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536=256^2=16^4=***4^8***=2^16

All of those values are indeed the same, and yet that is completely
unrelated to compression. Did you honestly believe this was actually
explaining anything?
Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

This whole post is a most amazing collection of non sequiturs. If you
would like to describe your compression scheme, there really are people
here who would be interested in reading it (although that number gets less
and less as this thread goes on).
 
M

Mark Janssen

Well let me try to explain why it is working and i have implemented one.
All of those values are indeed the same, and yet that is completely
unrelated to compression. Did you honestly believe this was actually
explaining anything?

I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

MarkJ
Tacoma, Washington
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,091
Messages
2,570,604
Members
47,223
Latest member
smithjens316

Latest Threads

Top