Algorithm that makes maximum compression of completely diffused data.

Chris Angelico

I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

Interesting idea, but I don't see how this has anything to do with
compressing arbitrary data. We already have a compact notation for
representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
how this will be any better, other than eliding NULs.
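
As a rough Python sketch of that notation (purely illustrative), the
built-in int conversions already treat any byte string as one big
base-256 number:

data = b"hello"
n = int.from_bytes(data, "big")
assert n.to_bytes((n.bit_length() + 7) // 8, "big") == data
# Caveat: leading NUL bytes vanish in the round trip -- the "eliding
# NULs" mentioned above.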

ChrisA
 
jonas.thornvall

On Thursday, 7 November 2013 at 23:26:45 UTC+1, Chris Angelico wrote:
Interesting idea, but I don't see how this has anything to do with
compressing arbitrary data. We already have a compact notation for
representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
how this will be any better, other than eliding NULs.

ChrisA

I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
 
Chris Angelico

I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?

I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.

ChrisA
 
Mark Janssen

I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Technically, the universe could expand temporarily or reconfigure to
allow it; the question is who or what will have to shift out to allow
it?
 
jonas.thornvall

On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.

ChrisA

Please, you are the one being obnoxious, so **** off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number does have a shortest arithmetical encoding. And that is not the hard part to figure out; the hard part is to find a generic arithmetic encoding.

I am not sure if it is just stupidity or laziness that prevents you from seeing that 4^8=65536.
 
rusi

On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
Please, you are the one being obnoxious, so **** off or go learn about
reformulation of problems. Every number has an infinite number of
arithmetical solutions. So every number does have a shortest
arithmetical encoding. And that is not the hard part to figure out;
the hard part is to find a generic arithmetic encoding.

Since we are discussing profound matters such as cosmic laws,
here's my 'all-universal' law:

When you engage with a troll you get trolled.
 
Mark Janssen

I am not sure if it is just stupidity or laziness that prevents you from seeing that 4^8=65536.

I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can be compressed given this technique. Prime
numbers, of course, would not be compressed.
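
A minimal sketch of that scheme using naive trial division (not code
from the thread, and nowhere near adequate for large numbers):

def prime_factors(n):
    # Trial division: fine for small n, hopeless at the sizes
    # discussed below.
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(prime_factors(98))     # [2, 7, 7]
print(prime_factors(65537))  # [65537] -- prime, so nothing to compress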
 
Chris Angelico

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can be compressed given this technique. Prime
numbers, of course, would not be compressed.

Also an interesting theory.

1) How do you represent the prime factors?
2) How do you factor large numbers efficiently? Trust me, if you've
solved this one, there are a *LOT* of people who want to know. People
with money, like Visa.
3) Still doesn't deal with all possible numbers.

ChrisA
 
Roy Smith

Chris Angelico said:
2) How do you factor large numbers efficiently? Trust me, if you've
solved this one, there are a *LOT* of people who want to know. People
with money, like Visa.

Not to mention the NSA.
 
Chris Angelico

Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually lose efficiency in your scheme. Don't think in decimal, but in terms of information needing transfer.

Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian). So it would be
possible to represent that more compactly, if you devote one byte each
for the base and the exponent: 0x04 0x08. However, that system allows
you to represent just 62,041 possible numbers:
decomp = {}
for base in range(256):
    for exp in range(256):
        decomp[base ** exp] = base, exp
print(len(decomp))  # prints 62041

The trouble is, these numbers range all the way up from 0**1 == 0 to
255**255 == uhh...46531388344983681457769984555620005635274427815488751368772861643065273360461098097690597702647394229975161523887729348709679192202790820272357752329882392140552515610822058736740145045150003072264722464746837070302159356661765043244993104360887623976285955058200326531849137668562738184397385361179287309286327712528995820702180594566008294593820621769951491324907014215176509758404760451335847252744697820515292329680698271481385779516652518207263143889034764775414387732372812840456880885163361037485452406176311868267428358492408075197688911053603714883403374930891951109790394269793978310190141201019287109375

which would decompress to (obviously) 255 bytes. So you can represent
just over sixty thousand possible 255-byte strings in two bytes with
this system.
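
That 255-byte figure is easy to check:

n = 255 ** 255
print(n.bit_length())             # 2039 bits
print((n.bit_length() + 7) // 8)  # 255 bytes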

To generalize it, you'd need to devote a bit somewhere to saying
"There are more to add in". Let's say you do this on the exponent byte
(high bit for convenience), so you have 0x04 0x08 means 65536, and
0x04 0x88 0x01 0x01 means 65536+1 = 65537. Now we have something that
generalizes; but the efficiency is gone - and there are too many ways
to encode the same value. (Bear in mind, for instance, that 0x01 0xNN
for any NN will still just represent 1, and 0x00 0xNN will represent
0. That's wasting a lot of bits.)
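
A quick sketch of a decoder for that continuation-bit format (a
hypothetical layout, just to make the example concrete):

def decode(stream):
    # Each term is a (base, exponent) byte pair; the exponent's high
    # bit flags that another term follows and is added in.
    total = 0
    it = iter(stream)
    for base in it:
        exp = next(it)
        total += base ** (exp & 0x7F)
        if not exp & 0x80:
            break
    return total

assert decode(b"\x04\x08") == 65536
assert decode(b"\x04\x88\x01\x01") == 65537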

The application I can see for this sort of thing is not data
compression, but puzzles. There are lots of puzzles that humans find
enjoyable that represent an entire grid with less data than it
contains - for one popular example, look at Sudoku. You have a 9x9
grid where each cell could contain any of nine digits, which means a
theoretical information density of 9**81; but the entire grid can be
described by a handful of digits and heaps of blank space. This could
be a similarly-fun mathematical puzzle: 3359232 can be represented as
B1**E1 + B2**E2, where all numbers are single-digit. Find B1, B2, E1,
and E2. In this case, you're guaranteed that the end result is shorter
(four digits), but it's hardly useful for general work.
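
The puzzle is small enough to brute-force in a few lines (a throwaway
search, not a serious solver):

target = 3359232
for b1 in range(10):
    for e1 in range(10):
        for b2 in range(b1, 10):  # skip mirrored duplicates
            for e2 in range(10):
                if b1 ** e1 + b2 ** e2 == target:
                    print(b1, e1, b2, e2)  # prints: 6 8 6 8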

ChrisA
 
R. Michael Weylandt

Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian).

Quite right -- meant C's int type (machine word), not char. My mistake!

Michael
 
Chris Angelico

Quite right -- meant C's int type (machine word), not char. My mistake!

Sure. Assuming at least 32-bit words, yes, that's correct. Of course,
this is still just proving that it's possible to compress specific
values, while the OP claimed to compress anything.

ChrisA
 
Dave Angel

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can be compressed given this technique. Prime
numbers, of course, would not be compressed.

No, very few numbers can be compressed using this technique. And
primes of course expand greatly.
 
Steven D'Aprano

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can be compressed given this technique.

Sadly, they would not.

Let's suppose you wish to compress the number 143. So you find the prime
factorization, 11*13=143. Have you compressed anything? No you have not
-- the original number, 143, fits in a single byte, while it requires
*two* bytes to deal with 11 and 13 (one byte for 11, and a second byte
for 13).

We can try a different strategy: while 143 requires a full eight bits
(one byte), both 11 and 13 require only four bits (one nybble) each. So
by hard coding our expectation that the result will be two nybbles, we
can avoid expanding the data and end up with exactly zero compression:

143 = binary 10001111 or eight bits in total
11, 13 = binary 1011 1101 or eight bits in total

But while that trick works for 143, it doesn't work for 154 (one byte,
binary 10011010) which has prime factorization 2*7*11 (three nybbles 0010
0111 1011). Even if you can somehow drop the leading zeroes, it requires
nine bits versus eight.
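
Those bit counts are easy to confirm, since int.bit_length() already
drops leading zeroes:

for n, factors in [(143, [11, 13]), (154, [2, 7, 11])]:
    print(n.bit_length(), [f.bit_length() for f in factors])
# 8 [4, 4]    -- 143: factors also need 8 bits, so we merely break even
# 8 [2, 3, 4] -- 154: factors need 9 bits, an expansion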

Suppose instead of using bytes, you use decimal digits. 11 and 13 use
four digits, while 143 uses only three. Likewise, 154 has three decimal
digits while 2*7*11 has four digits. In both cases, there is no
compression.

How about recording which prime number it is, instead of the actual value
of the prime? So 154=2*7*11 expands to the 1st, 4th and 5th prime, so
maybe we can record 1 4 5 instead of 2 7 11 and save some bits. (We'll
save more bits the larger the prime.) Of course, to reverse the
compression you need to keep a table of the prime numbers somewhere, and
that's going to be a lot bigger than the space you save...
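
A toy version of that prime-index encoding, with the table it depends
on spelled out:

primes = [2, 3, 5, 7, 11, 13]  # the table you would have to ship
indices = [primes.index(f) + 1 for f in [2, 7, 11]]
print(indices)  # [1, 4, 5] for 154 = 2*7*11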

Now, I accept that, possibly, with sufficient work we might be able to
find some cunning mechanism by which we can compress *any* integer value.
But it won't be the same mechanism for every value! For example, we might
compress (2**1000+1) using a run-length encoding of the binary bits,
while compressing 14629839399572435720381495231 as its prime
factorization (the 319th prime number, the 479th prime, the 499th prime
six times), and 10**1000 as an encoding of the decimal digits. But we
still need some sort of tag to distinguish between these different
compression schemes, and once you take into account the extra information
in the tag, there will be cases where some numbers aren't compressed at
all.

The ability to losslessly compress *everything* is sheer pseudo-
mathematics, up there with finding an exact rational value for pi, or
squaring the circle using only a compass and straight-edge. But the
ability to losslessly compress *useful data* is real, and such algorithms
operate by finding non-random patterns and redundant information in the
data.
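
The counting argument behind that last paragraph fits in a few lines:
there are more n-bit strings than there are strictly shorter strings
to map them onto.

n = 8
inputs = 2 ** n                          # 256 eight-bit strings
shorter = sum(2 ** k for k in range(n))  # 255 strings shorter than that
assert shorter < inputs  # pigeonhole: some input must expand or collide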
 
Steven D'Aprano

Please, you are the one being obnoxious, so **** off

Pot, meet kettle.

I am not sure if it is just stupidity or laziness that prevents you from
seeing that 4^8=65536.

And what is it that prevents you from seeing that 4**8=65536 is
irrelevant to the compression of random data?

All your talk about "reformulation of problems", "infinite number of
arithmetical solutions" and "generic arithmetic encoding" is just a smoke
screen to distract from the fact that you don't, in fact, have a
compression algorithm that can losslessly compress arbitrary random data
of any length.

I am as sure of this as I am sure you don't have a perpetual motion
machine, a method for squaring the circle using only a compass and
straight-edge, or a fool-proof system for winning the lottery every week
without any chance of failure.
 
Gregory Ewing

Steven said:
Of course, to reverse the
compression you need to keep a table of the prime numbers somewhere,

That's not strictly necessary -- you could calculate them
as needed. It wouldn't be *fast*, but it would work...

You've got me thinking now about how viable a compression
scheme this would be, efficiency issues aside. I suppose
it would depend on things like the average density of primes
and the average number of prime factors a number has.
Any number theorists here?
 
Chris Angelico

You've got me thinking now about how viable a compression
scheme this would be, efficiency issues aside. I suppose
it would depend on things like the average density of primes
and the average number of prime factors a number has.
Any number theorists here?

Start by coming up with an efficient storage representation for the
primes. Don't forget that you can't use a bitfield, because eg 98 =
7*7*2, so you somehow need to represent that the 7 comes up twice. And
you can't limit the size of your primes (eg by representing 98 as
"\x07\x07\x02").

And that's without dealing with the major issue of _finding_ prime
factors for a large number.
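
Going back to the "\x07\x07\x02" idea, a toy encoder shows why a
byte-per-factor layout cannot work in general (illustrative only):

def encode(factors):
    # One byte per prime factor handles 98 = 7*7*2, but any factor
    # above 255 simply does not fit.
    if any(f > 255 for f in factors):
        raise ValueError("factor too large for one byte")
    return bytes(factors)

print(encode([7, 7, 2]))  # b'\x07\x07\x02'
encode([65537])           # ValueError: factor too large for one byte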

ChrisA
 
jonas.thornvall

On Friday, 8 November 2013 at 03:43:17 UTC+1, zipher wrote:
I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can be compressed given this technique. Prime
numbers, of course, would not be compressed.

--
MarkJ
Tacoma, Washington

3^2-2^2=5
 
