Use 3 bytes to store integer in redis?

H

Haomai Wang

In the implement of ziplist in redis, author use a way to reduce space to store different size integer.
Such as the integer in the range of -8388608~8388607, ziplist use 3 bytes to store it avoid 4bytes.
The function is below:

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned char encoding) {
....
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded, `encoding` is the encoding way to use determining the size 1 byte, 2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8 bits. I can't understand why?
The page below may help understand the problem:
https://github.com/antirez/redis/issues/469
This issue shows author concerns the two-complement representation so right shift 8 bits.
 
J

Johann Klammer

Haomai said:
In the implement of ziplist in redis, author use a way to reduce space to store different size integer.
Such as the integer in the range of -8388608~8388607, ziplist use 3 bytes to store it avoid 4bytes.
The function is below:

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded, `encoding` is the encoding way to use determining the size 1 byte, 2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8 bits. I can't understand why? It's a left shift.
The page below may help understand the problem:
https://github.com/antirez/redis/issues/469
This issue shows author concerns the two-complement representation so right shift 8 bits.

I think this is related to those 'arithmetic shifts', which are supposed
to preserve sign of two's complement numbers. Avoid it if you can. The
code looks highly suspicious. Especially the memcopy after that memrev.
The left shift will align the most significant bit of the 24 bit int to
the most sig. bit of the 32 bit integer. shifting back the 32 bit
_signed_ integer by 8 bits afterwards will give a signed 32 bit integer
the same value as the 24 bit signed integer. If they did not do it they
would get their 24 bits interpreted as unsigned...(0-16.7M)
 
I

Ike Naar

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned
char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded,
`encoding` is the encoding way to use determining the size 1 byte,
2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8
bits. I can't understand why?

It's not a right shift, it's a left shift.

The statement 'i32=value<<8' will store the three least significant
bytes of 'value' into the three most significant bytes of i32,
e.g. if 'value' contained 0x0000000000123456 then i32 will contain
0x12345600. If the machine uses big-endian representation for integers,
the statement 'memrev32ifbe(&i32)' reverts the bytes.
Now, the bytes in memory are, starting at address &i32:

0x00, 0x56, 0x34, 0x12

Then memcpy copies the second, third and fourth byte of the representation
in memory, in that order, to the three-byte buffer pointed at by p,
so that the end result is p[0]==0x56, p[1]==0x34, p[2]=0x12.
 
N

Nick Bowler

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned
char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...
[...]
The statement 'i32=value<<8' will store the three least significant
bytes of 'value' into the three most significant bytes of i32,
e.g. if 'value' contained 0x0000000000123456 then i32 will contain
0x12345600. If the machine uses big-endian representation for integers,
the statement 'memrev32ifbe(&i32)' reverts the bytes.
Now, the bytes in memory are, starting at address &i32:

0x00, 0x56, 0x34, 0x12

Then memcpy copies the second, third and fourth byte of the representation
in memory, in that order, to the three-byte buffer pointed at by p,
so that the end result is p[0]==0x56, p[1]==0x34, p[2]=0x12.

Holy smokes, that seems like an incredibly overcomplicated way of writing

p[0] = value & 0xff;
p[1] = (value >> 8) & 0xff;
p[2] = (value >> 16) & 0xff;
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,961
Messages
2,570,130
Members
46,689
Latest member
liammiller

Latest Threads

Top