Use 3 bytes to store integer in redis?

Haomai Wang · Sep 5, 2012

In the implement of ziplist in redis, author use a way to reduce space to store different size integer.
Such as the integer in the range of -8388608~8388607, ziplist use 3 bytes to store it avoid 4bytes.
The function is below:

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned char encoding) {
....
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded, `encoding` is the encoding way to use determining the size 1 byte, 2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8 bits. I can't understand why?
The page below may help understand the problem:
https://github.com/antirez/redis/issues/469
This issue shows author concerns the two-complement representation so right shift 8 bits.

Johann Klammer · Sep 5, 2012

Haomai said:
In the implement of ziplist in redis, author use a way to reduce space to store different size integer.
Such as the integer in the range of -8388608~8388607, ziplist use 3 bytes to store it avoid 4bytes.
The function is below:

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded, `encoding` is the encoding way to use determining the size 1 byte, 2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8 bits. I can't understand why? It's a left shift.
The page below may help understand the problem:
https://github.com/antirez/redis/issues/469
This issue shows author concerns the two-complement representation so right shift 8 bits.

I think this is related to those 'arithmetic shifts', which are supposed
to preserve sign of two's complement numbers. Avoid it if you can. The
code looks highly suspicious. Especially the memcopy after that memrev.
The left shift will align the most significant bit of the 24 bit int to
the most sig. bit of the 32 bit integer. shifting back the 32 bit
_signed_ integer by 8 bits afterwards will give a signed 32 bit integer
the same value as the 24 bit signed integer. If they did not do it they
would get their 24 bits interpreted as unsigned...(0-16.7M)

Ike Naar · Sep 5, 2012

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned
char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

`p` is the memory stored to, `value` is the integer will be encoded,
`encoding` is the encoding way to use determining the size 1 byte,
2bytes, 3bytes or 4bytes etc.
Here we can find if encode integer in 3 bytes, firstly right shift 8
bits. I can't understand why?

It's not a right shift, it's a left shift.

The statement 'i32=value<<8' will store the three least significant
bytes of 'value' into the three most significant bytes of i32,
e.g. if 'value' contained 0x0000000000123456 then i32 will contain
0x12345600. If the machine uses big-endian representation for integers,
the statement 'memrev32ifbe(&i32)' reverts the bytes.
Now, the bytes in memory are, starting at address &i32:

0x00, 0x56, 0x34, 0x12

Then memcpy copies the second, third and fourth byte of the representation
in memory, in that order, to the three-byte buffer pointed at by p,
so that the end result is p[0]==0x56, p[1]==0x34, p[2]=0x12.

Nick Bowler · Sep 12, 2012

static void zipSaveInteger(unsigned char *p, int64_t value, unsigned
char encoding) {
...
} else if (encoding == ZIP_INT_24B) {
i32 = value<<8;
memrev32ifbe(&i32);
memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t));
}...

Click to expand...

[...]
The statement 'i32=value<<8' will store the three least significant
bytes of 'value' into the three most significant bytes of i32,
e.g. if 'value' contained 0x0000000000123456 then i32 will contain
0x12345600. If the machine uses big-endian representation for integers,
the statement 'memrev32ifbe(&i32)' reverts the bytes.
Now, the bytes in memory are, starting at address &i32:

0x00, 0x56, 0x34, 0x12

Then memcpy copies the second, third and fourth byte of the representation
in memory, in that order, to the three-byte buffer pointed at by p,
so that the end result is p[0]==0x56, p[1]==0x34, p[2]=0x12.

Holy smokes, that seems like an incredibly overcomplicated way of writing

p[0] = value & 0xff;
p[1] = (value >> 8) & 0xff;
p[2] = (value >> 16) & 0xff;

Store information in pointers	29	Apr 28, 2010
Practical packing for structs of bytes	12	Sep 17, 2010
how to define an 8 bit integer	21	Oct 25, 2008
gdb python how to output integer for examine memory	6	Mar 24, 2014
using pre-processor to count bits in integer types...	17	Jan 15, 2009
Way to encode bits in signed integer	2	Nov 4, 2009
Qsort() messing with my entire Code	0	Apr 25, 2022
Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023

Use 3 bytes to store integer in redis?

Haomai Wang

Johann Klammer

Ike Naar

Nick Bowler

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads