G
Gustavo Rondina
Hello all,
First of all, this might be somewhat off-topic for some of you, so let
me
apology in advance.
I am writing a program that requires a huge 2D symmetric matrix of
booleans
(i.e., either 0 or 1). My first thought was to store it in a packed
way, so
the memory needed would be about half of the amount required to store
the
whole array in a traditional fashion. My packing is described as
follows.
Given this symmetric matrix:
0 1 2 3 4
+---+---+---+---+---+
0 | 1 | 0 | 0 | 1 | 1 |
+---+---+---+---+---+
1 | 0 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
2 | 0 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
3 | 1 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
4 | 1 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
I am storing only the upper triangular portion plus the diagonal
continuously
in memory, so I have:
00 01 02 03 04 11 12 13 14 22 23 24 33 34 44
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The two digits numbers above the schematic correspond to the indexes
in the
original matrix. Assuming each position is one char, instead of N^2
bytes the
packed way is using only N*(N+1)/2 bytes. The trade-off is of course
having a
more costly way to access the matrix. Instead of simply doing M[J],
I need
a function like this:
#define N 10000 /* size of matrix */
...
1 unsigned char is_set(unsigned int i, unsigned int j)
2 {
3 unsigned int tmp;
4 unsigned int real_index;
5
6 if (i > j) {
7 tmp = j;
8 j = i;
9 i = tmp;
10 }
11
12 real_index = (i * N) - (i * (i - 1) / 2) + j - i;
13 return packed_array[real_index];
14 }
I came up with the formula to convert the indexes from the traditional
array
to the packed array and it seems to work.
Then I realized it is a waste to use a char to store a single
information, so
I decided to the bits of a char to do this, then instead of using N
(N-1)/2
bytes, I would need only 1/8 of this memory. Of course this leads to
bit
manipulation and all. The final result, with a few tweaks, is the
following
function:
1 unsigned char is_set(int i, int j)
2 {
3 register unsigned int idx;
4
5 if (i > j) {
6 register int temp = i;
7 i = j;
8 j = temp;
9 }
10
11 idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
16 return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
char) 0x80);
17 }
(I reordered the expression to reduce the number of multiplications
and use
shift operations.)
This function is called a lot of times, and this is having some
noticeable
impact in the overall performance. The memory saving is really huge,
but
the performance is making me wonder if this is really the best choice.
Does anyone know how could I make it faster, or a clever way to solve
this
problem, or any suggestions on the code I posted above? Only
limitation
is that they only work properly if a char is 8 bits long, but I can
live with that.
Thanks,
Gustavo
PS. Sorry if the schematics look messed up, with a monospace font they
looked nice.
First of all, this might be somewhat off-topic for some of you, so let
me
apology in advance.
I am writing a program that requires a huge 2D symmetric matrix of
booleans
(i.e., either 0 or 1). My first thought was to store it in a packed
way, so
the memory needed would be about half of the amount required to store
the
whole array in a traditional fashion. My packing is described as
follows.
Given this symmetric matrix:
0 1 2 3 4
+---+---+---+---+---+
0 | 1 | 0 | 0 | 1 | 1 |
+---+---+---+---+---+
1 | 0 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
2 | 0 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
3 | 1 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
4 | 1 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
I am storing only the upper triangular portion plus the diagonal
continuously
in memory, so I have:
00 01 02 03 04 11 12 13 14 22 23 24 33 34 44
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The two digits numbers above the schematic correspond to the indexes
in the
original matrix. Assuming each position is one char, instead of N^2
bytes the
packed way is using only N*(N+1)/2 bytes. The trade-off is of course
having a
more costly way to access the matrix. Instead of simply doing M[J],
I need
a function like this:
#define N 10000 /* size of matrix */
...
1 unsigned char is_set(unsigned int i, unsigned int j)
2 {
3 unsigned int tmp;
4 unsigned int real_index;
5
6 if (i > j) {
7 tmp = j;
8 j = i;
9 i = tmp;
10 }
11
12 real_index = (i * N) - (i * (i - 1) / 2) + j - i;
13 return packed_array[real_index];
14 }
I came up with the formula to convert the indexes from the traditional
array
to the packed array and it seems to work.
Then I realized it is a waste to use a char to store a single
information, so
I decided to the bits of a char to do this, then instead of using N
(N-1)/2
bytes, I would need only 1/8 of this memory. Of course this leads to
bit
manipulation and all. The final result, with a few tweaks, is the
following
function:
1 unsigned char is_set(int i, int j)
2 {
3 register unsigned int idx;
4
5 if (i > j) {
6 register int temp = i;
7 i = j;
8 j = temp;
9 }
10
11 idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
16 return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
char) 0x80);
17 }
(I reordered the expression to reduce the number of multiplications
and use
shift operations.)
This function is called a lot of times, and this is having some
noticeable
impact in the overall performance. The memory saving is really huge,
but
the performance is making me wonder if this is really the best choice.
Does anyone know how could I make it faster, or a clever way to solve
this
problem, or any suggestions on the code I posted above? Only
limitation
is that they only work properly if a char is 8 bits long, but I can
live with that.
Thanks,
Gustavo
PS. Sorry if the schematics look messed up, with a monospace font they
looked nice.