There will be hundreds of millions of these structs, each with
thousands of bitfields.
How about something that will compress the data before I store it into
Berkeley DB? That way I can use normal non-bitfield variables (like
normal people). Any suggestions?
At this point we start getting into algorithms questions rather than
C-specific questions.
To give you good advice about compressing the data, we would need to
know roughly how often the data will be written versus read, and how
much of its lifetime the data spends "just being stored" as opposed to
being read. We would also need some idea of the probability
distribution of the bits.
- If the data is written once and then re-read many, many times,
then it becomes cost-effective to have the packing code "work hard"
to make the data compact and quick to read, even if computing
it the first time is a pain (a minimal packing sketch follows this list)
- If data is mostly sitting around stored and not looked at very often,
then compactness becomes more important than read time
- If most bits are set the same way or if there would tend to be
large blocks with similar properties, then we might choose different
compression algorithms
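To make the "packing code" idea concrete: a generic sketch (not
specific to your data -- the sizes and names here are made up) is to
keep the record in ordinary variables in memory and pack the one-bit
flags into a plain byte array only when you write the record out:

    #include <limits.h>   /* CHAR_BIT */
    #include <stddef.h>

    /* NFLAGS and the buffer layout are purely illustrative. */
    #define NFLAGS     3000
    #define FLAG_BYTES ((NFLAGS + CHAR_BIT - 1) / CHAR_BIT)

    static void set_flag(unsigned char *buf, size_t i, int value)
    {
        if (value)
            buf[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
        else
            buf[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
    }

    static int get_flag(const unsigned char *buf, size_t i)
    {
        return (buf[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
    }

The FLAG_BYTES-byte buffer is then what you hand to Berkeley DB as the
record's data (or to a compressor first), with no bitfields anywhere.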
But these are all matters of algorithms, and are probably best dealt
with in comp.compression.
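Purely as an illustration of the third case above (this is my own toy
example, not something derived from your data), a run-length encoder
is about the simplest scheme that wins when the packed bit array
contains long runs of identical bytes:

    #include <stddef.h>

    /* Toy run-length encoder: emits (count, byte) pairs, count <= 255.
       Returns the number of bytes written to out, which must be able
       to hold up to 2 * len bytes in the worst case. */
    static size_t rle_encode(const unsigned char *in, size_t len,
                             unsigned char *out)
    {
        size_t i = 0, o = 0;
        while (i < len) {
            unsigned char b = in[i];
            size_t run = 1;
            while (i + run < len && in[i + run] == b && run < 255)
                run++;
            out[o++] = (unsigned char)run;
            out[o++] = b;
            i += run;
        }
        return o;
    }

A matching decoder just expands each (count, byte) pair. For data
dominated by long runs of zeros this shrinks the stored record
considerably; for "random-looking" bits it can double its size, which
is exactly why the usage pattern matters.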
I don't know what the data represents, so I'll go ahead and ask:
how important is it that a particular bit retrieval be correct?
I ask because for -some- applications, efficiencies in time and
storage can be achieved by using probabilistic storage rather than
guaranteed storage.
One example of this is the storage of words for spell checking
purposes: you don't need to store the words themselves, you only need
to store information -about- the words, and it does not have to
be fully accurate because the consequences of getting a spell
check slightly wrong are usually not extremely high. So one
approach used in the spell-checking example is to use several
(e.g., 6) different hash algorithms applied to the word, with each
hash algorithm outputting a bit position in the same array.
On storage of a new word, you set the bit at each of those positions.
To check a word you check each of those bit positions; if any of the
bits are *not* set then the word is not in the original list; if all
of the bits -are- set, then either the word was in the original list,
or else there was a probabilistic clash with a set of other words.
The larger the bit array you use, and (up to a point) the more hash
functions you use, the lower the chance of a "false positive" -- which
means you can trade off time (number of hashes computed) and space
(size of the bit array) against the probability of a correct answer,
when the consequences of inaccuracy are not severe.
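What is being described here is essentially a Bloom filter. A minimal
sketch in C -- the table size, the number of hashes, and the salted
FNV-style hash below are all arbitrary choices for illustration, not a
tuned implementation:

    #include <limits.h>
    #include <stddef.h>

    #define TABLE_BITS (1u << 20)   /* 1 Mbit table -- arbitrary size */
    #define NUM_HASHES 6

    static unsigned char table[TABLE_BITS / CHAR_BIT];

    /* FNV-1a style hash, salted with k so one routine gives us
       NUM_HASHES different hash functions. */
    static size_t hash(const char *word, unsigned k)
    {
        size_t h = 2166136261u + k * 16777619u;
        while (*word) {
            h ^= (unsigned char)*word++;
            h *= 16777619u;
        }
        return h % TABLE_BITS;
    }

    static void add_word(const char *word)
    {
        unsigned k;
        for (k = 0; k < NUM_HASHES; k++) {
            size_t bit = hash(word, k);
            table[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
        }
    }

    /* Returns 0 if the word is definitely not in the set,
       1 if it is "probably" in the set (false positives possible). */
    static int maybe_contains(const char *word)
    {
        unsigned k;
        for (k = 0; k < NUM_HASHES; k++) {
            size_t bit = hash(word, k);
            if (!((table[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1))
                return 0;
        }
        return 1;
    }

With n words stored, m table bits and k hashes, the false-positive
rate is roughly (1 - e^(-k*n/m))^k, which is the time/space/accuracy
trade-off described above.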
But that's an algorithm, and algorithms in general belong in
comp.algorithms.