Lambda
I'm trying to develop several interesting components
of a simple search engine, following some textbooks.
One book introduces some ways to compress the dictionary
so that it can stay in main memory.
The dictionary is a list of millions of words,
each with several bytes of related data.
Word / Term Freq(int) / Pointer to disk file
Hello 100
One method is:
concatenate all the words into one long contiguous string
and use an array of 4-byte character pointers to access them.
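To make the layout concrete, here is a minimal sketch of how I understand that method. The names (CompactDictionary, Entry, postings_pos) and the 8-byte file position are my own assumptions, not from the book, and I use 4-byte offsets into the string instead of raw pointers. The words have to be added in sorted order so that lookup can use binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One fixed-size record per word (hypothetical layout).
struct Entry {
    std::uint32_t term_offset;   // 4-byte offset of the word inside the blob
    std::uint32_t term_freq;     // term frequency
    std::uint64_t postings_pos;  // position in the disk file (assumed 8 bytes)
};

class CompactDictionary {
public:
    // Words must be added in ascending order.
    void add(const std::string& word, std::uint32_t freq, std::uint64_t pos) {
        Entry e = {static_cast<std::uint32_t>(blob_.size()), freq, pos};
        entries_.push_back(e);
        blob_ += word;  // no separator: the next offset marks the end
    }

    // Binary search over the entries; returns nullptr if the word is absent.
    const Entry* find(const std::string& word) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), word,
            [this](const Entry& e, const std::string& w) { return compare(e, w) < 0; });
        if (it != entries_.end() && compare(*it, word) == 0) return &*it;
        return nullptr;
    }

    // Copies the word out of the blob. `e` must be an element of this dictionary.
    std::string term(const Entry& e) const {
        return blob_.substr(e.term_offset, length(e));
    }

private:
    // A word ends where the next word starts (or at the end of the blob).
    std::size_t length(const Entry& e) const {
        std::size_t i = static_cast<std::size_t>(&e - entries_.data());
        std::size_t end = (i + 1 < entries_.size()) ? entries_[i + 1].term_offset
                                                    : blob_.size();
        return end - e.term_offset;
    }

    int compare(const Entry& e, const std::string& w) const {
        return blob_.compare(e.term_offset, length(e), w);
    }

    std::string blob_;            // all words concatenated
    std::vector<Entry> entries_;  // sorted by word
};
```

With this layout each word costs its characters plus one 16-byte Entry, with no per-node pointers and no per-string object.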
But I think the STL map is a suitable data structure for this problem.
I don't know the details of map's internal implementation.
I think it's implemented as a binary search tree,
where every node occupies just enough memory (I'm not sure).
I also need the search feature of a BST.
So I'd like to know whether I need to build the data structure
myself.
How can I find out the total memory occupied by a map,
as well as by other data structures such as vector and set
(or the memory occupied by a single map element)?
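One approach I could try is a counting allocator that records how many bytes the container requests. This is only a sketch: CountingAllocator, g_bytes_in_use, TermInfo and CountedMap are names I made up, the counter is not thread-safe, and it only sees the map's node allocations. The std::string keys still use the default allocator, so their heap buffers are not counted, and neither is any bookkeeping done by the underlying heap.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Bytes currently requested through CountingAllocator (not thread-safe).
static std::size_t g_bytes_in_use = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        g_bytes_in_use += n * sizeof(T);  // the container rebinds T to its node type
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        g_bytes_in_use -= n * sizeof(T);
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Per-word data (hypothetical layout).
struct TermInfo {
    int term_freq;
    long postings_pos;
};

using CountedMap =
    std::map<std::string, TermInfo, std::less<std::string>,
             CountingAllocator<std::pair<const std::string, TermInfo>>>;
```

Reading g_bytes_in_use before and after inserting one element gives the per-node cost on a given implementation. The same allocator can be plugged into vector or set.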
BTW, should I use std::string or char[] to store the millions of
words?
How much overhead does std::string have?
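As a rough way to check this myself, I could estimate the footprint of one string as sizeof(std::string) plus its heap buffer, if it has one. This is a sketch under assumptions: approx_string_bytes is a name I made up, the check for inline storage relies on the small-string optimization that common implementations use (the standard does not require it), and the allocator's own per-block overhead is not included.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Approximate bytes used by one std::string: the object itself, plus the
// heap buffer when the characters are not stored inside the object.
std::size_t approx_string_bytes(const std::string& s) {
    std::size_t bytes = sizeof(std::string);

    const char* data = s.data();
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);

    // std::less gives a well-defined order even for unrelated pointers.
    std::less<const char*> lt;
    bool stored_inline = !lt(data, obj_begin) && lt(data, obj_end);

    if (!stored_inline) {
        bytes += s.capacity() + 1;  // +1 for the terminating '\0'
    }
    return bytes;
}
```

Summing this over a sample of real words and comparing it with the total character count would show how much of the memory is overhead on a particular compiler.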