optimal access/insertion associative array

T

Tom

Hi

I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Any help is welcome.
If this post is not adapted to this group, please let me know.
Thanks.

Tom
 
V

Victor Bazarov

Tom said:
I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6

1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking
about?
- the maximum number of elements (upper bound) is known and fixed

And how does it relate to the size of the population? Is it known at
compile-time or at run-time?
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Have you looked at a hash_map?

V
 
T

Tom

Thanks for the quick answer Victor
1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking about?

Let's say the upper bound is 10^9. But it is very often comprised
between 10^5 and 10^6.
And how does it relate to the size of the population?

it is the same.
Is it known at compile-time or at run-time?

the upper bound is known at run-time.
Have you looked at a hash_map?

Not yet because it seems <hash_map> is not available with Visual C++.
 
V

Victor Bazarov

Tom said:
Thanks for the quick answer Victor




Let's say the upper bound is 10^9. But it is very often comprised
between 10^5 and 10^6.

Does a single collection ever grow? Do you know how many elements are
going to be in a particular collection before you start inserting?
it is the same.

I don't understand then. You said the array is sparse. In that case you
either don't want to keep the extra values (which would make the actual
population much smaller than the upper limit of the "index" or "key") or
you need to somehow indicate that a certain value is "missing". So, which
is it? Say, you have the upper value 100, and the population is no larger
than 10. You could then use an array of 100 values out of which only up
to 10 values would be valid, and the validity you could verify through
another (small) array of "valid indices". I don't think this is a good
idea because if I understood correctly, you would need an array of 10^9 FP
values, which is rather big, when only using about 10^5 to 10^6 of them.

Storing only as many values as you have in the "population" requires that
the "array" is sorted for quick retrieval. That's what 'map' gives you,
kind of. BTW, what's wrong with 'map'? Is it not fast enough?

Another requirement you left unspecified: does your collection ever reach
the point of "no need to change any longer", after which you only retrieve
values and don't insert new ones? Do you ever need to remove any values
from the collection?
the upper bound is known at run-time.

What's your definition of "upper bound"? I thought that when you say 10^9
that means that 1000000000 _is_ the upper bound.
Not yet because it seems <hash_map> is not available with Visual C++.

Right, it isn't. You need to get the SGI's version of STL or find
a hash_map implementation in some other library (Boost probably has it).

V
 
T

Tom

Does a single collection ever grow? Do you know how many elements are going to be in a particular collection before you start inserting?

The collection is empty when I start inserting/accessing. And it always
grow to 10^5-10^6.
And how does it relate to the size of the population?

Correction: the size (upper bound) of the population is around 10^9,
but I will only visit 10^6, which is the reason I want a sparse
representation.
does your collection ever reach the point of "no need to change any longer", after which you only retrieve values and don't insert new ones?

Yes definitely. I will need to access a fraction of the elements that
have been inserted (1% of them). Using the maps for that is efficient
enough.
Do you ever need to remove any values from the collection?
Never.

BTW, what's wrong with 'map'? Is it not fast enough?

It's fast, but I wish I could make it even faster. Inserting points in
the map cost me ~30% of my program's time.
If I can reduce it, it will allow me to work with bigger datasets and
to use several of these data-structures at the same time.

Thanks,

Tom
 
V

Victor Bazarov

Tom said:
[...]
BTW, what's wrong with 'map'? Is it not fast enough?


It's fast, but I wish I could make it even faster. Inserting points in
the map cost me ~30% of my program's time.
If I can reduce it, it will allow me to work with bigger datasets and
to use several of these data-structures at the same time.

I don't know what else to recommend you. I've recently had to change from
storing some [relatively small] objects in a 'std::set' and looking them
up in it, to a different approach altogether. I managed to speed up some
part of my program by more than 100-fold. The main change was to remove
a lookup and instead do a sequential processing of data, which were stored
elsewhere anyway. I am not saying that it should necessarily help in your
case, but try to look at the problem from a different point of view. What
if instead of storing all those values you try to recalculate them, or is
it definitely impossible? What if instead of trying to speed up storing
and retrieval you try to speed up recalculation? Maybe cache some values
to do so... Anyway, think outside the box.

V
 
E

E. Robert Tisdale

Tom said:
I am looking for an optimal data-structure
in order to replace a map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Any help is welcome.

How is this "data-structure" different from a map?
If it is a map, then std::map<int, float>
is probably as close to optimal as you can get.
 
T

Tom

I use this datastructure to compute a wave propagation in part of a
multidimensional domain so recalculation would involves the neighboring
values. It will definitely be too much
But thank you for your comments and suggestions.
 
T

Tom

This is what I am currently using.
I wanted to know if the constraints (integer key, upper bound on the
key, etc...) could lead to a faster implementation.
Thanks.
 
E

E. Robert Tisdale

Tom said:
This is what I am currently using.
I wanted to know if the constraints
(integer key, upper bound on the key, etc...)
could lead to a faster implementation.

It sounds like a Quality of Implementation (QoI) issue.
Tell us which [compiler] implementation you are using
and maybe we can redirect you to an appropriate forum.
 
K

Kelly Walker

Tom said:
Not yet because it seems <hash_map> is not available with Visual C++.
Visual C++ 7 (.NET) has <hash_map> with hash_map declared in namespace
stdext.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top