optimal access/insertion associative array

Tom · Aug 1, 2005

Hi

I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Any help is welcome.
If this post is not adapted to this group, please let me know.
Thanks.

Tom

Victor Bazarov · Aug 1, 2005

Tom said:
I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6

1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking
about?

- the maximum number of elements (upper bound) is known and fixed

And how does it relate to the size of the population? Is it known at
compile-time or at run-time?

- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Have you looked at a hash_map?

V

Tom · Aug 1, 2005

Thanks for the quick answer Victor

1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking about?

Let's say the upper bound is 10^9. But it is very often comprised
between 10^5 and 10^6.

And how does it relate to the size of the population?

it is the same.

Is it known at compile-time or at run-time?

the upper bound is known at run-time.

Have you looked at a hash_map?

Not yet because it seems <hash_map> is not available with Visual C++.

Victor Bazarov · Aug 1, 2005

Tom said:
Thanks for the quick answer Victor

Let's say the upper bound is 10^9. But it is very often comprised
between 10^5 and 10^6.

Does a single collection ever grow? Do you know how many elements are
going to be in a particular collection before you start inserting?

it is the same.

I don't understand then. You said the array is sparse. In that case you
either don't want to keep the extra values (which would make the actual
population much smaller than the upper limit of the "index" or "key") or
you need to somehow indicate that a certain value is "missing". So, which
is it? Say, you have the upper value 100, and the population is no larger
than 10. You could then use an array of 100 values out of which only up
to 10 values would be valid, and the validity you could verify through
another (small) array of "valid indices". I don't think this is a good
idea because if I understood correctly, you would need an array of 10^9 FP
values, which is rather big, when only using about 10^5 to 10^6 of them.

Storing only as many values as you have in the "population" requires that
the "array" is sorted for quick retrieval. That's what 'map' gives you,
kind of. BTW, what's wrong with 'map'? Is it not fast enough?

Another requirement you left unspecified: does your collection ever reach
the point of "no need to change any longer", after which you only retrieve
values and don't insert new ones? Do you ever need to remove any values
from the collection?

the upper bound is known at run-time.

What's your definition of "upper bound"? I thought that when you say 10^9
that means that 1000000000 _is_ the upper bound.

Not yet because it seems <hash_map> is not available with Visual C++.

Right, it isn't. You need to get the SGI's version of STL or find
a hash_map implementation in some other library (Boost probably has it).

V

Tom · Aug 1, 2005

Does a single collection ever grow? Do you know how many elements are going to be in a particular collection before you start inserting?

The collection is empty when I start inserting/accessing. And it always
grow to 10^5-10^6.

And how does it relate to the size of the population?

Correction: the size (upper bound) of the population is around 10^9,
but I will only visit 10^6, which is the reason I want a sparse
representation.

does your collection ever reach the point of "no need to change any longer", after which you only retrieve values and don't insert new ones?

Yes definitely. I will need to access a fraction of the elements that
have been inserted (1% of them). Using the maps for that is efficient
enough.

Do you ever need to remove any values from the collection?
Never.

BTW, what's wrong with 'map'? Is it not fast enough?

It's fast, but I wish I could make it even faster. Inserting points in
the map cost me ~30% of my program's time.
If I can reduce it, it will allow me to work with bigger datasets and
to use several of these data-structures at the same time.

Thanks,

Tom

Victor Bazarov · Aug 1, 2005

Tom said:
[...]

BTW, what's wrong with 'map'? Is it not fast enough?

Click to expand...

It's fast, but I wish I could make it even faster. Inserting points in
the map cost me ~30% of my program's time.
If I can reduce it, it will allow me to work with bigger datasets and
to use several of these data-structures at the same time.

I don't know what else to recommend you. I've recently had to change from
storing some [relatively small] objects in a 'std::set' and looking them
up in it, to a different approach altogether. I managed to speed up some
part of my program by more than 100-fold. The main change was to remove
a lookup and instead do a sequential processing of data, which were stored
elsewhere anyway. I am not saying that it should necessarily help in your
case, but try to look at the problem from a different point of view. What
if instead of storing all those values you try to recalculate them, or is
it definitely impossible? What if instead of trying to speed up storing
and retrieval you try to speed up recalculation? Maybe cache some values
to do so... Anyway, think outside the box.

V

E. Robert Tisdale · Aug 1, 2005

Tom said:
I am looking for an optimal data-structure
in order to replace a map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care
about their ordering

Any help is welcome.

How is this "data-structure" different from a map?
If it is a map, then std::map<int, float>
is probably as close to optimal as you can get.

Tom · Aug 2, 2005

I use this datastructure to compute a wave propagation in part of a
multidimensional domain so recalculation would involves the neighboring
values. It will definitely be too much
But thank you for your comments and suggestions.

Tom · Aug 2, 2005

This is what I am currently using.
I wanted to know if the constraints (integer key, upper bound on the
key, etc...) could lead to a faster implementation.
Thanks.

E. Robert Tisdale · Aug 2, 2005

Tom said:
This is what I am currently using.
I wanted to know if the constraints
(integer key, upper bound on the key, etc...)
could lead to a faster implementation.

It sounds like a Quality of Implementation (QoI) issue.
Tell us which [compiler] implementation you are using
and maybe we can redirect you to an appropriate forum.

Kelly Walker · Aug 2, 2005

Tom said:
Not yet because it seems <hash_map> is not available with Visual C++.

Visual C++ 7 (.NET) has <hash_map> with hash_map declared in namespace
stdext.

Associative Array in C	6	Mar 3, 2008
Associative array problem?	2	Sep 19, 2004
[ANN] Multidimensional Array - MDArray 0.5.5	0	Nov 19, 2013
data type choice	7	Aug 30, 2013
What about a new attribute to access the physical representation of a signal ?	2	Jun 29, 2012
How to access an Array and a struct within a struct wrapped by Swig ?	3	Aug 16, 2008
List as a dynamic array of increasing-sized arrays	13	Oct 21, 2010
'Needless flexibilities' and structured records [very long]	10	Mar 15, 2013

optimal access/insertion associative array

Tom

Victor Bazarov

Tom

Victor Bazarov

Tom

Victor Bazarov

E. Robert Tisdale

Tom

Tom

E. Robert Tisdale

Kelly Walker

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads