STL map to STL vector

S

Seungbeom Kim

It isn't very descriptive. It's like saying a
marble is not blue. Hash_map would be better.

A problem I see is that the name "unordered_map" strongly suggests
that a map should be ordered by default. While it's true that order
is an important property for std::map, a map in its purest sense
doesn't have to be ordered and it would have been no less sensible
to have std::map (unordered) and std::eek:rdered_map instead. It's just
for a historical reason that an ordered map took the name "std::map".

The same goes for unordered_set: we all learned in elementary schools
that a set is just a collection of things like {apple, banana, orange},
which never has to be ordered. (Letting multiset carry the burden of
the prefix "multi-" makes sense because its behavior deviates from
that of a "normal set".)

And I don't think being unordered is the most important and essential
property for std::unordered_{map,set}. A negative description such as
"not xxx" or "un-xxx" usually isn't; std::unordered_{map,set} is not
what we get just by removing order from std::{map,set}. This makes
a big point that makes the names std::unordered_{map,set} lame.
I think there was some history with that name,
but think the name could still be changed to
hash_map.

I don't believe it can happen. I don't like the choice, but I admit
it's made by the committee members who are in a better position to
make good decisions for the entire C++ community.
 
Ö

Öö Tiib

I believe unordered_map has to have a hash function.
It defaults to std::hash if you don't supply your own.

Yes. Also you can provide function to detect equality
of value of two keys and you can dynamically 'rehash'
and set load factor of hash table. For majority of use
cases those opportunities are unimportant. Majority
just need a map for searching mapping between two
values. They don't care about order of iteration.
The Microsoft documentation on unordered_map says the
sequence is "weakly ordered". That's a more accurate
description than "unordered". But I don't like
"weakly_ordered_map" for a name either. "Hash_map"
is nice because it isn't so long.

The "order" we talk of here is iteration order that is
"unordered" since it is not related to order of keys
unlike with 'std::map'. The keys of 'std::unordered_map'
may be even not comparable with each other at all. As
long a functor to detect equality of two keys is available
it works fine.
What different requirements?

Depends what 'hash_map' you are talking about. It was
very common extension provided by many library
implementations prior to standardising. Code that used
some particular implementation could continue to use
it without fear it changing or switch to standardised
'unordered_map' if there was budget for that switch.
 
Ö

Öö Tiib

Some containers you rarely/never need to know their size.

The C++ standard committee that wrote C++98 also thought
so. However in practice there was tons of newbies who wrote
things like 'if(x.size() == 0)' instead of 'if(x.empty())'. That is
terrible performance hit if a set counts its elements each time
someone detects its emptiness but the bug is so subtle and
common so the library implementers started to cache the size.
The committee then wizened up and standardised practice
that existed.
 
S

Seungbeom Kim

Why? "Hash" sounds like Klingon/Orcish word
while every child understands "unordered".

Maybe some children would understand "unordered" as the
status of their rooms with toys lying all around... :)

You need to understand hash containers and hash functions anyway
to be able to use std::unordered_map effectively, especially on
your own data type, so having "hash" in the name shouldn't be
too limiting, no matter how it sounds (especially to children).
On most cases the fact that unordered map is
built internally on hash table is as unimportant
as the fact that set is built upon red-black tree.

I consider hashing is an important concept in std::unordered_map,
as I said above. I wouldn't go as far as red-black trees, but actually
I think I like the name "tree_map"; given the complexity requirements,
it's almost as if the standard is written with trees as the underlying
data structure in mind.

The balance in the names "tree_map" and "hash_map" is an added bonus. :)
 
W

woodbrian77

Maybe some children would understand "unordered" as the
status of their rooms with toys lying all around... :)

You need to understand hash containers and hash functions anyway
to be able to use std::unordered_map effectively, especially on
your own data type, so having "hash" in the name shouldn't be
too limiting, no matter how it sounds (especially to children).

I've been trying to say something like that, but you beat
me to it so now I don't have to.
I consider hashing is an important concept in std::unordered_map,
as I said above. I wouldn't go as far as red-black trees, but actually
I think I like the name "tree_map"; given the complexity requirements,
it's almost as if the standard is written with trees as the underlying
data structure in mind.

That sounds interesting.

Brian
Ebenezer Enterprises
http://webEbenezer.net
 
W

woodbrian77

I don't think the standard demands the implementation to be a hash array
(even though it demands a hash function to be provided.) The name would
be misleading if the internal implementation happened to be something else.

Are there any implementations that do something else?
Consistency would dictate std::map to be renamed to std::eek:rdered_map,
but backwards compatibility and grandfather clauses...

A new name could be introduced without forcing a renaming.
There would be two names for the same thing with one being
the hoped-to-be improved name.
 
S

Seungbeom Kim

I don't think the standard demands the implementation to be a hash array
(even though it demands a hash function to be provided.) The name would
be misleading if the internal implementation happened to be something else.

I agree that standard names should not depend on implementation details,
but what would be so misleading in the name "hash_map" for something
that "by specification" requires and is organized by a hash function?

Besides, I'm curious to hear what data structures other than hash tables
exist that use a hash function and buckets but don't deserve the word
"hash" in the name.
 
A

Alf P. Steinbach

I don't see Mr Flibble's original article and I suspect that the context
may narrow down what's being discussed, but even as a general statement
that above is correct when just the word "relative" is added or understood.

Do you have a source for that?

Instead of authority, try logic.

It is after all the greatest authority.

The C++ standard guarantees amortized linear time, O(n), for creating a
vector of n items, i.e. amortized constant time per item. The O(n) time
includes O(n) item moves or copies, plus O(log n) buffer allocations.
The latter contribution diminishes greatly, relative to the total, as n
increases.

My understanding is that it will expand the vector - usually by doubling
in size or some such - whenever it gets full.

Yes. This is what guarantees amortized linear time. It might be a
constant factor such as 1.6 instead of 2.0, but assuming a doubling and
starting at buffer size 1, then for n elements you get copy/move time
consumption 1 + 2 + 4 + 8 + ... N, where N is the highest power of 2
that is less than or equal to n, which sum equals 2*N - 1.

This will require a copy
of every single item (1). Worth avoiding if you can.

A copy or move. But I agree, it's worth avoiding, e.g. by doing a
reserve, as a matter of course. Still it is true that the relative
benefit of doing that /diminishes/ as Mr. Flibble, alias Leigh "Sausage"
Johnston, wrote (or intended to write, I added "relative").


Cheers & hth.,

- Alf
 
Ö

Öö Tiib

Maybe some children would understand "unordered" as the
status of their rooms with toys lying all around... :)

Yes and most of the population considers strings as women
underwear. :) I meant children who consider writing software.
On 95% of cases it does not matter to user that it is internally
"splay_tree_multiset" or "hash_table_dictionary" or
"copy_on_write_string". "unordered" is better since it is adjective but
ideal would be to have just "multiset", "dictionary" or "string".
You need to understand hash containers and hash functions anyway
to be able to use std::unordered_map effectively, especially on
your own data type, so having "hash" in the name shouldn't be
too limiting, no matter how it sounds (especially to children).

All standard library containers have allocator parameter. Most
users of those containers are not too good in memory
management and incapable to develop efficient allocators
because they rarely need to. Same is with hashing. Most map
keys actually used are already hashable. On rare case when
compiler tells that it can't find 'hash<Key>' but it needs one the
people google a bit and write one rather quickly. IOW I don't
see that one must understand details of algorithm or container
overly deeply to use it effectively.
I consider hashing is an important concept in std::unordered_map,
as I said above. I wouldn't go as far as red-black trees, but actually
I think I like the name "tree_map"; given the complexity requirements,
it's almost as if the standard is written with trees as the underlying
data structure in mind.

The balance in the names "tree_map" and "hash_map" is an added bonus. :)

I think it will be in the future that the types will be (possibly
automatically) configurable. There are some class libraries with heavily
configurable traits already but it is not too convenient to tinker there
(and diagnostics can be ridiculous) right now. That will likely improve
when concepts emerge.
 
W

woodbrian77

That doesn't matter, it's what the key_equal does that counts.

You'll still be able to store objects if your hash just returns zero -
it's for efficiency only that you want to minimise hash clashes.

I'd consider that to be an efficiency bug.

Brian
Ebenezer Enterprises
http://webEbenezer.net
 
W

woodbrian77

The C++ standard committee that wrote C++98 also thought
so. However in practice there was tons of newbies who wrote
things like 'if(x.size() == 0)' instead of 'if(x.empty())'. That is
terrible performance hit if a set counts its elements each time
someone detects its emptiness but the bug is so subtle and
common so the library implementers started to cache the size.
The committee then wizened up and standardised practice
that existed.

It seems there were some sadists on the committee --
"What is is right."

Oh well, we have alternatives beyond committee's ideas.
For example, if you are working on on line services,
you may have a lot of freedom in terms of what libraries
you use in the back end of your service.


Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
 
S

Seungbeom Kim

Does the standard require for std::unordered_map to use the provided
hash function for something?

Almost. Yes practically. 23.2.5[unord.req]/9:
"Keys with the same hash code appear in the same bucket."

Technically, an unordered associative container may avoid using the
hash function without violating that requirement if it chooses to use
a single bucket (even as it creates more buckets to lower the "nominal"
load factor), but it was never meant to be used that way, and it would
be a terrible QoI issue.

Anyway, my point was that the standard requires in the interface
specification the existence of something called a "hash function" and
that the word "hash" doesn't just refer to an implementation detail.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top