Overhead of subscript operator for STL maps

C

C++Liliput

I have a custom String class that contains an embedded char* member.
The copy constructor, assignment operator etc. are all correctly
defined. I need to create a map of my string (say a class called
MyString) and an integer i.e. std::map<MyString, int>. Whenever I
insert the elements in the map using the subscript [] operator, I
noticed that the copy constructor for MyString is invoked more number
of times than if I do it using the insert() operation. For e.g. for 3
strings, when I insert them into the map using subscript operator, the
copy constructor is invoked 6 times whereas in the case of insert(),
the copy constructor is only invoked 3 times. Can someone please
explain me the reason why this is so?
 
J

Jerry

I have a custom String class that contains an embedded char* member.
The copy constructor, assignment operator etc. are all correctly
defined. I need to create a map of my string (say a class called
MyString) and an integer i.e. std::map<MyString, int>. Whenever I
insert the elements in the map using the subscript [] operator, I
noticed that the copy constructor for MyString is invoked more number
of times than if I do it using the insert() operation. For e.g. for 3
strings, when I insert them into the map using subscript operator, the
copy constructor is invoked 6 times whereas in the case of insert(),
the copy constructor is only invoked 3 times. Can someone please
explain me the reason why this is so?

Where do you call the operator[]? I think paste your codes could make
the question clear.
 
M

Maxim Yegorushkin

C++Liliput said:
I have a custom String class that contains an embedded char* member.
The copy constructor, assignment operator etc. are all correctly
defined. I need to create a map of my string (say a class called
MyString) and an integer i.e. std::map<MyString, int>. Whenever I
insert the elements in the map using the subscript [] operator, I
noticed that the copy constructor for MyString is invoked more number
of times than if I do it using the insert() operation. For e.g. for 3
strings, when I insert them into the map using subscript operator, the
copy constructor is invoked 6 times whereas in the case of insert(),
the copy constructor is only invoked 3 times. Can someone please
explain me the reason why this is so?

The subscript operator creates a default constructed object then copies
the passed object to it.  So you will see a copy for the operator call
and one for the copy in.

The subscript operator creates a default constructed object only for
the *value*, but not for the key of std::map<>.

In fact, std::map<Key, Value> internally stores std::pair<Key, Value>.
When std::map<>::eek:perator[] is called and there has been no such key,
it creates a temporary std::pair<Key, Value> (where Value is default
initialised) and then copies that temporary std::pair<Key, Value> into
the newly created (tree) node, thus 2 calls to the copy constructor of
Key. std::map::insert(), on the other hand, accepts std::pair<Key,
Value>, which is used to initialise the node's copy of it, thus 1 copy
constructor call.
 
S

Stephen Horne

The subscript operator creates a default constructed object then copies
the passed object to it. So you will see a copy for the operator call
and one for the copy in.

Wow!

I always assumed that using [] for a key that isn't already in the
container would throw an exception. It seems like an obvious case of
most-likely-a-mistake to me.

I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.

And even with the [] on the left of an assignment, I still think it's
a bit bad, since to me the obvious intent is to overwrite the data for
an existing key.

Sorry - obviously the rules are what they are - I'm just a bit
shocked.
 
E

Erik Wikström

The subscript operator creates a default constructed object then copies
the passed object to it. So you will see a copy for the operator call
and one for the copy in.

Wow!

I always assumed that using [] for a key that isn't already in the
container would throw an exception. It seems like an obvious case of
most-likely-a-mistake to me.

I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.

And even with the [] on the left of an assignment, I still think it's
a bit bad, since to me the obvious intent is to overwrite the data for
an existing key.

Personally I consider it a good idea, it can simplify a lot of code
(such as this one, written by Fred Swartz):

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
map<string, int> freq; // map of words and their frequencies
string word; // input buffer for words.

//--- Read words/tokens from input stream
while (cin >> word) {
freq[word]++;
}

//--- Write the count and the word.
map<string, int>::const_iterator iter;
for (iter=freq.begin(); iter != freq.end(); ++iter) {
cout << iter->second << " " << iter->first << endl;
}
return 0;
}//end main

Notice how little code is dedicated to the actual counting of the words.
 
J

Juha Nieminen

Stephen said:
I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.

Actually in many languages (such as for example PHP) relational maps
work exactly that way. That is, you add an element to the map by
"indexing" it with the key and assigning the element. This is a very
common idiom eg. in PHP.
And even with the [] on the left of an assignment, I still think it's
a bit bad, since to me the obvious intent is to overwrite the data for
an existing key.

It might be "obvious" to you because you are used to think like that.
However, in many languages (such as PHP) it's obvious that you are
building a relational map by indexing it. That is, when you say:

myMap[key] = value;

when you are doing is adding 'value' at the "position" 'key' of the map.
Or if there was already such a key, you are replacing its data with the
new value.

std::map has that exact same idea (although it's not as flexible as
PHP because the type of the key and the value are fixed). As exemplified
by Erik, this can be quite handy in some cases, for example because you
can write things like:

words[word]++;

One single command adds a new 'word' to the map and increments its value.

If you want to check if a key exists in the map, that's what the
find() member function is for.
 
J

James Kanze

On Thu, 16 Oct 2008 20:32:13 +1300, Ian Collins
The subscript operator creates a default constructed object
then copies the passed object to it. So you will see a copy
for the operator call and one for the copy in.
Wow!
I always assumed that using [] for a key that isn't already
in the container would throw an exception. It seems like an
obvious case of most-likely-a-mistake to me.
I realise that scripting languages do it, but they get to
handle the [] differently depending on whether its on the
left side of an assignment or not. They still complain if
you try to read a non-existent key.

Do they? The ones I use don't. (You can get this behavior in
C++, if you want it, by having operator[] return a proxy.)
And even with the [] on the left of an assignment, I still
think it's a bit bad, since to me the obvious intent is to
overwrite the data for an existing key.
Personally I consider it a good idea, it can simplify a lot of
code (such as this one, written by Fred Swartz):

Sure, it simplifies some things. And makes others more
complicated. The fundamental problem is that it means that
operator[] can't be used on a const map, which is exceedingly
constraining.
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main() {
map<string, int> freq; // map of words and their frequencies
string word; // input buffer for words.
//--- Read words/tokens from input stream
while (cin >> word) {
freq[word]++;
}
//--- Write the count and the word.
map<string, int>::const_iterator iter;
for (iter=freq.begin(); iter != freq.end(); ++iter) {
cout << iter->second << " " << iter->first << endl;
}
return 0;
}//end main
Notice how little code is dedicated to the actual counting of
the words.

And consider the more common use of using an std::map to look up
a factory, dynamically. If the list of classes is fixed, the
map will typically be const. There's no problem with operator[]
returning a value constructed with the default constructor,
since the mapped_type is normally a pointer, and a null pointer
is a good signal for a missing element. But you really want the
instance to be const.
 
S

Stephen Horne

Stephen said:
I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.

Actually in many languages (such as for example PHP) relational maps
work exactly that way. That is, you add an element to the map by
"indexing" it with the key and assigning the element. This is a very
common idiom eg. in PHP.

I didn't know that, and on my first read of this I thought I'd
overgeneralised a principle from basically just Python to scripting
languages more generally. On second read, you are saying precisely
what I said - PHP is a scripting language, and it is doing what I said
is characteristic of scripting languages.

Maybe I worded my post awkwardly. To clarify...

C++ allows key:data insertion using "container [key] = data;". That
surprised me. I realise that scripting languages *also* do "container
[key] = data;", but they get to handle the subscripting differently.

Basically, scripting languages don't do a two stage process of insert
a key:default in the [], then overwrite the default with the data in
the assign. They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for overriding a
subscripted read and a subscripted write. When C++ handles the
subscripting operation it doesn't even know that there is an assign.
For example...

std::map<int,int> mymap;
int var;

var = mymap [1];

Did the programmer really intend for var to get a random junk value
that was also assigned into mymap in a key=1:data=junk pair? Python
would know better since the assignment is on the wrong side of the
assignment, I assume PHP would as well, but C++ cannot since when it
handles the operator[], the method doesn't know how the returned
reference will be used. It returns a reference that might be used for
a read or a write, or which may not be used at all.

The two normal variants of operator[] aren't read and write - they are
const and non-const as follows...

const data_t& container::eek:perator[] const { ... }
data_t& container::eek:perator[] { ... }

The first shouldn't modify the container, and returns a reference that
will normally prevent the referenced value from being modified (with
various get-out clauses - mutable, const_cast, ...). This version is
called whenever the container is considered const.

The second returns a non-const reference which is equally valid for
both reading or writing the referenced data. This version is called
whenever the container is considered non-const, which is no clue as to
whether a read or write is intended.

The semantics of the first case ensure a const container is safe - you
can't modify a const container so you can't insert a key:default pair
and I assume an exception is thrown in that case.

I should say that based purely on the method signature there is no
reason to ban inserts on subscripting - the container is non-const.
Even so, I feel that reserving the subscripting operator for cases
where the intent is to access an existing key would be better.
It might be "obvious" to you because you are used to think like that.
However, in many languages (such as PHP) it's obvious that you are
building a relational map by indexing it. That is, when you say:

myMap[key] = value;

First off, my criticism of that is that there's already a way to
express that intent in C++ - the insert method. When you already have
a way to express that intent, why make the intent of another notation
less clear (and reduce the opportunities for error checking to catch
problems) by mixing the two intents together.

insert cannot be used to access an existing key. Why allow operator[]
the dual role of both accessing existing keys and inserting new ones?

I always prefer a different notation for a different intent (within
reason) so that errors are caught instead of remaining hidden and
leading to garbage results being trusted.

It's subjective opinion, of course, esp. in where you draw the "within
reason" line, but there's a lot more to it than just "what I'm used
to".

Secondly, it's not something I simply got used through using some
other language at all, unless it was an old pre-standard version of
C++. I can't even name a language that uses the semantics I suggest.
Ada, Modula 2, Pascal, C, Basic and Assembler don't have standard
associative containers to the best of my memory. Of the languages I've
played with, Haskell is pure functional - no mutables (except for the
monad getout clause), Prolog is wierd, and I'm dimly aware that arrays
are mutable in Objective Caml but that's my limit.

My own containers don't allow subscripting to insert new items, but
that was either based on a decision at the time to go against what I
was used to, or else it was doing what I was used to in pre-standard
C++. I honestly don't recall which.

I last used map (there was no std:: prefix at the time) in Borland C++
5, so maybe someone remembers what that did - I haven't had access to
a copy for years now.
std::map has that exact same idea (although it's not as flexible as
PHP because the type of the key and the value are fixed). As exemplified
by Erik, this can be quite handy in some cases, for example because you
can write things like:

words[word]++;

One single command adds a new 'word' to the map and increments its value.

You won't get the result you expect, or at least if you do it's only a
fluke.

Try the following...

#include <map>
#include <iostream>

void Garbage ()
{
char garbage [1024];
for (int i = 0; i < 1024; ++i) garbage = 0x55;
}

void Test ()
{
int a;

std::cout << a << std::endl;
}

int main()
{
Garbage ();
Test ();

return 0;
};

What result do you expect from Test? If you say zero, you're wrong.
The default constructor for int is trivial - it does nothing. The
variable a is never initialised, so it's value is junk. Because of the
specific junk that happens to be on the stack at the time (due to the
prior Garbage call) the most likely output is 2009147472.

In a completely trivial test program you will fluke it - on typical
desktop systems - as the memory is normally wiped to all zero as your
program is loaded. As soon as the program does anything significant,
though...

Getting back to your example, if the key "word" is new, so you insert
a new item, you are incrementing a garbage value - whatever happened
to be in some arbitrary chunk of memory prior to executing that line.
Even that behaviour is only in practice - the standard will say that
the behaviour is undefined.

That error wouldn't happen in PHP, I'll bet, but C++ isn't a scripting
language and it works differently, which was a significant part of my
point. I only disagree with it to a small degree in scripting
languages, but there are more problems with it in C++.

Your compiler might warn you that you're using an uninitialised value,
but only in some cases. In GCC, the example I gave doesn't seem to
give a warning even with -Wall. In your example, the compiler cannot
give a warning - it cannot know that the reference returned by
operator[] refers to an uninitialised location.
If you want to check if a key exists in the map, that's what the
find() member function is for.

But when the needed checks aren't done by default, they often don't
get done at all. Relative to C, the trend in C++ has been toward
language features that are safer by default.

For example, we no longer have to program an explicit null check after
using new. Pre-standardisation, new was safer than malloc (it
supported initialisation by default) but could still return null if it
couldn't allocate memory. Now, we also have an exception by default.
We don't have to remember the check - it is implicit.

Notations that cleanly separate different intents support better
automatic checks - it's that simple.
 
T

Thomas J. Gritzan

Stephen said:
Stephen said:
I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.
Actually in many languages (such as for example PHP) relational maps
work exactly that way. That is, you add an element to the map by
"indexing" it with the key and assigning the element. This is a very
common idiom eg. in PHP.

In Ruby it's basically the same with a dictionary. You can overload
index-get and index-assignment separatly, that is, [] and []= are two
different functions/methods. With []= you insert a new element or change
it, with [] you get nil, if the element doesn't exist.

Stephen said:
C++ allows key:data insertion using "container [key] = data;". That
surprised me. I realise that scripting languages *also* do "container
[key] = data;", but they get to handle the subscripting differently.

Basically, scripting languages don't do a two stage process of insert
a key:default in the [], then overwrite the default with the data in
the assign. They merge the subscripting and assign into a single step.

Yes. In Ruby, the method [] is called if you want to get a value, and
[]= if you assign to it.

In C++ there's only one operator[] and it is called for read and write.
You can fake the semantics by returning a proxy object that has
operator=(const TYPE&) for assignment and operator TYPE() for reading
the value.

Stephen said:
Juha said:
std::map has that exact same idea (although it's not as flexible as
PHP because the type of the key and the value are fixed). As exemplified
by Erik, this can be quite handy in some cases, for example because you
can write things like:

words[word]++;

One single command adds a new 'word' to the map and increments its value.

You won't get the result you expect, or at least if you do it's only a
fluke.

He will.
Try the following...

#include <map>
#include <iostream>

void Garbage ()
{
char garbage [1024];
for (int i = 0; i < 1024; ++i) garbage = 0x55;
}


That's a completly different case. 'garbage' here is not initialized.
But std::map guarantees that new objects are default-initialized.

Is the same as with std::vector:

std::vector<int> v;
v.resize(10);

The elements don't contain garbage, they are all 0 (zero).
What result do you expect from Test? If you say zero, you're wrong.
The default constructor for int is trivial - it does nothing. The
variable a is never initialised, so it's value is junk. Because of the
specific junk that happens to be on the stack at the time (due to the
prior Garbage call) the most likely output is 2009147472.

In a completely trivial test program you will fluke it - on typical
desktop systems - as the memory is normally wiped to all zero as your
program is loaded. As soon as the program does anything significant,
though...

Getting back to your example, if the key "word" is new, so you insert
a new item, you are incrementing a garbage value - whatever happened
to be in some arbitrary chunk of memory prior to executing that line.
Even that behaviour is only in practice - the standard will say that
the behaviour is undefined.

In general it is true that primitives aren't initialized, but the
standard containers guarantee that they are.

You should read a good book to update your C++.
 
J

James Kanze

Stephen said:
I realise that scripting languages do it, but they get to
handle the [] differently depending on whether its on the
left side of an assignment or not. They still complain if
you try to read a non-existent key.
Actually in many languages (such as for example PHP)
relational maps work exactly that way. That is, you add an
element to the map by "indexing" it with the key and assigning
the element. This is a very common idiom eg. in PHP.

Such languages don't have const. All of the ones I know, for
that matter, are untyped, and don't require variable
declarations either. So there is always a well defined value
which can be used for initialization.

In C++, logically, you should be able to use [] on a const map
(if you're just reading). And you should be able to use [] even
if the mapped type doesn't have a default constructor.

On the other hand, one can easily argue that operator[] isn't
part the basic interface defined by std::map. It's just an
added convenience (or even an example of operator overload
abuse:)). Most of the time, you'll wrap std::map in a class
which provides a more convenient interface for what it's being
used for, which is application specific. The presence of
operator[] is just as a convenience, for one common use, and
this particular use was chosen precisely because it happens to
be supported in a couple of other common languages (AWK, perl,
etc.)
And even with the [] on the left of an assignment, I still
think it's a bit bad, since to me the obvious intent is to
overwrite the data for an existing key.
It might be "obvious" to you because you are used to think
like that.

I think the point of something like std::map is that it is
reallly a low level building block, with a number of (sometimes
contradicting) "obvious" uses. In this sense, operator[]
doesn't really belong, so who cares what semantics it has. (In
practice said:
However, in many languages (such as PHP) it's obvious that you
are building a relational map by indexing it. That is, when
you say:
    myMap[key] = value;

Except that that's not at all the way you build a map in C++
(and it's a very bad way of building it in other languages). In
C++, the idiomatic fashion would be something like:

while ( more elements ) {
if ( ! myMap.insert(
MapType::value_type(
element.key, element.value ) ).second ) {
// error handling...
}
}

Correctly used, find, insert and erase allow you to implement
whatever idiomatic use you need in a particular case.
when you are doing is adding 'value' at the "position" 'key'
of the map. Or if there was already such a key, you are
replacing its data with the new value.

In other words, you might be overwriting an existing value when
you don't want to.
std::map has that exact same idea (although it's not as
flexible as PHP because the type of the key and the value are
fixed). As exemplified by Erik, this can be quite handy in
some cases,

In a very few cases, yes.

C++ and PHP have very different goals. In C++, you're supposed
to be able to write robust code. This means taking advantage of
the stricter type system, const, etc. And it generally means
more thorough error checking---in most data base use, insertion
is a distinct action from update, and when you want one,
accidentally getting the other is an error.
for example because you can write things like:
    words[word]++;
One single command adds a new 'word' to the map and increments
its value.
If you want to check if a key exists in the map, that's what the
find() member function is for.

And if you want to access in a way that distinguishes between
update and insert, you wrap std::map in a class that uses find
(and insert). If you want an assertion failure, or an
exception, or a returned error code for a failed update, you can
then have it.

The problem with operator[] isn't really what it does. The
problem is that it is even there, since there is no one
universally useful semantics for it.
 
J

James Kanze

Stephen said:
I realise that scripting languages do it, but they get to
handle the [] differently depending on whether its on the
left side of an assignment or not. They still complain if
you try to read a non-existent key.
Actually in many languages (such as for example PHP)
relational maps work exactly that way. That is, you add an
element to the map by "indexing" it with the key and
assigning the element. This is a very common idiom eg. in
PHP.

Just a few general comments, since I basically agree with your
main point.
I didn't know that, and on my first read of this I thought I'd
overgeneralised a principle from basically just Python to
scripting languages more generally. On second read, you are
saying precisely what I said - PHP is a scripting language,
and it is doing what I said is characteristic of scripting
languages.

I don't think that the difference is some much scripting or not,
but rather the fact that the languages in question have a
radically different type system than C++: either everything is
the same type, or (e.g. lisp) all typing is dynamic, and there
is a null type of some sort. Typically, in such cases, you
don't have to declare a variable; you just use it, and if it
doesn't already exist, it is created with a default value (empty
string in AWK, probably the null value in lisp, etc.).
Maybe I worded my post awkwardly. To clarify...
C++ allows key:data insertion using "container [key] = data;".
That surprised me. I realise that scripting languages *also*
do "container [key] = data;", but they get to handle the
subscripting differently.
Basically, scripting languages don't do a two stage process of
insert a key:default in the [], then overwrite the default
with the data in the assign.

I suspect that most do, although many implementations are
possible.
They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for overriding
a subscripted read and a subscripted write. When C++ handles
the subscripting operation it doesn't even know that there is
an assign.

It could. Operator[] could return a proxy.
For example...
  std::map<int,int>  mymap;
  int  var;
  var  = mymap [1];
Did the programmer really intend for var to get a random junk
value that was also assigned into mymap in a key=1:data=junk
pair?

It's not random junk. It's mapped_type(). The "default
initialized" object. In the case of an int, it's 0.
Python would know better since the assignment is on the wrong
side of the assignment, I assume PHP would as well, but C++
cannot since when it handles the operator[], the method
doesn't know how the returned reference will be used. It
returns a reference that might be used for a read or a write,
or which may not be used at all.

Again, it's not at all certain that other languages use this
information, any more than C++ does. And you can do it in C++
as well by means of a proxy. The difference is more closely
related to the stricter type system, and the fact that you
always have to declare variables before use. In C++, the
semantics of operator[] in std::map are inconsistent with the
way the basic language works. C++ is designed for robustness;
languages like Python or PHP for rapid development (of less
critical applications).
The two normal variants of operator[] aren't read and write -
they are const and non-const as follows...
  const data_t& container::eek:perator[] const  { ... }
        data_t& container::eek:perator[]        { ... }
The first shouldn't modify the container, and returns a
reference that will normally prevent the referenced value from
being modified (with various get-out clauses - mutable,
const_cast, ...). This version is called whenever the
container is considered const.

Independently of all that: any "normal" operator[] should be
callable on const objects. And the non-const version shouldn't
have different semantics from the const one.

There's also the point that if you overload an operator, it
should behave in some way like the built-in
operator---otherwise, it's operator overload abuse. The
built-in operator[] never adds elements to an array.

[...]
It might be "obvious" to you because you are used to think
like that. However, in many languages (such as PHP) it's
obvious that you are building a relational map by indexing
it. That is, when you say:
   myMap[key] = value;
First off, my criticism of that is that there's already a way
to express that intent in C++ - the insert method.

The semantics aren't the same. std::map<>::insert corresponds
to an insertion. It will never overwrite an existing value.
And it has a return code which indicates whether the insertion
took place or not.
When you already have a way to express that intent, why make
the intent of another notation less clear (and reduce the
opportunities for error checking to catch problems) by mixing
the two intents together.
insert cannot be used to access an existing key. Why allow
operator[] the dual role of both accessing existing keys and
inserting new ones?

Because in some contexts, that's what you want?

[...]
Secondly, it's not something I simply got used through using
some other language at all, unless it was an old pre-standard
version of C++. I can't even name a language that uses the
semantics I suggest. Ada, Modula 2, Pascal, C, Basic and
Assembler don't have standard associative containers to the
best of my memory.

They all do have a built-in operator[], however, which never
inserts. With the exception of Basic and Assembler, they all
have string typing, and require variables to be declared.

[...]
std::map has that exact same idea (although it's not as
flexible as PHP because the type of the key and the value are
fixed). As exemplified by Erik, this can be quite handy in
some cases, for example because you can write things like:
   words[word]++;
 One single command adds a new 'word' to the map and
increments its value.
You won't get the result you expect, or at least if you do
it's only a fluke.

Yes you will.
Try the following...
#include <map>
#include <iostream>
void Garbage ()
{
  char garbage [1024];
  for (int i = 0; i < 1024; ++i)  garbage = 0x55;
}

void Test ()
{
  int a;
  std::cout << a << std::endl;
}
int main()
{
  Garbage ();
  Test ();
  return 0;
};

What does that have to do with it? I don't see any std::map,
and I don't see any use of int which ressembles that of
std::map. The equivalent would be something as if Test did:

std::cout << int() << std::endl ;

And that's guaranteed to output 0.
What result do you expect from Test? If you say zero, you're
wrong. The default constructor for int is trivial - it does
nothing.

From the C++ standard: "To default-initialize an object of type
T means: [...] -- otherwise [POD, not an array], the storage for
the object is zero initialized." And "An object whose
initializer is an empty set of parentheses, i.e. (), shall be
default-initialized." And finally, the specification of T&
operator[]( key_type const& x ) in map: "Returns
(*((insert(make_pair(x, T()))).first)).second."

Note the T() in that last expression.
 
S

Stephen Horne

In general it is true that primitives aren't initialized, but the
standard containers guarantee that they are.

You should read a good book to update your C++.

I already have Stroustrup Special Edition, and I note (page 483) that
you appear to be right. There's a note re: an example "This code takes
advantage of the fact that a new element gets its int value set to 0
by default.".

So I'm mistaken, but it's wierd. The *default* *constructor* for int
does no such thing, so this is just a library doing special case magic
behind the scenes - presumably some template specialisation for
std::pair.

I still feel that the C++ library is acting wrong here. If basic types
such as integers should be initialised to zero on construction, they
should *always* be initialised to zero on construction.

Being initialised as opposed to uninitialised is never dangerous, of
course - until programmers start relying on it, and then don't
understand why their code breaks when they expect initialisation in
some other context.

Programmers shouldn't be expected to memorise endless special cases
and, if you write code that depends on that magic, don't be surprised
if some experienced programmer - one who hasn't memorised the entire
C++ standard word for word, or who doesn't trust every library
implementation to do every bit of magic correctly - changes it.
 
K

Kai-Uwe Bux

Stephen said:
I already have Stroustrup Special Edition, and I note (page 483) that
you appear to be right. There's a note re: an example "This code takes
advantage of the fact that a new element gets its int value set to 0
by default.".

So I'm mistaken, but it's wierd. The *default* *constructor* for int
does no such thing, so this is just a library doing special case magic
behind the scenes - presumably some template specialisation for
std::pair.

No, that's not it. The standard just has the provison:

[23.3.1.2/1]
T& operator[](const key_type& x);
Returns: (*((insert(make_pair(x, T()))).first)).second.

Note the T() buried deep inside. For built-in types like int, this is int().
Now, that expression evaluates to 0 by:

[5.2.3/2]:
The expression T(), where T is a simple-type-specifier (7.1.5.2) for a
non-array complete object type or the (possibly cv-qualified) void type,
creates an rvalue of the specified type, which is value-initialized
(8.5; no initialization is done for the void() case).

In particular, the library does not create weird exceptions by specializing
randomly.


[snip]


Best

Kai-Uwe Bux
 
S

Stephen Horne

I nervously remove you from my filter and see you replying to me...

Hope I can read this and remain sane ;-)


On Oct 18, 1:59 am, Stephen Horne <[email protected]> wrote:
I don't think that the difference is some much scripting or not,

Agreed - in fact "scripting" doesn't really mean much. It's just a
generalisation, and I'm not even confident that the generalisation
holds for all/most scripting languages.
but rather the fact that the languages in question have a
radically different type system than C++: either everything is
the same type, or (e.g. lisp) all typing is dynamic, and there
is a null type of some sort. Typically, in such cases, you
don't have to declare a variable; you just use it, and if it
doesn't already exist, it is created with a default value (empty
string in AWK, probably the null value in lisp, etc.).

I *think* I get the point - that you shouldn't magically create an
item in a container if you wouldn't magically create a plain variable?

I think I agree.
Basically, scripting languages don't do a two stage process of
insert a key:default in the [], then overwrite the default
with the data in the assign.

I suspect that most do, although many implementations are
possible.

I'm confident that Python (in effect - and probably in practice)
transforms the AST during bytecode compilation. From the described
semantics of PHP, it sounds like it does the same. Beyond that, I'm
possibly overgeneralising, but I still think the AST transform will be
more common than proxies.

The reason is to support overriding of subscripting for user
containers. Separate override methods for read and write make a lot of
sense. Returning a proxy, OTOH, would be a major pain in a scripting
language given the type system (you're trying to put the reference
into the container for a write - not to copy the object over some
previous object while leaving the reference intact) and the resulting
efficiency issues.
They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for overriding
a subscripted read and a subscripted write. When C++ handles
the subscripting operation it doesn't even know that there is
an assign.

It could. Operator[] could return a proxy.

Yes, but even then operator[] doesn't know what will happen to the
proxy - it's still a two step process. In fact, the reference that it
does return *is* a (trivial) proxy.
It's not random junk. It's mapped_type(). The "default
initialized" object. In the case of an int, it's 0.

OK - my mistake. I expressed my opinion on this elsewhere, and I don't
intend to get into another flame war over it.
C++ is designed for robustness;

That principle took second place to efficiency concerns or following a
C related tradition more than once, but in general, yes.
The first shouldn't modify the container, and returns a
reference that will normally prevent the referenced value from
being modified (with various get-out clauses - mutable,
const_cast, ...). This version is called whenever the
container is considered const.

Independently of all that: any "normal" operator[] should be
callable on const objects.

I don't think I understand the point here.

A "normal" (ie non-const) operator[] cannot be callable for const
objects because it is non-const, because it needs to return a
non-const reference, and so on.

If you're simply saying that...
And the non-const version shouldn't
have different semantics from the const one.

.... the two versions should be as similar as possible, with the
constness of the returned reference as the only difference, I think I
agree.
There's also the point that if you overload an operator, it
should behave in some way like the built-in
operator---otherwise, it's operator overload abuse. The
built-in operator[] never adds elements to an array.

Yes, but the enemy ;-) can site precedent on this one.

In Python, subscripting a dictionary can insert a new key, but
subscripting a list will not insert a new index - not even if you
subscript to the .length () position. There's a history of treating
associative different to sequence, key subscripting different to
position subscripting.
The semantics aren't the same. std::map<>::insert corresponds
to an insertion. It will never overwrite an existing value.
And it has a return code which indicates whether the insertion
took place or not.

IOW there's a third reasonable intent, which still shouldn't be
confounded with read. My containers call it "write", though it might
equally be called "set".
When you already have a way to express that intent, why make
the intent of another notation less clear (and reduce the
opportunities for error checking to catch problems) by mixing
the two intents together.
insert cannot be used to access an existing key. Why allow
operator[] the dual role of both accessing existing keys and
inserting new ones?

Because in some contexts, that's what you want?

In those relatively unusual cases (compared with always-reads,
always-replaces and always-inserts), you should express that
particular intent. It's more robust that way.

Again, subject to just how far you want to push the principle.
What does that have to do with it? I don't see any std::map,
and I don't see any use of int which ressembles that of
std::map. The equivalent would be something as if Test did:

I just followed the principle that default construction does the same
thing everywhere. Turns out there's this separate concept of default
initialisation.

IOW I was wrong.
 
S

Stephen Horne

In C++, logically, you should be able to use [] on a const map
(if you're just reading).

I still don't get this. It doesn't fit with the normal pattern of
expression evaluation for C++. operator[] doesn't know whether you're
reading or not - in fact you're *always* reading in a sense - just
reading a reference rather than a value.

To return a non-const reference you simply need a non-const operator[]
method. That's not a library rule - it's built into the language
itself.

To do differently, C++ would have to overload based on return types,
so it can know whether the returned reference needs to be const or not
when it handles the operator[]. Ada can do this, but C++ never has.
And you should be able to use [] even
if the mapped type doesn't have a default constructor.

This is a very good point. I have plenty of types that don't support
default constructors because there is no obvious default value, and
because I never saw a need to default construct them.

Doesn't mean I never need to use them as keys.
Most of the time, you'll wrap std::map in a class
which provides a more convenient interface for what it's being
used for, which is application specific.

I disagree, though I'm not sure if I disagree enough to contradict
you, if you see what I mean.

I often write classes that have several associative containers as
members. I don't count this as "wrapping std::map" - I'm simply using
it within a class, the same as any other type.

The most "wrapping" I do for most container types is to use a typedef.

Overdoing limited interface wrappers adds an overhead to development.
It's a bit like overdoing the layers of inheritance. It means you're
always having to refactor everything before you can get the relative
information where it's needed to handle new requirements.

I'm not saying that specialising a standard class is wrong - just that
it's overkill for a lot of common uses of a container. If I use, for
example, std::map<thisclass,thatclass> in the implementation of class
subsys, I'm probably not going to need the same mapping anywhere else.
Why add another layer of abstraction between the subsys class and the
std::map?

That said, the scope in which that container is used is always pretty
limited. The word "wrapper" is taking it too far, IMO, but your point
is probably valid.
I think the point of something like std::map is that it is
reallly a low level building block

This to me is also a little surprising - or at least it depends on
what you mean by low level, and what you expect from low level tools.

To me, a low level binary tree tool should provide data-structure
handling algorithms, but shouldn't act as a container at all. If I
feel like rearranging my binary tree into a linked list simply by
modifying all the link fields and then remembering to treat it
differently, that should be my right. If I feel like partitioning all
the nodes into several classes, and building each of those node sets
into a separate tree, that should also be my right. I should have
direct access to all the internals, so I can do with them as I please.
With a low level library, the rights and the responsibilities should
be mine, with no unnecessary barriers in the way.

To me, a container should be a high level wrapper around a low level
data structure. Two different libraries appropriate to two different
levels of abstraction.

Trouble is, I guess, that high vs. low level isn't a simple binary
choice but rather a matter of degree.
 
J

James Kanze

@gmail.com> wrote:

[...]
I *think* I get the point - that you shouldn't magically
create an item in a container if you wouldn't magically create
a plain variable?

That's more or less it. In AWK, something like
a = 43
or even
x = a
is legal, even if the variable a has never been been mentionned
before. So it makes perfect sense, and fits perfectly into the
language, to be able to do the same thing with something like
"a". C++, however, is a different language, and what fits
perfectly in AWK (or perl, or PHP, or whatever) doesn't
necessarily fit into C++.
Basically, scripting languages don't do a two stage process
of insert a key:default in the [], then overwrite the
default with the data in the assign.
I suspect that most do, although many implementations are
possible.
I'm confident that Python (in effect - and probably in
practice) transforms the AST during bytecode compilation. From
the described semantics of PHP, it sounds like it does the
same. Beyond that, I'm possibly overgeneralising, but I still
think the AST transform will be more common than proxies.

How they implement it is more or less irrelevant. The point is
that (at least in AWK or perl---I don't know the others) because
of the point mentionned above, that you can use a variable which
hasn't been defined in just about any context, you must have a
perfectly acceptable and well defined "default" value to use.
About the only time it might make a difference is if you know
you'll never actually "insert" the default value, use a test for
this default value to determine that the element isn't in the
map, and have a lot of misses: the map will occupy a lot more
memory than necessary.
The reason is to support overriding of subscripting for user
containers. Separate override methods for read and write make
a lot of sense.

Certainly, in certain contexts. Given the overall expression
syntax of C++, I don't think it could reasonably be made to
work, but it's certainly convenient in languages where it can be
made to work.
Returning a proxy, OTOH, would be a major pain in a scripting
language given the type system (you're trying to put the
reference into the container for a write - not to copy the
object over some previous object while leaving the reference
intact) and the resulting efficiency issues.

Returning a proxy in C++ isn't a perfect solution; you can't
really handle cases like a[ i ].x (either lvalue or rvalue use)
very well, and it requires a lot of boilerplate to handle things
like a[ i ] *= 3. But it's "good enough" most of the time.
They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for
overriding a subscripted read and a subscripted write. When
C++ handles the subscripting operation it doesn't even know
that there is an assign.
It could.  Operator[] could return a proxy.
Yes, but even then operator[] doesn't know what will happen to
the proxy - it's still a two step process. In fact, the
reference that it does return *is* a (trivial) proxy.

Yes:). What the classical proxy does is allow "trapping"
specific lvalue uses, and handling them specially. What we'd
really like is for the proxy to act as a smart reference, with a
user defined conversion for lvalue to rvalue use. Since we
can't overload operator.(), we can't do this, and even with
operator.(), it would require a lot of boilerplating (which most
proxies don't do) to cover *all* lvalue uses. (Some lvalue
uses, of course, like unary & or binding to a reference, you
usually don't want to support, since doing so would mean that
you do lose control of all of the references to the object.)

[...]
That principle took second place to efficiency concerns or
following a C related tradition more than once, but in
general, yes.

Yes. C++ represents a lot of compromizes.
The first shouldn't modify the container, and returns a
reference that will normally prevent the referenced value from
being modified (with various get-out clauses - mutable,
const_cast, ...). This version is called whenever the
container is considered const.
Independently of all that: any "normal" operator[] should be
callable on const objects.
I don't think I understand the point here.
A "normal" (ie non-const) operator[] cannot be callable for
const objects because it is non-const, because it needs to
return a non-const reference, and so on.
If you're simply saying that...
... the two versions should be as similar as possible, with
the constness of the returned reference as the only
difference, I think I agree.

What I meant is that if you supply a user operator[], you should
ensure (somehow) that it is usable on a const object, since the
built-in operator[] is usable on const objects. The usual means
would be to supply two operator[], one const, and one not. In
which case, they should have the same semantics---having one
which e.g. raises an exception if the indexed element isn't
there, and the other which inserts it, just doesn't go.
There's also the point that if you overload an operator, it
should behave in some way like the built-in
operator---otherwise, it's operator overload abuse.  The
built-in operator[] never adds elements to an array.
Yes, but the enemy ;-) can site precedent on this one.

There's no "enemy". There are legitimate arguments for both
sides. In this case, I happen to think that the arguments
against operator[] in std::map, at least with its current
semantics, are stronger. But I'll still point out arguments the
other way if I see them. IMHO, it's not a killer, because
std::map doesn't depend on operator[] for its usability. If you
don't like the semantics of its operator[] (and I don't, except
in special cases), then just don't use it (which I don't, except
in special cases).
In Python, subscripting a dictionary can insert a new key, but
subscripting a list will not insert a new index - not even if
you subscript to the .length () position. There's a history of
treating associative different to sequence, key subscripting
different to position subscripting.

In Python, every type has a default constructor (I think), and
there is no const (I think). Which makes the situation
different enough from C++ that the analogy doesn't really hold.
IOW there's a third reasonable intent, which still shouldn't
be confounded with read. My containers call it "write", though
it might equally be called "set".

I think that the real problem is that at an application level,
things like std::map are used to implement various idioms. In
some, like data bases, distinguishing insertion from update is
fundamental. (They're two different verbs in SQL.) In others,
it's not, and in a few, not distinguishing them at all is very
convenient.
When you already have a way to express that intent, why
make the intent of another notation less clear (and reduce
the opportunities for error checking to catch problems) by
mixing the two intents together.
insert cannot be used to access an existing key. Why allow
operator[] the dual role of both accessing existing keys
and inserting new ones?
Because in some contexts, that's what you want?
In those relatively unusual cases (compared with always-reads,
always-replaces and always-inserts), you should express that
particular intent. It's more robust that way.

I more or less agree. In general, if it weren't possible to
make the distinction, std::map would be more or less unusable in
a lot (probably most) cases. But that's not really the case.
As I said, just wrap std::map in an application specific class
(which is a good idea regardless), and define operator[] of that
class to provide the appropriate semantics for its particular
use, using find, insert and erase. The "good" thing about
operator[] is that you never have to use it, and that you rarely
even are tempted to use it.
 
J

James Kanze

In C++, logically, you should be able to use [] on a const
map (if you're just reading).
I still don't get this. It doesn't fit with the normal pattern
of expression evaluation for C++. operator[] doesn't know
whether you're reading or not - in fact you're *always*
reading in a sense - just reading a reference rather than a
value.

If you're providing user defined operators, it's up to you to
define them in a "logical" fashion---logical, in this case,
being largely determined by what the built-in operators do.
Logically, if a class supports [], you should be able to use it
on a const object, or the designer of the class has abused
operator overloading. (The language doesn't prevent you from
designing a class in which [] meant the same as /=. But I
wouldn't call that "logical".)

In sum, my "logical" applies at the design level: is this a
logical design for a class.

[...]
And you should be able to use [] even if the mapped type
doesn't have a default constructor.
This is a very good point. I have plenty of types that don't
support default constructors because there is no obvious
default value, and because I never saw a need to default
construct them.

You too:). I recently modified my Fallible class to support
them, because they are so common.
Doesn't mean I never need to use them as keys.

Keys aren't a problem. Mapped to types are (but only if you
actually use operator[]).
I disagree, though I'm not sure if I disagree enough to
contradict you, if you see what I mean.

Well, for starters: anything fundamental to the application
should be in an application specific class, with as narrow an
interface as posssible. This way, if you later have to change
it, you can. In the case of std::map, the "basic" interface
involves find, insert and erase, and is not really very usable
(or at least very convenient) to use directly. Which leads to
wrapping it into something convenient (and with the desired
semantics) for what is needed in the application.
I often write classes that have several associative containers
as members. I don't count this as "wrapping std::map" - I'm
simply using it within a class, the same as any other type.

OK. My point was just that all use should be well hidden from
the more general code of the application. As an extreme case, I
certainly don't think you'd need a special wrapper class for a
single std::map when implementing a bidirectional map.
The most "wrapping" I do for most container types is to use a
typedef.

It depends. At the lowest level, perhaps, but application code
never really sees any standard container; any use of one is well
hidden in a "wrapper". Of course, you might not consider it
"just a wrapper", because it often does a lot more, ensuring
persistency, or transactional integrity, for example.
Overdoing limited interface wrappers adds an overhead to
development. It's a bit like overdoing the layers of
inheritance. It means you're always having to refactor
everything before you can get the relative information where
it's needed to handle new requirements.

Obviously, you can overdo anything. But carefully isolating
things, and only providing narrow interfaces, pays off in the
long run.
I'm not saying that specialising a standard class is wrong -
just that it's overkill for a lot of common uses of a
container. If I use, for example,
std::map<thisclass,thatclass> in the implementation of class
subsys, I'm probably not going to need the same mapping
anywhere else. Why add another layer of abstraction between
the subsys class and the std::map?

It depends somewhat on what subsys is doing, but if you can't
consider subsys a wrapper, maybe it's doing too much.
That said, the scope in which that container is used is always
pretty limited. The word "wrapper" is taking it too far, IMO,
but your point is probably valid.

Yes. The point isn't meant to be an absolute. But good
decoupling is important, and that includes decoupling higher
level abstractions from the interfaces defined in the standard
library.

My classical example for this is that you shouldn't use
std::string (nor std::vector<char>) directly as your text buffer
in an editor. You should define a TextBuffer class, with
exactly the interface you need for your purposes. You *should*
use std::string to implement this class---in many ways, it
really won't be more than a wrapper. But the day you find that
in fact, for performance reasons, you have to change the
implementation, the use of std::string is well isolated, and you
shouldn't have to modify the application from a to z.
This to me is also a little surprising - or at least it
depends on what you mean by low level, and what you expect
from low level tools.
To me, a low level binary tree tool should provide
data-structure handling algorithms, but shouldn't act as a
container at all. If I feel like rearranging my binary tree
into a linked list simply by modifying all the link fields and
then remembering to treat it differently, that should be my
right. If I feel like partitioning all the nodes into several
classes, and building each of those node sets into a separate
tree, that should also be my right. I should have direct
access to all the internals, so I can do with them as I
please. With a low level library, the rights and the
responsibilities should be mine, with no unnecessary barriers
in the way.
To me, a container should be a high level wrapper around a low
level data structure. Two different libraries appropriate to
two different levels of abstraction.
Trouble is, I guess, that high vs. low level isn't a simple binary
choice but rather a matter of degree.

In the end, yes. All of the implementations of std::map I've
seen use an even lower level class which implements the binary
tree (and is also used in e.g. std::set). On the other hand, a
data dictionary or a keyed access method to a data base is *not*
a container, and you probably don't want to support all of the
classical container interfaces to it. And most applications
need a data dictionary, or a keyed access method, and not a
generic std::map.
 
S

Stefan Ram

James Kanze said:
If you're providing user defined operators, it's up to you to
define them in a "logical" fashion---logical, in this case,
being largely determined by what the built-in operators do.

I have - after several decades of guess-work - eventually
found that »logical« often is /not/ intended to mean »with
regard to logic«, but »rule-like« (»regular«, »systematic«),
more precisely »ruled by a /small/ set of /simple/ rules«.

So, one might use this wording:

»If you're providing user defined operators, it's up to
you to define them in a rule-like fashion---that is, being
ruled by the same rules that determine what the built-in
operators do.«

If we use the same rules, we do not extend the set of all
rules, and therefore all operators will be defined according
to set of rules that is as small as possible.

See also:

http://scienceblogs.com/goodmath/2007/01/basics_logic_aka_its_illogical_1.php


.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,810
Latest member
Kassie0918

Latest Threads

Top