Stephen said:
I realise that scripting languages do it, but they get to handle the
[] differently depending on whether it's on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.
Actually, in many languages (such as PHP) relational maps
work exactly that way. That is, you add an element to the map by
"indexing" it with the key and assigning the element. This is a very
common idiom, e.g. in PHP.
I didn't know that, and on my first read of this I thought I'd
overgeneralised a principle from basically just Python to scripting
languages more generally. On second read, you are saying precisely
what I said - PHP is a scripting language, and it is doing what I said
is characteristic of scripting languages.
Maybe I worded my post awkwardly. To clarify...
C++ allows key:data insertion using "container [key] = data;". That
surprised me. I realise that scripting languages *also* do "container
[key] = data;", but they get to handle the subscripting differently.
Basically, scripting languages don't do a two-stage process of inserting
a key:default pair in the [], then overwriting the default with the data
in the assignment. They merge the subscripting and the assignment into a
single step.
For example, Python has separate magic methods for overriding a
subscripted read and a subscripted write. When C++ handles the
subscripting operation it doesn't even know that there is an assign.
For example...
std::map<int,int> mymap;
int var;
var = mymap [1];
Did the programmer really intend for var to get a random junk value
that was also assigned into mymap in a key=1:data=junk pair? Python
would know better, since the subscript is on the wrong side of the
assignment, and I assume PHP would as well. C++ cannot, because when it
handles operator[], the method doesn't know how the returned
reference will be used. It returns a reference that might be used for
a read or a write, or which may not be used at all.
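The insertion itself is easy to observe. The following is only a minimal sketch using the standard std::map; the function name size_after_read is made up for the illustration, and it looks only at the size of the map, not at the value stored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical helper: reads a key that was never inserted and
// reports how many elements the map holds afterwards.
std::size_t size_after_read ()
{
  std::map<int,int> mymap;

  int var = mymap [1];   // looks like a pure read...
  (void) var;            // value deliberately ignored in this sketch

  return mymap.size ();  // ...but key 1 now exists in the map
}
```

Calling size_after_read () returns 1 rather than 0: the subscripted read has added an element.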
The two normal variants of operator[] aren't read and write - they are
const and non-const as follows...
const data_t& container::operator[] (const key_t& key) const { ... }
data_t& container::operator[] (const key_t& key) { ... }
The first shouldn't modify the container, and returns a reference that
will normally prevent the referenced value from being modified (with
various get-out clauses - mutable, const_cast, ...). This version is
called whenever the container is considered const.
The second returns a non-const reference which is equally valid for
both reading and writing the referenced data. This version is called
whenever the container is considered non-const, which is no clue as to
whether a read or write is intended.
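To make the overload selection concrete, here is a rough sketch. The container class is a toy invented for this illustration (it is not std::map, and it holds just one value); it simply records which overload ran last. The three overload_for_... function names are likewise made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical container: holds a single value and records which
// overload of operator[] was called most recently.
class container
{
public:
  container () : m_value (0), m_last ("none") {}

  // Chosen when the container is const.
  const int& operator[] (int) const { m_last = "const"; return m_value; }

  // Chosen when the container is non-const, for reads and writes alike.
  int& operator[] (int) { m_last = "non-const"; return m_value; }

  std::string last () const { return m_last; }

private:
  int                 m_value;
  mutable std::string m_last;   // mutable so the const overload can log
};

std::string overload_for_read ()
{
  container c;
  int var = c [1];              // a read, yet the non-const overload runs
  (void) var;
  return c.last ();
}

std::string overload_for_write ()
{
  container c;
  c [1] = 42;                   // a write through the same overload
  return c.last ();
}

std::string overload_for_const_read ()
{
  container c;
  const container& cref = c;
  int var = cref [1];           // only constness selects the other overload
  (void) var;
  return c.last ();
}
```

Both the read and the write on the non-const object report "non-const"; only the access through the const reference reports "const".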
The semantics of the first case ensure a const container is safe - you
can't modify a const container so you can't insert a key:default pair
and I assume an exception is thrown in that case.
I should say that based purely on the method signature there is no
reason to ban inserts on subscripting - the container is non-const.
Even so, I feel that reserving the subscripting operator for cases
where the intent is to access an existing key would be better.
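For comparison, C++11 and later provide std::map::at, which behaves roughly like the access-only subscripting described here: it throws std::out_of_range for a missing key and never inserts. A minimal sketch, assuming a C++11 compiler; the function name at_rejects_missing_key is illustrative only.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical helper: returns true if at() rejected the missing key
// and left the map with its single original element.
bool at_rejects_missing_key ()
{
  std::map<int,int> mymap;
  mymap.insert (std::make_pair (1, 10));

  bool threw = false;
  try
  {
    int var = mymap.at (2);     // key 2 was never inserted
    (void) var;
  }
  catch (const std::out_of_range&)
  {
    threw = true;
  }

  return threw && mymap.size () == 1;
}
```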
It might be "obvious" to you because you are used to thinking like that.
However, in many languages (such as PHP) it's obvious that you are
building a relational map by indexing it. That is, when you say:
myMap[key] = value;
First off, my criticism of that is that there's already a way to
express that intent in C++ - the insert method. When you already have
a way to express that intent, why make the intent of another notation
less clear (and reduce the opportunities for error checking to catch
problems) by mixing the two intents together?
insert cannot be used to access an existing key. Why allow operator[]
the dual role of both accessing existing keys and inserting new ones?
I always prefer a different notation for a different intent (within
reason) so that errors are caught instead of remaining hidden and
leading to garbage results being trusted.
It's subjective opinion, of course, esp. in where you draw the "within
reason" line, but there's a lot more to it than just "what I'm used
to".
Secondly, it's not something I simply got used to through using some
other language at all, unless it was an old pre-standard version of
C++. I can't even name a language that uses the semantics I suggest.
Ada, Modula 2, Pascal, C, Basic and Assembler don't have standard
associative containers to the best of my memory. Of the languages I've
played with, Haskell is pure functional - no mutables (except for the
monad get-out clause), Prolog is weird, and I'm dimly aware that arrays
are mutable in Objective Caml but that's my limit.
My own containers don't allow subscripting to insert new items, but
that was either based on a decision at the time to go against what I
was used to, or else it was doing what I was used to in pre-standard
C++. I honestly don't recall which.
I last used map (there was no std:: prefix at the time) in Borland C++
5, so maybe someone remembers what that did - I haven't had access to
a copy for years now.
std::map has that exact same idea (although it's not as flexible as
PHP because the type of the key and the value are fixed). As exemplified
by Erik, this can be quite handy in some cases, for example because you
can write things like:
words[word]++;
One single command adds a new 'word' to the map and increments its value.
You won't get the result you expect, or at least if you do it's only a
fluke.
Try the following...
#include <map>
#include <iostream>
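// Fill a chunk of stack with a known non-zero pattern.
void Garbage ()
{
  char garbage [1024];
  for (int i = 0; i < 1024; ++i) garbage [i] = 0x55;
}

// Print a local int that is never initialised.
void Test ()
{
  int a;
  std::cout << a << std::endl;
}

int main()
{
  Garbage ();
  Test ();
  return 0;
}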
What result do you expect from Test? If you say zero, you're wrong.
The default constructor for int is trivial - it does nothing. The
variable a is never initialised, so its value is junk. Because of the
specific junk that happens to be on the stack at the time (due to the
prior Garbage call) the most likely output is 2009147472.
In a completely trivial test program you will fluke it - on typical
desktop systems - as the memory is normally wiped to all zero as your
program is loaded. As soon as the program does anything significant,
though...
Getting back to your example, if the key "word" is new, so you insert
a new item, you are incrementing a garbage value - whatever happened
to be in some arbitrary chunk of memory prior to executing that line.
Even that behaviour is only in practice - the standard will say that
the behaviour is undefined.
That error wouldn't happen in PHP, I'll bet, but C++ isn't a scripting
language and it works differently, which was a significant part of my
point. I only disagree with it to a small degree in scripting
languages, but there are more problems with it in C++.
Your compiler might warn you that you're using an uninitialised value,
but only in some cases. In GCC, the example I gave doesn't seem to
give a warning even with -Wall. In your example, the compiler cannot
give a warning - it cannot know that the reference returned by
operator[] refers to an uninitialised location.
If you want to check if a key exists in the map, that's what the
find() member function is for.
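A minimal sketch of that check, assuming a word-count map like the one in the earlier example. The function name count_or_missing and the -1 "absent" marker are choices made up for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: returns the stored count for word, or -1 if the
// word is absent. Unlike operator[], find() never inserts, so it also
// works on a const map.
int count_or_missing (const std::map<std::string,int>& words,
                      const std::string& word)
{
  std::map<std::string,int>::const_iterator it = words.find (word);
  return (it == words.end ()) ? -1 : it->second;
}
```

The lookup leaves the map unchanged, so a missing word can be reported without adding it.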
But when the needed checks aren't done by default, they often don't
get done at all. Relative to C, the trend in C++ has been toward
language features that are safer by default.
For example, we no longer have to program an explicit null check after
using new. Pre-standardisation, new was safer than malloc (it
supported initialisation by default) but could still return null if it
couldn't allocate memory. Now, we also have an exception by default.
We don't have to remember the check - it is implicit.
Notations that cleanly separate different intents support better
automatic checks - it's that simple.