Hungarian notation

James Kanze

On Mar 21, 8:02 am, (e-mail address removed) wrote:
Variable names are like comments - they can lie. If the type
of a variable changes, you either have to hunt down every
occurrence of it to change, or allow it to fib about the type.

If the semantics of the variable changes, you'll want to do this
anyway.

The main argument against Hungarian notation as it generally
seems to be used is that it is pure obfuscation. In C or in
C++.
 
James Kanze

What kind of "hungarian notation"? If they use it to describe
the exact type of the variable, like 'li' for 'long int',
'usi' for 'unsigned short int' etc., then it is not so hard to
find many arguments against it.
If they use some form of prefixing just to describe some
aspects of the semantics of the variable, like 'i' for
ordinals, 'n' for quantities, 'p' for high-level class
pointers as opposed to 'lp' for pointers to POD structs, 'bf'
for bitwise-used variables etc., then it is not only not a
"terrible practice", but quite opposite - something worth
picking up as a habit and using extensively.

And how is that better than e.g. Count for quantities? (And
aren't quantities ordinals?) It still remains a somewhat
particular and obfuscating set of conventions for abbreviations.
(Which abbreviations are acceptable depends somewhat on the
application domain. When I was working in the telephone domain,
we spoke of TP's to the point where I'm sure that some of the
programmers didn't know that it was originally an abbreviation
for termination point.)

Of course, some such words may still be ambiguous in the native
language. I always find it frustrating that English uses the
same word, number for both nombre and numéro (French) or Anzahl
and Nummer (German). Such cases do call for an artificial
convention---in this case, the most frequent one I've seen is to
systematically use count in the first case, with number always
being numéro/Nummer (i.e. a serial number, or something like
that). You can also simply require the preposition "of" in the
first case: numberOfWidgets (rather than widgetCount), as
opposed to widgetNumber.
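
The count/number convention described above might look like this in code. This is only a sketch: the Widget type and the helper functions are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type, used only to illustrate the naming convention.
struct Widget {
    int widgetNumber;  // numéro / Nummer: identifies one widget, like a serial number
};

// "Count" (or "numberOf...") always denotes a quantity: nombre / Anzahl.
inline std::size_t widgetCount(const std::vector<Widget>& widgets) {
    return widgets.size();
}

inline void widgetNamingExample() {
    std::vector<Widget> widgets;
    widgets.push_back(Widget{1001});
    widgets.push_back(Widget{1002});

    assert(widgetCount(widgets) == 2);            // how many widgets there are
    assert(widgets.back().widgetNumber == 1002);  // which widget this one is
}
```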
 
James Kanze

(e-mail address removed) wrote:
There's _always_ a convention.

Yes. You agree on standard names for common concepts. In the
old days, when variable names were limited to six characters,
you established standard abbreviations as well. This shouldn't
be necessary for modern C++. (But might be appropriate in
contexts where the abbreviation was more frequently used than
the spelled out word. In a program dealing with certain types
of standards, for example, RFC1411 is probably preferable to
requestForComment1411.)
Certain aspects of the variable semantics have to be reflected
in variable name. That's what variable names are for, among
other things. If you don't do it at all or do it
inconsistently (without any convention), your code isn't worth
much in my book.
Prefixing is no worse and no better than anything else.

Arbitrary conventions are worse than natural conventions.
Bad example. It is not more descriptive because in
'nNumPolygons' you used two "conventions" at once. Prefix 'n'
and prefix 'Num'. What's funny is that while trying to bash
prefixes you used just another prefix yourself! Which is
essentially the same as "hungarian notation".
The proper name of the variable to store a quantity of
polygons is 'n_polygons'. And no CamelCase, as you see.

The proper name is polygonCount or polygon_count. Or possibly
numberOfPolygons or number_of_polygons. (The choice of
camelCase or not is purely conventional.) Anything else is
obfuscation. (But there are different degrees of obfuscation,
and num_polygons is certainly preferable to n_polygons.)
 
James Kanze

[...]
Agreed on both counts. (CamelCase isn't innately evil, but
trying to mix CamelCase and snake_case libraries has been a
generally unpleasant experience for me.)

Just curious, but is snake_case a more or less widespread term?
(I like it, but I've never heard it before.)
Hungarian notation is useful for encoding static type-related
information in a human-readable form. I've found it useful in
assembly language ("is that integer signed or unsigned?").

Even in assembler, the name of the variable should implicitly
contain that sort of information---a count is never negative.
(The real problem in assembler is, of course, the fact that you
have variables with very uninformative names forced on you.
Names like eax or l0.)
 
dave_mikesell

If the semantics of the variable changes, you'll want to do this
anyway.

Semantics sure, but not type. If I use HN and my char * changes to a
CString or a std::string, I've got to change lpszFirstName to
strFirstName. Not the case if I just call it firstName.
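
A minimal sketch of that point, assuming a hypothetical Person record: because the member is named for what it holds rather than for its type, switching it from char * to std::string leaves the name, and the code that reads it, untouched.

```cpp
#include <cassert>
#include <string>

// Hypothetical record type, invented for this illustration.
struct Person {
    // Imagine this member was `const char* firstName` in an earlier revision.
    // The type changed to std::string, but the name did not have to.
    // With type prefixes, lpszFirstName would have had to become strFirstName
    // at every point of use.
    std::string firstName;
};

// Code written against the semantic name keeps compiling after the type change.
inline std::string greeting(const Person& person) {
    return "Hello, " + person.firstName;
}

inline void firstNameExample() {
    Person person;
    person.firstName = "Ada";
    assert(greeting(person) == "Hello, Ada");
}
```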
 
James Kanze

On 22 mar, 05:02, Andrey Tarasevich wrote:

[...]
What "de-facto standard"? There's no "de-facto standard" of
using 'num' instead of 'n'.

There are several "standards": Merriam-Webster gives both
"count" and "number of", for example. From what I've seen, n
was well established in Fortran, and was used somewhat in the
early days of C (when external symbols were only distinctive in
the first eight characters), and num seems to have been
widespread in Cobol. But there's absolutely no reason to not use
the real standard (i.e. Merriam-Webster, for US English) in
modern C++.
Virtually every person who ever wrote C/C++ code

Then how come I've never seen it, in a single line of C++? (I
have seen it in Fortran, and in very early C, but even there, it
was rarely raised to the level of a standard.)
and virtually every piece of code ever written by these
people. I don't understand how I can describe it in any other
way.

Maybe because he's seen C++ code written by other people than
yourself.
For example, I don't think the authors of C and C++ standard
documents ever worried about the appropriateness of using 'n'
to designate quantities (as opposed to, say, 'num' or
'count').

Actually, they use the word size most of the time. (Although the
algorithm is called count, and there's also gcount in istream.)
A misnomer, IMHO, but that's the way it is. (Elsewhere, I've
generally seen length used for the number of elements in a
container, although what's wrong with element_count is more than
I can see.)

But I don't think I'd take the standard as a good example with
regards to naming conventions. (Witness std::remove.)
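
For readers who have not run into it: the usual complaint about the name std::remove (presumably the one meant here) is that the algorithm does not remove anything from the container. It only moves the kept elements to the front and returns the new logical end, so a separate erase call is needed. A small sketch, with eraseValue as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: really removes every element equal to `value`.
inline void eraseValue(std::vector<int>& values, int value) {
    // std::remove only shifts the kept elements forward and returns the new
    // logical end; the vector's size is unchanged until erase() is called.
    std::vector<int>::iterator newEnd =
        std::remove(values.begin(), values.end(), value);
    values.erase(newEnd, values.end());
}

inline void removeExample() {
    std::vector<int> values{1, 2, 3, 2, 4};
    std::vector<int>::iterator newEnd =
        std::remove(values.begin(), values.end(), 2);
    assert(values.size() == 5);            // nothing was actually removed
    assert(newEnd - values.begin() == 3);  // only the first three are kept
}
```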
 
James Kanze

Then your best course of action is probably to follow the
golden rule: The people with the gold make the rules.
Take the view that they can't waste your time -- they can only
waste their money.

That's the engineer talking, not the scientist:).
 
Jeff Schwab

James said:
On 22 mar, 05:02, Andrey Tarasevich wrote:

There are several "standards": Merriam-Webster gives both
"count" and "number of", for example. From what I've seen, n
was well established in Fortran, and was used somewhat in the
early days of C (when external symbols were only distinctive in
the first eight characters), and num seems to have been
widespread in Cobol. But there's absolutely no reason to not use
the real standard (i.e. Merriam-Webster, for US English) in
modern C++.

I'm generally with you on this one. I've recently been using
getNumElements for syntactic compatibility with a third-party library,
but I find it vaguely jarring for several reasons. I've even started
wrapping it in an overloaded size() method, so that:

size(some_container)

works regardless of whether the container is a std::vector or one of the
other library's container types.
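
A sketch of what such an overloaded size() might look like. Only getNumElements comes from the post; ThirdPartyList is a stand-in for the unnamed library's container, and the util namespace is an assumption made to keep the overloads together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the third-party container (hypothetical).
class ThirdPartyList {
public:
    explicit ThirdPartyList(std::size_t n) : numElements_(n) {}
    std::size_t getNumElements() const { return numElements_; }
private:
    std::size_t numElements_;
};

namespace util {

// Overload for standard vectors: forwards to the member function.
template <typename T>
std::size_t size(const std::vector<T>& container) {
    return container.size();
}

// Overload for the third-party container: hides the jarring name.
inline std::size_t size(const ThirdPartyList& container) {
    return container.getNumElements();
}

}  // namespace util

inline void sizeExample() {
    std::vector<int> standard(3);
    ThirdPartyList other(5);
    assert(util::size(standard) == 3);
    assert(util::size(other) == 5);
}
```

Calling code can then use one spelling for both kinds of container, at the cost of one small overload per container type.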
But I don't think I'd take the standard as a good example with
regards to naming conventions. (Witness std::remove.)

It's certainly not perfect. I still try to follow the standard
library's lead wherever feasible, but I'd like to see a few changes in
the long run. For now, I'm busy digesting the upcoming syntactic
changes in the core language, and trying to get more familiar with the
newly added library features.

James, I know you've said before that the standard library does one
thing, but that code is different. How do you keep the code compatible?
If your type names begin with capital letters, how do you use an
algorithm that expects a C::iterator rather than a C::Iterator?
 
James Kanze

[...]
It's certainly not perfect. I still try to follow the standard
library's lead wherever feasible, but I'd like to see a few changes in
the long run. For now, I'm busy digesting the upcoming syntactic
changes in the core language, and trying to get more familiar with the
newly added library features.
James, I know you've said before that the standard library
does one thing, but that code is different. How do you keep
the code compatible? If your type names begin with capital
letters, how do you use an algorithm that expects a
C::iterator rather than a C::Iterator?

Define both of them:).

Seriously, templates do cause problems, and require some
thought. Before the standard library came along, for example,
all of my containers had a function iterator(), which returned a
GoF style iterator. I've since worked out the necessary
techniques so that most of my iterators are both GoF type
iterators and ForwardIterators in the sense of the STL, but the
name of the function is a real problem---I can't effectively
change it without breaking a great deal of existing code, and
even if I could, what other name would be reasonable.

Luckily, for the most part, if you follow the philosophy of the
STL, anything involving templates and sequences uses iterators,
and not containers, so the name collision in the container
itself hasn't turned out to be a major problem.
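
"Define both of them" could be sketched roughly as follows. IntBuffer and findFirst are hypothetical names, and this is not the actual code being described, just one way a container with capitalized type names can also satisfy generic code that asks for C::iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container written in a house style with capitalized type names.
class IntBuffer {
public:
    static const std::size_t kSize = 4;

    typedef int value_type;
    typedef int* Iterator;      // house-style name
    typedef Iterator iterator;  // the name generic, STL-style code expects

    IntBuffer() { std::fill(data_, data_ + kSize, 0); }

    Iterator begin() { return data_; }
    Iterator end() { return data_ + kSize; }

private:
    int data_[kSize];
};

// Generic code written against the standard naming convention.
template <typename C>
typename C::iterator findFirst(C& container,
                               const typename C::value_type& value) {
    return std::find(container.begin(), container.end(), value);
}

inline void iteratorNamesExample() {
    IntBuffer buffer;
    *(buffer.begin() + 2) = 7;
    IntBuffer::Iterator viaHouseName = buffer.begin() + 2;
    IntBuffer::iterator viaStdName = findFirst(buffer, 7);
    assert(viaHouseName == viaStdName);  // both names denote the same type
}
```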

More generally, of course, I do often have to ask myself which
is better. Should my XDR streams be called
ixdrstream/oxdrstream, or XDRInputStream/XDROutputStream. Given
that they're very closely modeled on iostream. And what about
my streambuf decorators (currently FilteringInputStreambuf and
FilteringOutputStreambuf).

(BTW: I use CamelCase because that's what all of my customers
have used. That's not really the issue here: you could pose the
question in terms of xdr_input_stream and xdr_output_stream just
as well.)
 
Andrey Tarasevich

James said:
...


Yes. You agree on standard names for common concepts. In the
old days, when variable names were limited to six characters,
you established standard abbreviations as well. This shouldn't
be necessary for modern C++. (But might be appropriate in
contexts where the abbreviation was more frequently used than
the spelled out word. In a program dealing with certain types
of standards, for example, RFC1411 is probably preferable to
requestForComment1411.)

You base your argument on the assumption that the main and/or only reason to
strive for some form of brevity in identifiers is (or, more precisely, was) the
limitation on the identifier length specific to old compilers. To me this is an
obviously faulty argument. Even without any physical limitation on the total
length of the identifier, there will always be some imprecise intuitive limit
after which overly long identifiers severely hinder the readability of the
code. In order to keep the code readable the identifiers have to be reasonably
long (and I personally stick with ones that are usually seen as excessively long
by the other members of the team I work in), but the lion's share of their
length has to be allocated to describing things closely specific to the concrete
application and/or the field. This is exactly the reason why the more universal
concepts are better expressed in abbreviated form.

And no, the picture here is not as black-and-white as you are trying to paint it
("standard" and "non-standard" prefixes). It is also strange to see you use that
argument against prefixes specifically, while it is obvious that the same will
actually apply to all abbreviations in general. And in practice any project of
moderate level of complexity will normally have a developed system of
abbreviations, which definitely will not be "standard". More precisely, it is of
course possible to get along without any abbreviations at all, but it will be no
more than an example of stubbornly bad programming style.
Arbitrary conventions are worse than natural conventions.

Firstly, "arbitrary convention" is an oxymoron, if you think about it.

Secondly, any convention is better than no convention. After all, there is one
fundamental rule under all these different views and approaches: whatever one
chooses to do, one must be consistent.
The proper name is polygonCount or polygon_count. Or possibly
numberOfPolygons or number_of_polygons. (The choice of
camelCase or not is purely conventional.)

Absolutely not. 'numberOfPolygons' and 'number_of_polygons' might seem OK only
to those who fail to see the whole picture. Once they do, it becomes obvious
that 'number_of_polygons' is blatantly unacceptable in general. It is immediately
clear to anyone who has ever had to deal with
'number_of_incomplete_table_based_mos_transistor_models' (it's just one
example). Of course, you can personally resolve to stick to 'number_of_...'
no matter what and tell everyone around that you are perfectly happy with that,
but don't be surprised if from time to time people around you give you
polite hints about the value of proper abbreviation.
 
James Kanze

You base your argument on the assumption that the main and/or
only reason to strive for some form of brevity in identifiers is
(or, more precisely, was) the limitation on the identifier
length specific to old compilers. To me this is an obviously
faulty argument. Even without any physical limitation on the
total length of the identifier, there will always be some
imprecise intuitive limit after which overly long
identifiers severely hinder the readability of the code.

Sure. You don't express the entire contract of a function in
the function name (e.g.
lookupWhateverAndThrowNotFoundExceptionIfNotFound---I did once
encounter a programmer who did this, with function names often
over 100 characters). Still, a variable should represent one
thing, and a function should do one thing. If you need
abbreviations to avoid unacceptably long names, it's probable
that your functions are doing too much, or your variables are
too encompassing.

In general, a verb or a verbal phrase should suffice for a
function, and a qualified noun (a noun with an adjective) for a
variable. And there's no need to abbreviate in those cases.
In order to keep the code readable the identifiers have to be
reasonably long (and I personally stick with ones that are
usually seen as excessively long by the other members of the
team I work in), but the lion's share of their length has to
be allocated to describing things closely specific to the concrete
application and/or the field. This is exactly the reason why
the more universal concepts are better expressed in
abbreviated form.
And no, the picture here is not as black-and-white as you are
trying to paint it ("standard" and "non-standard" prefixes).
It is also strange to see you use that argument against
prefixes specifically, while it is obvious that the same will
actually apply to all abbreviations in general.

Quite. I never restricted it to prefixes, or at least, I didn't
mean to. There are certain cases where abbreviations are
advisable, but that's because (as in the case of RFC above) the
abbreviation is actually better known than what it stands for.
And in practice any project of moderate level of complexity
will normally have a developed system of abbreviations, which
definitely will not be "standard". More precisely, it is of
course possible to get along without any abbreviations at all,
but it will be no more than an example of stubbornly bad
programming style.

Or simply writing good code, which expressively says what it
does.
Firstly, "arbitrary convention" is an oxymoron, if you think
about it.

Not necessarily. Not all conventions are arbitrary. The C++
standard is a convention, and while parts of it are more or less
arbitrary, others (relative precedence between addition and
multiplication) are based on wider conventions (which may be
arbitrary), and still others are based on pragmatic issues (a
different convention would be considerably more difficult to
implement, without any compensating advantage).

In the case of names, it's true that in a certain sense, all
spelling is an arbitrary convention. But it's also true that
the one described by Merriam and Webster is more widely known
and accepted (and thus more readable) than any other convention
you might establish in house.
Secondly, any convention is better than no convention. After
all, there is one fundamental rule under all these different
views and approaches: whatever one chooses to do, one must be
consistent.

Totally agreed. Systematically using ptr for pointer may not be
as good as using pointer, but it's a lot better than sometimes
using one, sometimes the other.
Absolutely not. 'numberOfPolygons' and 'number_of_polygons'
might seem OK only to those who fail to see the whole picture.
Once they do, it becomes obvious that 'number_of_polygons' is
blatantly unacceptable in general. It is immediately clear to
anyone who has ever had to deal with
'number_of_incomplete_table_based_mos_transistor_models' (it's
just one example).

And you're claiming that replacing number with n in the above
will somehow miraculously make it readable?
Of course, you can personally resolve to stick to
'number_of_...' no matter what and tell everyone around that
you are perfectly happy with that, but don't be surprised if
from time to time people around you give you polite hints
about the value of proper abbreviation.

I suspect rather that people will give me polite hints about the
value of proper abstraction and information hiding.
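
One way to read "abstraction and information hiding" here: let the enclosing type carry most of the words, so that each member name stays short without being abbreviated. The following is a sketch under that reading; every name in it is hypothetical and merely modeled on the long identifier quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model record, invented for this illustration.
struct MosTransistorModel {
    bool complete;
};

// The class name says "table-based MOS transistor models" once, so the
// member can simply be incompleteCount() instead of
// number_of_incomplete_table_based_mos_transistor_models.
class TableBasedMosModels {
public:
    void add(const MosTransistorModel& model) { models_.push_back(model); }

    std::size_t incompleteCount() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < models_.size(); ++i) {
            if (!models_[i].complete) {
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<MosTransistorModel> models_;
};

inline void abstractionExample() {
    TableBasedMosModels models;
    models.add(MosTransistorModel{true});
    models.add(MosTransistorModel{false});
    models.add(MosTransistorModel{false});
    assert(models.incompleteCount() == 2);
}
```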
 
Yannick Tremblay

I am looking at starting a new piece of work for a company who are
heavily into hungarian notation for C coding.

Any killer arguments for NOT carrying this terrible practice forward
into new C++ code?

What about trying Argument by Ridicule:

http://www.drdobbs.com/cpp/184403804

Conversations: Hungarian wartHogs
Jim Hyslop and Herb Sutter

:)
 
Andrey Tarasevich

Yannick said:
What about trying Argument by Ridicule:

http://www.drdobbs.com/cpp/184403804

Conversations: Hungarian wartHogs
Jim Hyslop and Herb Sutter

A rather prominent example of two well-recognized people engaging in
deliberate demagogy. It is interesting to note that they do seem to
understand the true meaning of the term "hungarian notation" since they
do touch it briefly at the beginning of the article (null-terminated
string example) and they also link the original article on "hungarian
notation".

However, once started, they quickly and quietly turn to deceiving the
reader by falsely presenting some ridiculous type encoding technique as
"hungarian notation". This demagogic technique is called "straw man
argument". An inexperienced reader will not notice the switch and indeed
might perceive the article as a valid argument against the notation,
while in fact it has nothing to do with it.

Note that they also use the same obviously false argument that the
change of variable type will trigger the need to change the variable
name in cases when "hungarian notation" is used - a dead giveaway that
they either have no idea what "hungarian notation" is (unlikely) or
(likely) deliberately pretend to have no idea.

Why are they doing this? I don't know. Probably just having fun.
 
dave_mikesell

However, once started, they quickly and quietly turn to deceiving the
reader by falsely presenting some ridiculous type encoding technique as
"hungarian notation". This demagogic technique is called "straw man
argument".

Maybe, but there were enough legitimate arguments in this thread to
hopefully rid the world of the scourge of HN.
 
Yannick Tremblay

A rather prominent example of two well-recognized people engaging in
deliberate demagogy.

I did say: "Argument by Ridicule" :)
Why are they doing this? I don't know. Probably just having fun.

Well, Hyslop and Sutter's "Conversations" always sounded to me like a bit
of fun :)
It is interesting to note that they do seem to
understand the true meaning of the term "hungarian notation" since they
do touch it briefly at the beginning of the article (null-terminated
string example) and they also link the original article on "hungarian
notation".

However, once started, they quickly and quietly turn to deceiving the
reader by falsely presenting some ridiculous type encoding technique as
"hungarian notation". This demagogic technique is called "straw man
argument". An inexperienced reader will not notice the switch and indeed
might perceive the article as a valid argument against the notation,
while in fact it has nothing to do with it.

Note that they also use the same obviously false argument that the
change of variable type will trigger the need to change the variable
name in cases when "hungarian notation" is used - a dead giveaway that
they either have no idea what "hungarian notation" is (unlikely) or
(likely) deliberately pretend to have no idea.

You are correct that proper Hungarian Notation is different from what is
presented in this article.

However, and unfortunately, degenerate pseudo-Hungarian code
obfuscation is still far too common: the Microsoft API with its lp, lpsz,
lpctstr, dw, etc.; code bases where every class name starts with a capital
'C'; code with std::string ssSomething, std::vector vWhatever.
I've seen all of that at various times and in various places.

This might not be "hungarian", but I would be curious to see what the
(pseudo-)"hungarian notation" used/enforced at the OP's company is
like. Unfortunately, I suspect that it is degenerate Hungarian and hence
fails on many of the issues presented in this thread, even those that
wouldn't apply to proper Hungarian.

Does the OP have coding guidelines/rules that he can post, so that
we can figure out what style they use?

Yan
 
Krice

Maybe, but there were enough legitimate arguments in this thread to
hopefully rid the world of the scourge of HN.

I find HN useful in enums:

enum Dwarf_Professions {dwFIGHTER=1, dwHUNTER, dwROGUE, dwMINER,
dwARCHER, dwROCKTHROWER, dwMAX_PROF};

enum Goblin_Professions {goBARBARIAN=1, goFIGHTER, goHUNTER, goROGUE,
goTINKER, goMINER, goSHAMAN, goMAX_PROF};

How else could you differentiate between goblin and dwarf
professions?
 
dave_mikesell

I find HN useful in enums:

enum Dwarf_Professions {dwFIGHTER=1, dwHUNTER, dwROGUE, dwMINER,
dwARCHER, dwROCKTHROWER, dwMAX_PROF};

enum Goblin_Professions {goBARBARIAN=1, goFIGHTER, goHUNTER, goROGUE,
goTINKER, goMINER, goSHAMAN, goMAX_PROF};

How else could you differentiate between goblin and dwarf
professions?

DWARF_ and GOBLIN_? The instant I saw "dw", I thought "double word".
Why use a cryptic two letter abbrev when you can use a more
descriptive word?
 
LR

Krice said:
I find HN useful in enums:

enum Dwarf_Professions {dwFIGHTER=1, dwHUNTER, dwROGUE, dwMINER,
dwARCHER, dwROCKTHROWER, dwMAX_PROF};

enum Goblin_Professions {goBARBARIAN=1, goFIGHTER, goHUNTER, goROGUE,
goTINKER, goMINER, goSHAMAN, goMAX_PROF};

How else could you differentiate between goblin and dwarf
professions?

How about this?

struct Dwarf {
    enum Profession {
        FIGHTER=1, HUNTER, ROGUE, MINER,
        ARCHER, ROCKTHROWER, MAX_PROF};
};

struct Goblin {
    enum Profession {
        BARBARIAN=1, FIGHTER, HUNTER, ROGUE,
        TINKER, MINER, SHAMAN, MAX_PROF};
};

LR
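
A short usage sketch for the nested enums above. The canMine function is a hypothetical addition; the point is only that the enclosing struct does the job of the dw/go prefixes, and that the two Profession types are distinct.

```cpp
#include <cassert>

struct Dwarf {
    enum Profession {
        FIGHTER = 1, HUNTER, ROGUE, MINER,
        ARCHER, ROCKTHROWER, MAX_PROF
    };
};

struct Goblin {
    enum Profession {
        BARBARIAN = 1, FIGHTER, HUNTER, ROGUE,
        TINKER, MINER, SHAMAN, MAX_PROF
    };
};

// Hypothetical function: takes a dwarf profession only. Passing
// Goblin::MINER would not compile, because Goblin::Profession is a
// different type from Dwarf::Profession.
inline bool canMine(Dwarf::Profession profession) {
    return profession == Dwarf::MINER;
}

inline void professionExample() {
    assert(canMine(Dwarf::MINER));
    assert(!canMine(Dwarf::FIGHTER));
    // The same enumerator name can carry a different value in each scope.
    assert(static_cast<int>(Dwarf::FIGHTER) == 1);
    assert(static_cast<int>(Goblin::FIGHTER) == 2);
}
```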
 
