The >> token

Peter Ammon · Dec 15, 2003

As we know, due to C++'s "longest match" rule, the >> token causes
headaches when working with nested templates, e.g.

vector<vector<int>>

will not parse correctly without inserting a space between the two >
signs. Why have a >> token at all? Why not have > be the token, and
handle >> in the grammar as two > tokens?

This would permit code like 3 > > 1, but that seems harmless to me.
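
For concreteness, here is a minimal sketch of the workaround in question. The helper name make_grid is made up for illustration and does not come from any library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, invented for this example.
// Note the space in "> >": under the longest-match rule discussed here,
// ">>" written without the space is lexed as a single shift token.
std::vector<std::vector<int> > make_grid(std::size_t rows, std::size_t cols)
{
    // Build `rows` rows, each holding `cols` zeros.
    return std::vector<std::vector<int> >(rows, std::vector<int>(cols, 0));
}
```

Calling make_grid(2, 3) gives two rows of three zeros each; the only point of interest is the spelling of the closing brackets.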

Dave · Dec 15, 2003

Peter Ammon said:
As we know, due to C++'s "longest match" rule, the >> token causes
headaches when working with nested templates, e.g.

vector<vector<int>>

will not parse correctly without inserting a space between the two >
signs. Why have a >> token at all? Why not have > be the token, and
handle >> in the grammar as two > tokens?

This would permit code like 3 > > 1, but that seems harmless to me.

Hmmm, then what would you use for "greater than"? If it's "overloaded", you
then end up with context sensitivity issues, which makes grammars much
harder to deal with...

Andrey Tarasevich · Dec 15, 2003

Peter said:
Why have a >> token at all? Why not have > be the token, and
handle >> in the grammar as two > tokens?

That would make the grammar much more complex than it is now. It is not
worth it.

Rolf Magnus · Dec 15, 2003

Dave said:
Hmmm, then what would you use for "greater than"? If it's
"overloaded", you then end up with context sensitivity issues, which
makes grammars much harder to deal with...

There are already context sensitivity issues. That's the reason why you
can't write vector<vector<int>>. The "greater" token already is
"overloaded".

M. Akkerman · Dec 16, 2003

Because it's pretty useful. I'm sure a desktop programmer won't
bother much with stuff like individual bits, but if you're going to
code for a lower-level layer (example: a device driver) then
manipulating bits is your only friend.

Unforgiven · Dec 16, 2003

M. Akkerman said:
Because it's pretty useful. I'm sure a desktop programmer won't
bother much with stuff like individual bits, but if you're going to
code for a lower-level layer (example: a device driver) then
manipulating bits is your only friend.

I don't believe that's what the OP meant. He wants to keep the bit shift, of
course, but wants compilers not to treat '>>' as a separate token and
instead determine from context what two consecutive '>' tokens mean. Had
that been done, it would be possible to write >> (without a space) at the
end of nested templates: the compiler would see two separate '>' tokens,
determine that they can't be a bit shift in that context, and correctly
treat them as the end of the template instantiation. Currently, the
parser gets a '>>' token from the lexical analyzer in that situation and
concludes that the token is invalid in that context.
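
A toy sketch of that idea, assuming a lexer that only ever emits single '>' tokens. The function name resolve_angles, the label strings, and the rule that a '<' directly after a known template name opens an argument list are all simplifications invented for illustration. A real C++ parser needs far more than this.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: labels each angle bracket in an already-lexed token
// stream. Assumption: the lexer produced only single "<" and ">" tokens, and
// the caller supplies the set of names known to be templates.
std::vector<std::string> resolve_angles(const std::vector<std::string>& tokens,
                                        const std::set<std::string>& template_names)
{
    std::vector<std::string> out;
    int depth = 0;  // number of template argument lists currently open

    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        if (t == "<") {
            // Toy rule: "<" opens an argument list only after a template name.
            const bool opens = i > 0 && template_names.count(tokens[i - 1]) > 0;
            if (opens) ++depth;
            out.push_back(opens ? "open-template" : "less-than");
        } else if (t == ">") {
            if (depth > 0) {
                --depth;                       // closes the innermost list
                out.push_back("close-template");
            } else if (i + 1 < tokens.size() && tokens[i + 1] == ">") {
                ++i;                           // consume the second ">" too
                out.push_back("shift-right");
            } else {
                out.push_back("greater-than");
            }
        } else {
            out.push_back(t);                  // everything else passes through
        }
    }
    return out;
}
```

With "vector" registered as a template name, the tokens of vector<vector<int>> end in two close-template labels. With no template names, the tokens x > > 1 yield a single shift-right, which is the 3 > > 1 side effect the OP mentioned.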

Jerry Coffin · Dec 17, 2003

[ ... ]

Rolf Magnus said:
There are already context sensitivity issues. That's the reason why you
can't write vector<vector<int>>. The "greater" token already is
"overloaded".

That's not context sensitivity. Context sensitivity is when your
grammar contains at least one production like:

xA ::= whatever

where an 'A' is recognized as a particular syntactic element ONLY in the
context of an 'x'. Otherwise, it's recognized as some other syntactic
element.

In the case of '<<' or '>>', there's no such thing -- distinguishing
between '>' and '>>' is done entirely at the lexical level, before the
grammar sees either one at all. By the time the parser sees any of
these, the lexer has converted each one to a token. The lexer doesn't
use any context sensitivity either -- it just creates a token out of the
longest sequence of input characters that it can. That is, it reads
characters until it encounters one that can't possibly be part of any
token that starts with the characters already read. At that point, it
does one of two things: it returns the characters it has already read as
a token, or it signals an error because what it has read isn't a token
and the next character in the input can't extend it into one.
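
The longest-match behaviour described above can be sketched with a toy lexer. This is an illustration under stated assumptions, not how any real compiler is written: the name tokenize and the tiny operator table are made up, and trying the operators longest-first stands in for the character-by-character scan described above (it gives the same result for this table).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer: identifiers/numbers plus a handful of operators.
std::vector<std::string> tokenize(const std::string& src)
{
    // Operators listed longest first, so the first hit is the longest match.
    static const char* const ops[] = {
        ">>=", "<<=", ">>", "<<", ">=", "<=", "==", ">", "<", "="
    };
    const std::size_t op_count = sizeof ops / sizeof ops[0];

    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace only separates tokens

        if (std::isalnum(c) || c == '_') {       // identifier or number
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
                ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }

        bool matched = false;
        for (std::size_t k = 0; k < op_count && !matched; ++k) {
            const std::string op(ops[k]);
            if (src.compare(i, op.size(), op) == 0) {
                tokens.push_back(op);            // no look at the surrounding context
                i += op.size();
                matched = true;
            }
        }
        if (!matched) {                          // anything else: one-character token
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

Fed vector<vector<int>>, this produces a single >> as its last token; fed vector<vector<int> >, it produces two separate > tokens. The lexer never knows it is inside a template argument list.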

There are a few parts of C++ that involve context sensitivity, but
they're mostly there to resolve ambiguities in the grammar proper --
e.g. in some cases, the choice between a declaration and a definition is
context sensitive.

Rolf Magnus · Dec 17, 2003

Jerry said:
[ ... ]

There are already context sensitivity issues. That's the reason why
you can't write vector<vector<int>>. The "greater" token already is
"overloaded".

That's not context sensitivity. Context sensitivity is when your
grammar contains at least one production like:

xA ::= whatever

where an 'A' is recognized as a particular syntactic element ONLY in
the context of an 'x'. Otherwise, it's recognized as some other
syntactic element.

In the case of '<<' or '>>', there's no such thing --

I was talking about something like 'if (a<3)' vs. 'vector<int>', not
about the '<<' token. Sorry, I should have said that more clearly.

Jerry Coffin · Dec 18, 2003

[ talking about context sensitivity ]

Rolf Magnus said:
I was talking about something like 'if (a<3)' vs. 'vector<int>', not
about the '<<' token. Sorry, I should have said that more clearly.

That's still not really context sensitivity, at least in the way the
term is normally defined. Basically, the grammar just has something
like (simplifying drastically):

cmpop: '=' | '<' | '>' | '<=' | '>='
     ;

expression: operand cmpop operand
          | lots of other possibilities elided
          ;

/* ... */
template_instantiation: template_name '<' templ_args '>' name ';'
                      ;

and a given input will only match one of these. There is (usually) a
bit of trickery involved in recognizing whether 'x' is the name of a
template or some other name (e.g. of a variable), and while this means
the parser needs access to the symbol table, it still isn't context
sensitivity in the classic sense.
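
A small sketch of the two cases in actual C++. The names holder, is_small, and held are invented for this example. The same '<' character appears in both functions, and which production applies depends only on what the name in front of it was declared as.

```cpp
// Hypothetical template, made up for illustration.
template <typename T>
struct holder { T value; };

bool is_small(int a)
{
    return a < 3;            // "a" names a variable, so "<" is the comparison operator
}

int held()
{
    holder<int> h = { 42 };  // "holder" names a template, so "<" opens its argument list
    return h.value;
}
```

This is the symbol-table lookup mentioned above: the parser has to know that holder is a template name before it can pick the right rule.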

In the end, none of this is really new or different with C++ though --
in C, the compiler runs into the same kinds of things, such as '&' being
both a unary operator to take an address and a binary operator to do a
bitwise AND. Here again, the parser has to

Jerry Coffin · Dec 18, 2003

[ ... ]

cmpop: '=' | '<' | '>' | '<=' | '>='

Oops -- that should be '==' not '=', of course...
