Convert a string with a control character in caret notation to a real control character string.

Bart Vandewoestyne

I am working my way through the book 'Modern Compiler Implementation in C' and am now working on the lexer from Chapter 2:

https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex

Part of the exercise is that strings with escape sequences and control characters in caret notation must be supported. Between lines 153 and 163, I make sure my strings support escape sequences like \ddd with ASCII code ddd (3 decimal digits). Between lines 187 and 194 I try to do the same for control characters in caret notation. I haven't yet managed to put the value of the control character in the result variable. I wonder if it is doable with a single sscanf line, as in the \ddd case...

What would be the most elegant and standard-conforming way to grab the value of the matched control character?

Regards,
Bart

James Kuyper

The C standard provides ways of specifying only a few control characters
(5.2.2p2):
Alphabetic escape sequences representing nongraphic characters in the
execution character set are intended to produce actions on display
devices as follows:

\a (alert) Produces an audible or visible alert without changing the
active position.
\b (backspace) Moves the active position to the previous position on
the current line. If the active position is at the initial position
of a line, the behavior of the display device is unspecified.
\f (form feed) Moves the active position to the initial position at
the start of the next logical page.
\n (new line) Moves the active position to the initial position of the
next line.
\r (carriage return) Moves the active position to the initial position
of the current line.
\t (horizontal tab) Moves the active position to the next horizontal
tabulation position on the current line. If the active position is
at or past the last defined horizontal tabulation position, the
behavior of the display device is unspecified.
\v (vertical tab) Moves the active position to the initial position of
the next vertical tabulation position. If the active position is at
or past the last defined vertical tabulation position, the
behavior of the display device is unspecified.

Note that the numerical values of these escape sequences are not
specified by the standard, only the intended behavior if they are sent
to the display device. The standard goes out of its way to avoid
specifying anything more than it absolutely must about the character
sets supported by a C implementation, or the encodings used for those
character sets.

If you need to refer to any control characters that don't correspond to
one of the above escape sequences, there's no solution that's portable
to all implementations of C. If you're willing to restrict the
portability of your code to systems using a particular encoding for the
control characters you're interested in, then you can use the octal
escape sequences to specify them explicitly.
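As a small illustration of that trade-off, the sketch below contrasts a named escape with octal escapes. The names portable_tab, ascii_tab, ascii_ctrl_a, escapes_agree and ctrl_a_value are made up for this example, and the meaning attached to the octal values assumes an ASCII-compatible execution character set.

```c
/* Portable: the compiler supplies whatever value horizontal tab has
   in the execution character set. */
static const char portable_tab = '\t';

/* Encoding-specific (assumption: ASCII-compatible character set).
   Octal 011 is horizontal tab and octal 001 is Ctrl-A (SOH) in ASCII;
   on another encoding the same numbers may denote other characters. */
static const char ascii_tab = '\011';
static const char ascii_ctrl_a = '\001';

/* Returns 1 when the named escape and the hard-coded octal value agree,
   which is the case on ASCII-based implementations. */
int escapes_agree(void)
{
    return portable_tab == ascii_tab;
}

/* Returns the numeric value of the octal escape. The number is always 1;
   only its interpretation as Ctrl-A depends on the encoding. */
int ctrl_a_value(void)
{
    return ascii_ctrl_a;
}
```

On an ASCII-based system escapes_agree() returns 1; on a system with a different encoding it may not, which is exactly the portability restriction described above.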

James Kuyper

What language is that in?

It looks like 'lex', or perhaps 'flex', which would be consistent with
the extension on the file name. Several key parts of a lex file are
transferred, almost verbatim, to the output file which is ordinary
(though rather convoluted and unreadable) C code. Since his question is
about how to represent control characters in that C code, the question
is topical, but it requires a knowledge of lex to realize that fact.

Ben Bacarisse

Bart Vandewoestyne said:
I am working my way through the book 'Modern Compiler Implementation
in C' and am now working on the lexer from Chapter 2:

https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex

Part of the exercise is that strings with escape sequences and control
characters in caret notation must be supported. Between lines 153 and
163, I make sure my strings support escape sequences like \ddd with
ASCII code ddd (3 decimal digits). Between lines 187 and 194 I try to
do the same for control characters in caret notation. I haven't
succeeded to put the value of the control character in the result
variable yet. I wonder if it is doable with a single sscanf line like
for the \ddd case...

No, that's "trying too hard". It's simpler than that.

What would be the most elegant and standard-conforming way to grab the
value of the matched control character?

result = yytext[2] - '@';

This assumes a lot about the character set, but that's fine in this case
because the notation itself ('^A' etc.) is tied to the character set.
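To make the character-set assumption explicit, here is a minimal sketch of the same subtraction with a range check added. The function name caret_to_control and the -1 error value are hypothetical choices for this example, and the code assumes an ASCII-compatible character set in which '@' through '_' are contiguous.

```c
/* Converts the character that follows '^' in caret notation to the
   control character it denotes.
   Assumption: ASCII-compatible character set, where '@'..'_' occupy
   codes 64..95 and therefore map to control codes 0..31.
   Returns -1 if c is outside '@'..'_' (for example a lowercase letter). */
int caret_to_control(char c)
{
    if (c < '@' || c > '_')
        return -1;
    return c - '@';
}
```

Under that assumption caret_to_control('A') gives 1 and caret_to_control('[') gives 27 (ESC), while a character such as 'a' is rejected instead of silently producing a value above 31.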

Bart Vandewoestyne

Before I looked at Ben's post, the solution that I came up with was:

char key;
sscanf(yytext, "^%c", &key);
*string_buf_ptr++ = key - 64;

But Ben's solution reads:
result = yytext[2] - '@';

which I corrected to

result = yytext[1] - '@';

;-)

and this is indeed a lot shorter and more elegant! I love it when I can make my code more readable with shorter statements! :)
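For comparison, the two approaches can be put side by side as a rough sketch. The function names control_via_sscanf and control_via_index are invented for this example, the input is assumed to have the form "^X", and ASCII is assumed (64 is the code of '@').

```c
#include <stdio.h>

/* Both functions expect text of the form "^X" and store the resulting
   control character in *out. They return 1 on success, 0 on failure.
   Assumption: ASCII-compatible character set. */

int control_via_sscanf(const char *text, char *out)
{
    char key;
    /* "^%c" matches a literal '^' and then reads one character.
       sscanf returns the number of conversions performed, or EOF. */
    if (sscanf(text, "^%c", &key) != 1)
        return 0;
    *out = (char)(key - 64);   /* 64 is '@' in ASCII */
    return 1;
}

int control_via_index(const char *text, char *out)
{
    /* Check the caret and make sure a character follows it. */
    if (text[0] != '^' || text[1] == '\0')
        return 0;
    *out = (char)(text[1] - '@');
    return 1;
}
```

Both give the same result for well-formed input; the indexed version simply avoids the formatted-input machinery.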

Regards,
Bart

Ben Bacarisse

Bart Vandewoestyne said:
Before I looked at Ben's post, the solution that I came up with was:

char key;
sscanf(yytext, "^%c", &key);
*string_buf_ptr++ = key - 64;

But Ben's solution reads:
result = yytext[2] - '@';

which I corrected to

result = yytext[1] - '@';

;-)

Yes, I am sure you are right about the 1 but here's why I wrote 2: The
code you had when I looked was: sscanf(yytext + 1, "^%c", &result); so I
assumed that the ^ was in yytext[1] and the character to be adjusted
would therefore be in yytext[2]. :)
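The index depends on whether the matched text starts with the caret or with some leading character (such as a backslash) before it. One way to sidestep that, sketched here with a hypothetical helper named control_after_caret and again assuming ASCII, is to locate the caret rather than hard-code its position.

```c
#include <string.h>

/* Finds the first '^' in the matched text and converts the character
   after it to a control character by subtracting '@'.
   Assumption: ASCII-compatible character set.
   Returns -1 if there is no caret or nothing follows it. */
int control_after_caret(const char *matched)
{
    const char *caret = strchr(matched, '^');
    if (caret == NULL || caret[1] == '\0')
        return -1;
    return caret[1] - '@';
}
```

With this helper both "^A" and "\^A" yield 1, so the result no longer depends on which of the two forms the lexer rule matches.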

<snip>

Bart Vandewoestyne

Yes, I am sure you are right about the 1 but here's why I wrote 2: The
code you had when I looked was: sscanf(yytext + 1, "^%c", &result); so I
assumed that the ^ was in yytext[1] and the character to be adjusted
would therefore be in yytext[2]. :)

I forgive you ;-)

Regards,
Bart