Pattern matching for a terminal emulator

Captain Dondo

I'm working on a terminal emulator for an embedded system.

The key requirements are small size, code clarity, maintainability, and
portability. We have machines that regularly see a service life of 30 years
so it's not impossible that this code will be around that long.

I'm trying to use termcap info to map incoming strings to display
actions.

In other words, I have an array that holds termcap info:

termcap = {
"ae=^O",
"as=^N",
"cm=\E[%i%d;%dH",
"cs=\E[%i%d;%dr",
...
}

I'm trying to come up with a nice, clear algorithm for matching incoming
characters to the patterns in the termcap array.

So far I've struck out pretty much completely.

For example, for the cm string, the terminal can see incoming strings like
this:

{esc}[3;4H
{esc}[;4H
{esc}[3;H

I can't quite come up with a parser that can handle that, and which
doesn't get all convoluted...

I would love to have some suggestions on how to match those patterns.

--Yan
 
Bin Chen

Have you considered regular expressions?
 
Captain Dondo

On Thu, 19 Apr 2007 06:47:30 -0700, Bin Chen wrote:
Have you considered regular expressions?

I have.... I'm just not sure how to apply it.

The characters come in one at a time; I need to scan the list to see if
the character matches the first character of any pattern. If it does,
then I need to remember that.

When the next character comes in, I need to scan the list to see if it
matches the next character in the patterns, and so on.

The pattern match ends when the match fails, at which point I need to see
if any one pattern was matched fully.

.....
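
The character-at-a-time scan described above could be sketched roughly as follows. This is only a sketch under assumptions: the table holds already-decoded literal sequences (no %d parameters yet), the two ESC entries are illustrative rather than taken from the termcap array, and the names `patterns`, `struct matcher`, `matcher_reset` and `matcher_feed` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of already-decoded literal sequences. */
static const char *const patterns[] = {
    "\x0f",     /* ae: ^O */
    "\x0e",     /* as: ^N */
    "\x1b[H",   /* illustrative entry, not from the original array */
    "\x1b[2J",  /* illustrative entry, not from the original array */
};
#define NPATTERNS (sizeof patterns / sizeof patterns[0])

struct matcher {
    size_t pos;                      /* characters consumed so far */
    unsigned char alive[NPATTERNS];  /* 1 while pattern i still matches */
};

static void matcher_reset(struct matcher *m)
{
    m->pos = 0;
    memset(m->alive, 1, sizeof m->alive);
}

/* Feed one incoming character.
 * Returns the index of a fully matched pattern, -1 if more input is
 * needed, or -2 if no pattern can match any more. The matcher resets
 * itself after a match or a failure. If one pattern is a prefix of
 * another, the shorter one wins. */
static int matcher_feed(struct matcher *m, char c)
{
    size_t i;
    int any = 0;

    for (i = 0; i < NPATTERNS; i++) {
        if (!m->alive[i])
            continue;
        if (patterns[i][m->pos] != c) {
            m->alive[i] = 0;            /* this pattern is out */
            continue;
        }
        if (patterns[i][m->pos + 1] == '\0') {
            matcher_reset(m);           /* complete match */
            return (int)i;
        }
        any = 1;                        /* still a candidate */
    }
    if (!any) {
        matcher_reset(m);
        return -2;
    }
    m->pos++;
    return -1;
}
```

Call `matcher_reset()` once, then `matcher_feed()` for every received character; only the `alive` flags and one position need to be remembered between characters.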
 
Richard Bos

Captain Dondo said:
I'm trying to use termcap info to map incoming strings to display
actions.

In other words, I have an array that holds termcap info:

termcap = {
"ae=^O",
"as=^N",
"cm=\E[%i%d;%dH",
"cs=\E[%i%d;%dr",
...
}

Except for the \E, which isn't a valid C escape character, and the ^O
and ^N, which aren't C escape characters at all, those look very much
like *scanf() strings to me.

Captain Dondo said:
For example, for the cm string, the terminal can see incoming strings like
this:

{esc}[3;4H
{esc}[;4H
{esc}[3;H

I can't quite come up with a parser that can handle that, and which
doesn't get all convoluted...

Perhaps you don't need to? You might be able to get sscanf() to do the
job for you.

Translate the \E and ^Whatever in your strings to the corresponding real
C characters (presumably your compiler allows \E as an extension, so you
_might_ not need to do that bit, but your code does become non-portable
if you rely on this; and ^Letter is presumably quite easy to do). Then,
see if you can mangle either the %spec bits, or your call to sscanf(),
so that it accepts your input.

If that doesn't work, the easiest way to get your hands on a sscanf()-
variation which can handle your termcap strings would be to start from
normal sscanf() code and modify that. If Ganuck code would be
acceptable, you could use that; if not, many textbooks have you write
one as an exercise, and include simplified sample code, which may
already be useful enough. IIRC K&R is one of these.
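
A minimal sketch of the translation step, assuming the termcap entries are stored with a literal backslash-E (written "\\E" in C source). The function name `termcap_to_scanf` is hypothetical. One detail worth knowing: in termcap, %i is not a conversion at all; it tells the sender to count rows and columns from 1 and produces no characters, so the sketch simply drops it.

```c
#include <stdio.h>

/* Decode termcap-style escapes into real characters:
 *   \E  -> ESC (0x1b)
 *   ^X  -> the control character X & 0x1f
 *   %i  -> dropped (it corresponds to no input characters)
 * Everything else, including %d, is copied unchanged.
 * Returns 0 on success, -1 if dst is too small. */
static int termcap_to_scanf(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    while (*src) {
        char c;

        if (src[0] == '\\' && src[1] == 'E') {
            c = 0x1b;
            src += 2;
        } else if (src[0] == '^' && src[1] != '\0') {
            c = (char)(src[1] & 0x1f);
            src += 2;
        } else if (src[0] == '%' && src[1] == 'i') {
            src += 2;
            continue;
        } else {
            c = *src++;
        }
        if (n + 1 >= dstsize)
            return -1;          /* no room for c plus the terminator */
        dst[n++] = c;
    }
    if (dstsize == 0)
        return -1;
    dst[n] = '\0';
    return 0;
}
```

With the cm entry this turns "\\E[%i%d;%dH" into a format that `sscanf()` accepts, so `sscanf(input, fmt, &row, &col)` returns 2 for ESC[3;4H. Note that the return value alone does not prove the trailing H was present.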

Richard
 
mark_bluemel

I'd look at how others have done it - for example in tools like xterm,
or putty...
 
Captain Dondo

On Thu, 19 Apr 2007 14:18:42 +0000, Richard Bos wrote:
Perhaps you don't need to? You might be able to get sscanf() to do the
job for you.

I've thought about running a pre-parser to replace those non-standard
chars, but I never followed it through to using sscanf....

I think I'll try that. There's some slight overhead - I'd have to create
multiple entries for the variants that can omit numbers - but that's easy
enough to do.
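
The "multiple entries" idea might look something like the sketch below. It assumes the whole sequence is already in a buffer, that an omitted number defaults to 1, and that only the three variants from the original example matter; the function name `match_cm` is made up. The %n conversion records how much input was consumed, which lets the code check that the final H really matched.

```c
#include <stdio.h>
#include <string.h>

/* Try each variant of the cm sequence in turn.
 * Returns 1 and stores row/col on a match, 0 otherwise.
 * A missing number is assumed to default to 1. */
static int match_cm(const char *in, int *row, int *col)
{
    int r, c, n;
    int len = (int)strlen(in);

    n = -1;                     /* stays -1 unless %n is reached */
    if (sscanf(in, "\x1b[%d;%dH%n", &r, &c, &n) == 2 && n == len) {
        *row = r; *col = c;     /* both numbers present */
        return 1;
    }
    n = -1;
    if (sscanf(in, "\x1b[;%dH%n", &c, &n) == 1 && n == len) {
        *row = 1; *col = c;     /* row omitted */
        return 1;
    }
    n = -1;
    if (sscanf(in, "\x1b[%d;H%n", &r, &n) == 1 && n == len) {
        *row = r; *col = 1;     /* column omitted */
        return 1;
    }
    return 0;
}
```

The cost is one format string per variant, and sequences such as ESC[H with both numbers and the separator omitted would need yet another entry.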

Thanks!

--Yan
 
CptDondo

mark_bluemel said:
I'd look at how others have done it - for example in tools like xterm,
or putty...

Most of those use the convoluted method... Basically the patterns are
hard-coded into the program logic. I'm trying to do something where we
can add functionality to the terminal (e.g. going from monochrome to
color) without rewriting the logic....

But I think I've hit on something thanks to the comments here. :)

--Yan
 
David Thompson

You need this generality only if you need someone other than (or after)
the developer(s), like a user or admin, to modify the emulation, or
perhaps to select among multiple (many?) emulations. To just emulate a
specific terminal/mode (or even family), I would hardcode at least the
common structure for it (e.g. the X3.64 style of CSI, operands,
some cols 2-3 modifiers, cols 4-5 char terminator or cols 6-7 char + I
don't recall), leaving the rest of the problem simpler.
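
Hardcoding the common structure could look like the following sketch. It assumes the ECMA-48/X3.64 layout of ESC [ followed by decimal parameters separated by ';' and a final byte in the range 0x40-0x7E. Intermediate bytes and private parameter characters such as '?' are deliberately not handled, and all names (`struct csi_parser`, `csi_init`, `csi_feed`) are hypothetical.

```c
#define CSI_MAX_PARAMS 8

enum csi_state { CSI_GROUND, CSI_ESC, CSI_PARAMS };

struct csi_parser {
    enum csi_state state;
    int params[CSI_MAX_PARAMS];  /* -1 means the number was omitted */
    int nparams;
    char final;                  /* terminator, e.g. 'H' */
};

static void csi_init(struct csi_parser *p)
{
    p->state = CSI_GROUND;
    p->nparams = 0;
    p->final = 0;
}

/* Feed one received byte. Returns 1 when a complete sequence has been
 * recognised (params, nparams and final are then valid), 0 otherwise. */
static int csi_feed(struct csi_parser *p, unsigned char ch)
{
    switch (p->state) {
    case CSI_GROUND:
        if (ch == 0x1b)
            p->state = CSI_ESC;
        return 0;
    case CSI_ESC:
        if (ch == '[') {
            p->state = CSI_PARAMS;
            p->nparams = 1;
            p->params[0] = -1;
        } else {
            p->state = CSI_GROUND;      /* not a CSI sequence */
        }
        return 0;
    case CSI_PARAMS:
        if (ch >= '0' && ch <= '9') {
            int *v = &p->params[p->nparams - 1];
            if (*v < 0)
                *v = 0;
            if (*v < 10000)             /* crude overflow guard */
                *v = *v * 10 + (ch - '0');
        } else if (ch == ';') {
            if (p->nparams < CSI_MAX_PARAMS)
                p->params[p->nparams++] = -1;
            else
                p->state = CSI_GROUND;  /* too many: abandon */
        } else if (ch >= 0x40 && ch <= 0x7e) {
            p->final = (char)ch;
            p->state = CSI_GROUND;
            return 1;
        } else {
            p->state = CSI_GROUND;      /* unsupported byte: abandon */
        }
        return 0;
    }
    return 0;
}
```

Once `csi_feed()` returns 1, the remaining work is a lookup on `final` and the parameter list, which is the simpler problem left over.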

But if you want to (or must) stay with using termcap(ish) strings:

Captain Dondo said:
In other words, I have an array that holds termcap info:
"cm=\E[%i%d;%dH",

Richard Bos said:
Except for the \E, which isn't a valid C escape character, and the ^O
and ^N, which aren't C escape characters at all, those look very much
like *scanf() strings to me.

Captain Dondo said:
For example, for the cm string, the terminal can see incoming strings like
this:

{esc}[3;4H
{esc}[;4H
{esc}[3;H

I can't quite come up with a parser that can handle that, and which
doesn't get all convoluted...

Richard Bos said:
Perhaps you don't need to? You might be able to get sscanf() to do the
job for you.
Translate the \E and ^Whatever in your strings to the corresponding real
C characters (presumably your compiler allows \E as an extension, so you
_might_ not need to do that bit, but your code does become non-portable
if you rely on this; and ^Letter is presumably quite easy to do). Then,
see if you can mangle either the %spec bits, or your call to sscanf(),
so that it accepts your input.

But: *printf %i will generate only decimal digits, but *scanf %i will
accept optional whitespace, optional sign and digits, and also allow
0octal and 0xhex forms; *scanf %d and (even!) %u will allow the first
two but not the third; these might result in false matches for
improper input, if that is a concern. (Perhaps not, if the source from
which the terminal emulator is receiving this data never makes
mistakes, and the comms path never silently corrupts.) Conversely,
*scanf %i or %d will fail, and cause scanning to stop, if the number
is entirely omitted, as is legal for (most) X3.64/VT100 escape
sequences, as the OP's example shows. Similarly %i (or %d) followed
immediately by %d, applied to contiguous digits which they could have
generated (or did) in *printf, will fail because the first specifier
doesn't 'see' the boundary that (would have) occurred in generation
and uses up (all) the data that should match the second specifier.
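
A few of these *scanf behaviours can be checked with a small sketch like the one below. The wrapper names are made up; the formats are the ones under discussion.

```c
#include <stdio.h>

/* cm-style format with standard %d conversions. */
static int scan_two(const char *in, int *a, int *b)
{
    return sscanf(in, "\x1b[%d;%dH", a, b);
}

/* Two conversions with nothing between them. */
static int scan_adjacent(const char *in, int *a, int *b)
{
    return sscanf(in, "%d%d", a, b);
}

/* %i accepts decimal, 0octal and 0xhex input. */
static int scan_i(const char *in, int *v)
{
    return sscanf(in, "%i", v);
}
```

With these, `scan_two("\x1b[ +3;-4H", ...)` returns 2 and stores 3 and -4, so whitespace and signs slip through; `scan_two("\x1b[;4H", ...)` returns 0 because the first number is missing; `scan_adjacent("1234", ...)` returns 1 with the first variable holding 1234; and `scan_i("0x10", ...)` stores 16.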

Also a space in *scanf format will match any amount of whitespace in
the data or none, not just a single space. If necessary, you can do
the latter by changing to %1[ ], but since I think trying to use
actual *scanf is not worth it anyway, see below, I wouldn't bother. I
also don't recall offhand any terminal commands that use exactly a
space as (required) data, although some (ADM3A, IIRC) do use a single
character whose code STARTS at (ASCII) space.
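
The difference between a space in the format and %1[ ] can be seen in a sketch such as this one, where "A" and "B" stand in for arbitrary literal text and the function names are hypothetical.

```c
#include <stdio.h>

/* Matches "A", then any amount of whitespace (or none), then "B". */
static int match_loose(const char *in)
{
    int n = -1;                 /* stays -1 unless %n is reached */

    (void)sscanf(in, "A B%n", &n);
    return n >= 0 && in[n] == '\0';
}

/* Matches "A", then exactly one space, then "B". */
static int match_strict(const char *in)
{
    char sp[2];                 /* one space plus the terminator */
    int n = -1;

    return sscanf(in, "A%1[ ]B%n", sp, &n) == 1
        && n >= 0 && in[n] == '\0';
}
```

`match_loose()` accepts "AB", "A B" and "A   B" alike, while `match_strict()` accepts only "A B".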

Richard Bos said:
If that doesn't work, the easiest way to get your hands on a sscanf()-
variation which can handle your termcap strings would be to start from
normal sscanf() code and modify that. If Ganuck code would be
acceptable, you could use that; if not, many textbooks have you write
one as an exercise, and include simplified sample code, which may
already be useful enough. IIRC K&R is one of these.

This problem appears to me different enough from, and simpler than, what
*scanf must do that I would find it easier to start from scratch and
build up, perhaps looking at *scanf for ideas in the unlikely event I
had difficulty with a particular point.

I would probably also try to make the scans 'restartable': for each
match that is currently still possible I record the state, and for the
next received character I only advance or fail each scan from that
state. That is provided it doesn't make the code too complex (and
difficult to maintain) or require more space than allowable, which
could be any space at all if this must run multithreaded and can't be
provided with an instance pointer or similar.
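
One way such a restartable scan might be written is sketched below, with one `struct scan` per candidate pattern. It assumes the pattern has already been decoded (real ESC character, %i removed), that every %d is followed by a literal character, that an omitted number is reported as -1, and that the numbers are small enough not to overflow. The names `struct scan`, `scan_start` and `scan_feed` are hypothetical.

```c
#include <ctype.h>

enum scan_result { SCAN_MORE, SCAN_DONE, SCAN_FAIL };

struct scan {
    const char *pat;  /* decoded pattern; "%d" marks a number */
    int pos;          /* current offset into pat */
    int args[4];      /* numbers collected so far; -1 if omitted */
    int nargs;
    int in_num;       /* nonzero while inside a %d */
};

static void scan_start(struct scan *s, const char *pat)
{
    s->pat = pat;
    s->pos = 0;
    s->nargs = 0;
    s->in_num = 0;
}

/* Advance one scan by one received character. After SCAN_DONE or
 * SCAN_FAIL the caller must call scan_start() before reusing it. */
static enum scan_result scan_feed(struct scan *s, char ch)
{
    const char *p = s->pat + s->pos;

    if (p[0] == '%' && p[1] == 'd') {
        if (!s->in_num) {               /* entering a number */
            if (s->nargs == 4)
                return SCAN_FAIL;
            s->args[s->nargs++] = -1;
            s->in_num = 1;
        }
        if (isdigit((unsigned char)ch)) {
            int *v = &s->args[s->nargs - 1];
            *v = (*v < 0 ? 0 : *v * 10) + (ch - '0');
            return SCAN_MORE;
        }
        /* Not a digit: the (possibly empty) number has ended, so
         * fall through and match ch against the next literal. */
        s->in_num = 0;
        s->pos += 2;
        p += 2;
    }
    if (*p == '\0' || *p != ch)
        return SCAN_FAIL;
    s->pos++;
    return p[1] == '\0' ? SCAN_DONE : SCAN_MORE;
}
```

The emulator would keep an array of these, feed each live one the incoming character, drop those that return SCAN_FAIL, and act on the first that returns SCAN_DONE. The per-pattern state is a few ints, and because it lives in a struct passed by pointer, nothing is global.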

- formerly david.thompson1 || achar(64) || worldnet.att.net
 
