Trigraphs


Christopher Benson-Manica

Assuming trigraphs are specified in the standard currently, how much longer do
you suppose that will be the case? Are there still enough "short" keyboards
out there to justify trigraphs?
 

Andreas Kahari

Christopher Benson-Manica said:
Assuming trigraphs are specified in the standard currently,
how much longer do you suppose that will be the case? Are
there still enough "short" keyboards out there to justify
trigraphs?

You mean, how long will it take for everyone to switch to ASCII
keyboards?

It is not simply a question of what keyboards are being used
(the standard never mentions the word "keyboard") but what
character sets are being used by C programmers.
 

Christopher Benson-Manica

Andreas Kahari said:
It is not simply a question of what keyboards are being used
(the standard never mentions the word "keyboard") but what
character sets are being used by C programmers.

You mean '{' and '}' aren't found in many character sets...?
 

Andreas Kahari

You mean '{' and '}' aren't found in many character sets...?

If there exist character sets that do not include { or },
there must clearly be *some* way of replacing these characters
in the source of a C program.
 

Lew Pitcher

Christopher said:
You mean '{' and '}' aren't found in many character sets...?

More to the point, '[' and ']' aren't found in certain mainframe (EBCDIC)
character sets. Other characters are missing from EBCDIC-US as well.


--

Lew Pitcher, IT Consultant, Application Architecture
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed here are my own, not my employer's)
 

Tom Zych

Lew said:
More to the point, '[' and ']' aren't found in certain mainframe (EBCDIC)
character sets. Other characters are missing from EBCDIC-US as well.

Ecch. EBCDIC. COBOL. Trigraphs. gets(). void main. The ugly side of
computer science...
 

Glen Herrmannsfeldt

Lew Pitcher said:
Christopher said:
You mean '{' and '}' aren't found in many character sets...?

More to the point, '[' and ']' aren't found in certain mainframe (EBCDIC)
character sets. Other characters are missing from EBCDIC-US as well.

Even worse, there are multiple definitions for some of the characters. '['
and ']' have existed for many years on the TN print train, but somehow that
wasn't good enough to define them in EBCDIC.

According to the card I have here, GX20-1850-2, '{' in EBCDIC is at X'C0'
but the TN print train has it at X'8B', and '}' in EBCDIC at X'D0' and TN
has it at X'9B'.
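
To make those code points concrete: a C program can inspect the values its own implementation assigns to '{' and '}'. The sketch below compares them against the ASCII values (0x7B and 0x7D) and the EBCDIC values quoted from the card above (X'C0' and X'D0'). The names `charset_guess` and `guess_charset` are made up for illustration, and the check is only a rough heuristic, not a reliable detector.

```c
/* Hypothetical helper: guess the execution character set from the
   code points of the two brace characters. */
enum charset_guess {
    CHARSET_UNKNOWN,
    CHARSET_ASCII_LIKE,
    CHARSET_EBCDIC_LIKE
};

enum charset_guess guess_charset(void)
{
    unsigned char open_brace = (unsigned char)'{';
    unsigned char close_brace = (unsigned char)'}';

    /* ASCII places the braces at 0x7B and 0x7D. */
    if (open_brace == 0x7B && close_brace == 0x7D)
        return CHARSET_ASCII_LIKE;

    /* Values quoted in this thread for EBCDIC: X'C0' and X'D0'. */
    if (open_brace == 0xC0 && close_brace == 0xD0)
        return CHARSET_EBCDIC_LIKE;

    return CHARSET_UNKNOWN;
}
```

On a typical ASCII-based desktop compiler this returns CHARSET_ASCII_LIKE. A compiler for a machine whose printer uses the TN positions would match neither pair.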

Also, EBCDIC has both a solid and split vertical bar. Some ASCII tables use
one, and some the other, for the printable representation, and I believe
that has also been a problem with conversion tables.

Just to continue the confusion, EBCDIC has CR, NL, and LF control
characters, X'0D', X'15', and X'25' respectively. Which one should C use as
the '\n' character?

PL/I has the opposite problem, as EBCDIC has a character for the NOT
operator, which ASCII doesn't have.

-- glen
 

Christian Bau

Lew Pitcher said:
Christopher said:
You mean '{' and '}' aren't found in many character sets...?

More to the point, '[' and ']' aren't found in certain mainframe (EBCDIC)
character sets. Other characters are missing from EBCDIC-US as well.

And before anyone tries to get rid of trigraphs, I would like to hear a
suggestion how to do this without breaking programs that use them.
 

Serve La

Christopher Benson-Manica said:
Assuming trigraphs are specified in the standard currently, how much longer do
you suppose that will be the case? Are there still enough "short" keyboards
out there to justify trigraphs?

I've been told that Italian and Danish keyboards don't support some of C's
common symbols. Personally, I wouldn't be using C if I had to use trigraphs.
 

Mark Gordon

Lew Pitcher said:
Christopher said:
Andreas Kahari spoke thus:

It is not simply a question of what keyboards are being used
(the standard never mentions the word "keyboard") but what
character sets are being used by C programmers.


You mean '{' and '}' aren't found in many character sets...?

More to the point, '[' and ']' aren't found in certain mainframe
(EBCDIC) character sets. Other characters are missing from EBCDIC-US
as well.

And before anyone tries to get rid of trigraphs, I would like to hear
a suggestion how to do this without breaking programs that use them.

Write a program to parse the source and get rid of the trigraphs.
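
A minimal sketch of such a program's core, in C. Trigraph replacement happens in the first translation phase, before tokenizing, so it applies everywhere (string literals and comments included) and a plain left-to-right scan is enough. The function names `trigraph_replacement` and `replace_trigraphs` are invented for this example. The code deliberately contains no trigraph sequences itself, so it means the same thing whether or not the compiler has trigraphs enabled.

```c
/* Map the third character of a candidate trigraph to its replacement,
   or return 0 if it does not complete one of the nine trigraphs. */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing each trigraph with its single character.
   dst must have room for at least as many bytes as src, including the
   terminating null. Returns dst. */
char *replace_trigraphs(const char *src, char *dst)
{
    char *out = dst;

    while (*src != '\0') {
        char repl = 0;

        /* src[1] and src[2] are only read when the preceding
           characters are '?', so we never run past the terminator. */
        if (src[0] == '?' && src[1] == '?')
            repl = trigraph_replacement(src[2]);

        if (repl != 0) {
            *out++ = repl;
            src += 3;
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
    return dst;
}
```

Given the input ??=define X a??(0??) the function writes #define X a[0] to the output buffer. When putting such input into a C string literal for testing, spell the question marks as \? so the test source itself contains no trigraphs. The sketch says nothing about whether the replacement characters can actually be represented on the target system.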
 

Andreas Kahari

On Fri, 12 Sep 2003 22:51:16 +0100, Mark Gordon wrote:


Write a program to parse the source and get rid of the trigraphs.

You can't do that with code that is generated automatically and
then interpreted on a non-ASCII system (I don't know if this
arrangement exists).


- I don't like macros, let's do away with them!
- That will break a lot of old code.
- I'll just write a program that replaces macros with their definition...

Not quite the same maybe.

The point of tri- and digraphs is to make C programming possible
on systems that do not support the ASCII character set but
do support ISO 646. If your system does support the
characters that the tri- and digraphs are replacing, then you
are very welcome to substitute them in all your code, but you
probably won't be able to send the code back to its originator
without replacing them again.
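
For comparison with trigraphs, here is a small sketch written with digraphs, the alternative token spellings added to C by the 1995 amendment: <: and :> for [ and ], <% and %> for { and }, and %: for #. Unlike trigraphs, digraphs are ordinary tokens, so they are not recognized inside string literals or character constants. The function name `sum_digraph` and the macro `COUNT` are made up for the example, and the code assumes a compiler mode that accepts digraphs (C95 or later).

```c
/* The same program could be written with # [ ] { } where available. */
%:define COUNT 3

int sum_digraph(void)
<%
    int values<:COUNT:> = <% 1, 2, 3 %>;
    int total = 0;
    int i;

    for (i = 0; i < COUNT; i++)
        total += values<:i:>;
    return total;
%>
```

Calling sum_digraph() returns 6, exactly as the conventional spelling would.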
 

Hallvard B Furuseth

Andreas said:
You can't do that with code that is generated automatically and
then interpreted on a non-ASCII system

ASCII has nothing to do with it. The standard says that the source
character set must include all the C characters. They just don't have
to look like they do in the standard. For example, the character ¥ may
be treated by the compiler as if it was a \. If people don't like
that, they can write ??/ instead.

(In practice it is the other way around. The OS was typically
written by Americans and thinks it is running on ASCII hardware, but the
site's terminals display ¥ for char(92) = ASCII '\'.)

Andreas said:
The point of tri- and digraphs is to make C programming possible
on systems that do not support the ASCII character set,

Almost. It is to allow people on such systems to avoid such hacks, and
to have something to write if certain characters are not readily
available on their keyboards.
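
One practical consequence of ??/ and friends: two question marks followed by certain characters inside a string literal get replaced too when trigraphs are enabled, which is why C has the \? escape. The sketch below (the function names `surprised` and `surprised_length` are invented) spells a string ending in two question marks and an exclamation mark so that the source contains no trigraph and the result is the same with or without trigraph processing.

```c
#include <stddef.h>
#include <string.h>

/* Written with \? escapes; with plain question marks, a compiler that
   processes trigraphs would turn the last three characters into '|'. */
const char *surprised(void)
{
    return "What\?\?!";
}

/* Length of the text above: 4 letters, 2 question marks, 1 '!'. */
size_t surprised_length(void)
{
    return strlen(surprised());
}
```

surprised_length() returns 7 either way; without the escapes it would be 5 on a compiler with trigraphs enabled.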
 

Hallvard B Furuseth

Glen said:
Even worse, there are multiple definitions for some of the characters. '['
and ']' have existed for many years on the TN print train, but somehow that
wasn't good enough to define them in EBCDIC.

What are TN and the TN print train?

Glen said:
Also, EBCDIC has both a solid and split vertical bar. Some ASCII
tables use one, and some the other, for the printable representation,
and I believe that has also been a problem with conversion tables.

I'm curious: Is there a similar problem when converting to Unicode,
or do people agree on which is which in that case?

Glen said:
Just to continue the confusion, EBCDIC has CR, NL, and LF control
characters, X'0D', X'15', and X'25' respectively. Which one should C
use as the '\n' character?

If CR is Carriage Return, I presume it should be \r.

What do EBCDIC NL and LF do?
 

Glen Herrmannsfeldt

Hallvard B Furuseth said:
Glen said:
Even worse, there are multiple definitions for some of the characters. '['
and ']' have existed for many years on the TN print train, but somehow that
wasn't good enough to define them in EBCDIC.

What are TN and the TN print train?

The traditional IBM printers used a train or chain of characters. The more
usual ones didn't have lower case characters, or other less commonly used
characters. There is one with the Fortran 48 character set, one with the
PL/I 60 character set, and TN has 120, including lower case letters, '[',
']', '{', '}', and a variety of characters for drawing boxes.

Hallvard B Furuseth said:
I'm curious: Is there a similar problem when converting to Unicode,
or do people agree on which is which in that case?


If CR is Carriage Return, I presume it should be \r.

What do EBCDIC NL and LF do?

LF is linefeed, and NL is newline. The ASCII LF character is commonly used
as the C '\n' character, as ASCII doesn't have a newline character. But
EBCDIC does!

I don't know at all what unicode has for control characters.

-- glen
 

Chris Torek

Glen Herrmannsfeldt said:

The traditional IBM printers used a train or chain of characters.

More specifically, two common kinds of printers were "band" and
"chain" printers. A chain printer is pretty easy to imagine (and
a band printer is pretty much the same thing, using a sort of
rubber-band or metal band instead of interlocking chain pieces).
The chain is just long enough to string completely around the
cogs at the two ends:

        [view from above]
  _____________________
 /|                   |\
|-o-                 -o-|
 \|___________________|/
        [paper goes here]

Behind the chain there are a bunch of tiny hammers, one for each
column that can print a character. The chain goes around and around
in a big loop, and whenever a character on the chain is ready to
print in that position on the paper, the hammer fires, pushing the
chain-link against the paper (with an ink ribbon in between of
course). Once all the characters on a given line have been printed,
the paper is fed up one line, and the next line can be printed.

The operators would get quite mad at you if you told the machine
to print out lines of text that exactly matched the chain sequence.
The reason is that this meant that the chain would rotate around
until *every character was in place simultaneously*, and then ALL
the hammers would fire at once. This could (and often would) break
the chain, sending printer bits flying. (Presumably some chain
printers incorporated logic to slow down such output, thus not
wrecking the machinery.)

In any case, the chains were replaceable, and some print jobs would
require different "print trains" or "print chains". The printer
had to be told which sequence was loaded, and which characters were
physically available.

[In EBCDIC,]
LF is linefeed, and NL is newline. The ASCII LF character is commonly used
as the C '\n' character, as ASCII doesn't have a newline character. But
EBCDIC does!

Actually, ASCII code X'0A' (in EBCDIC-ese), 10 decimal, is *both*
linefeed *and* newline. Which you get depends on which ASCII you
are using. Thus, if you are going to have a newline at all, this
is the correct choice (but then you have no separate linefeed,
unlike EBCDIC).
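
Whatever the character set, the value the implementation picked for '\n' is visible to a program. A small sketch, with hypothetical function names:

```c
/* Numeric value this implementation assigns to the C newline character. */
int newline_code(void)
{
    return '\n';
}

/* Numeric value this implementation assigns to carriage return. */
int carriage_return_code(void)
{
    return '\r';
}
```

On an ASCII-based implementation these return 10 and 13. On an EBCDIC one, carriage_return_code() would presumably be X'0D', and newline_code() would show whether the implementer chose NL (X'15') or LF (X'25').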
 

Jack Klein

Lew Pitcher said:
Christopher said:
Andreas Kahari spoke thus:

It is not simply a question of what keyboards are being used
(the standard never mentions the word "keyboard") but what
character sets are being used by C programmers.


You mean '{' and '}' aren't found in many character sets...?

More to the point, '[' and ']' aren't found in certain mainframe (EBCDIC)
character sets. Other characters are missing from EBCDIC-US as well.

And before anyone tries to get rid of trigraphs, I would like to hear a
suggestion how to do this without breaking programs that use them.

All three of them???

;)

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 

Tom Zych

Chris said:
The operators would get quite mad at you if you told the machine
to print out lines of text that exactly matched the chain sequence.
The reason is that this meant that the chain would rotate around
until *every character was in place simultaneously*, and then ALL
the hammers would fire at once. This could (and often would) break
the chain, sending printer bits flying. (Presumably some chain
printers incorporated logic to slow down such output, thus not
wrecking the machinery.)

Hmm...I wonder if someone in Fleetwood Mac used to work in
computing?

http://www.buckinghamnicks.net/bn/rumours/thechain.html

;)
 

Richard Heathfield

Jack said:
All three of them???

;)

<shrug> I've worked on dozens of programs that use trigraphs (for several
separate companies), although not in the last four years or so. I would be
unsurprised to discover myself working in such an environment again, and
I'd like to use C in that environment if I may, so let's allow C to
continue to support these platforms. Thanks.
 

Glen Herrmannsfeldt

Chris Torek said:
The traditional IBM printers used a train or chain of characters.

More specifically, two common kinds of printers were "band" and
"chain" printers. A chain printer is pretty easy to imagine (and
a band printer is pretty much the same thing, using a sort of
rubber-band or metal band instead of interlocking chain pieces).
The chain is just long enough to string completely around the
cogs at the two ends:

        [view from above]
  _____________________
 /|                   |\
|-o-                 -o-|
 \|___________________|/
        [paper goes here]

Behind the chain there are a bunch of tiny hammers, one for each
column that can print a character. The chain goes around and around
in a big loop, and whenever a character on the chain is ready to
print in that position on the paper, the hammer fires, pushing the
chain-link against the paper (with an ink ribbon in between of
course). Once all the characters on a given line have been printed,
the paper is fed up one line, and the next line can be printed.

Actually, the hammers are behind the paper, and push the paper toward the
ribbon and characters for a chain printer. I think band printers have them
on flexible metal fingers, so the hammers can work the other way.

Chris Torek said:
The operators would get quite mad at you if you told the machine
to print out lines of text that exactly matched the chain sequence.
The reason is that this meant that the chain would rotate around
until *every character was in place simultaneously*, and then ALL
the hammers would fire at once. This could (and often would) break
the chain, sending printer bits flying. (Presumably some chain
printers incorporated logic to slow down such output, thus not
wrecking the machinery.)

I believe at least on the 1403 that the spacing of the characters on the
train is slightly larger than the print column spacing, so that can't
happen. Still, if you arranged for many to print close enough together,
printer bits may go flying. The advantage of chain printers over drum
printers is that the eye is less bothered by characters displaced
horizontally than vertically. Another is the ease of changing the print
character set, at least in the days before laser printers.

Chris Torek said:
In any case, the chains were replaceable, and some print jobs would
require different "print trains" or "print chains". The printer
had to be told which sequence was loaded, and which characters were
physically available.

The advantage of chain printers is that for smaller character sets you can
put more copies of each character on, and it will print faster. There are
even chains with only numbers and number related punctuation. All the
older high level languages used upper case characters only. It was common
for the printer to map lower case to upper case, which made some debugging
hard. The compiler would complain about something that looked just fine on
the printout.

[In EBCDIC,]
LF is linefeed, and NL is newline. The ASCII LF character is commonly used
as the C '\n' character, as ASCII doesn't have a newline character. But
EBCDIC does!

Actually, ASCII code X'0A' (in EBCDIC-ese), 10 decimal, is *both*
linefeed *and* newline. Which you get depends on which ASCII you
are using. Thus, if you are going to have a newline at all, this
is the correct choice (but then you have no separate linefeed,
unlike EBCDIC).

I always thought that was just a Unix/C convention, and wasn't part of the
ASCII standard.

-- glen
 
