printf in glorious colour

Keith Thompson

glen herrmannsfeldt said:
(snip, I wrote) [...]
Interesting. Do you have a reference for this proposed ASCII-8?
It turns out to be difficult to Google; most of the references I've
found incorrectly refer to things like Latin-1 or Windows-1252 as
"8-bit ASCII".
If ASCII-8 had caught on, with some common characters requiring 8
bits, I wonder if UTF-8 would have been possible.

Look in the Appendix of the S/360 Principles of Operation. Later
versions have a better description of it, such as the -7 (Dec 1967)
version from bitsavers.

There are still plenty of code points; they just moved them around.

Yes, and I wonder why.

For those who don't want to download the PDFs:

http://bitsavers.trailing-edge.com/pdf/ibm/360/princOps/A22-6821-0_360PrincOps.pdf
http://bitsavers.trailing-edge.com/pdf/ibm/360/princOps/A22-6821-7_360PrincOpsDec67.pdf

ASCII-8 had the same defined characters as ASCII-7, but remapped the
ranges relative to ASCII-7 (what we know as ASCII):

0..31 -> 0..31
32..63 -> 64..95
64..95 -> 160..191
96..127 -> 224..255

leaving gaps in between. This makes it incompatible with standard
ASCII. There doesn't seem to be any stated rationale for this
rather odd mapping. It doesn't represent more than 128 characters,
so I frankly don't see the point.
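
For concreteness, here is the same mapping as a small C sketch (my
own helper, not anything from the PrincOps):

    #include <assert.h>

    /* Map a 7-bit ASCII code point to its ASCII-8 position,
       following the range table above. */
    static unsigned ascii7_to_ascii8(unsigned c)
    {
        assert(c < 128);
        switch (c / 32) {
        case 0:  return c;        /*  0..31  ->   0..31  */
        case 1:  return c + 32;   /* 32..63  ->  64..95  */
        case 2:  return c + 96;   /* 64..95  -> 160..191 */
        default: return c + 128;  /* 96..127 -> 224..255 */
        }
    }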

Such an encoding would be suitable for a conforming C implementation,
assuming you work around the changes for '^' and '!'. Like EBCDIC
and ASCII, it keeps the decimal digits contiguous. Like ASCII,
but unlike EBCDIC, the lowercase letters are contiguous, as are
the uppercase letters (C doesn't require this). And like EBCDIC,
it would force a C compiler to make plain char unsigned.
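
A quick illustration in C of why those properties matter (a sketch,
assuming the ASCII-8 code points above):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Contiguous digits: '7' - '0' is 7 in ASCII-7, ASCII-8,
           and EBCDIC alike (C guarantees this for the digits). */
        int value = '7' - '0';

        /* Under ASCII-8, 'a' would be code point 225.  Members of
           the basic character set must be non-negative as plain
           char, so an 8-bit char would have to be unsigned. */
        printf("value = %d, CHAR_MAX = %d\n", value, (int)CHAR_MAX);
        return 0;
    }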

UTF-8 is compatible with ASCII. A UTF-8-like encoding could be made
compatible with ASCII-8, but it would have to use a less elegant
encoding, and it would probably lose some of UTF-8's nice properties.
And it would be incompatible with ASCII-7.
 
glen herrmannsfeldt

(snip, I wrote)
Yes, and I wonder why.
For those who don't want to download the PDFs:

ASCII-8 had the same defined characters as ASCII-7, but remapped the
ranges relative to ASCII-7 (what we know as ASCII):
0..31 -> 0..31
32..63 -> 64..95
64..95 -> 160..191
96..127 -> 224..255
leaving gaps in between. This makes it incompatible with standard
ASCII. There doesn't seem to be any stated rationale for this
rather odd mapping. It doesn't represent more than 128 characters,
so I frankly don't see the point.

I mostly don't see the point either. One thing, though: it is required
that 256 different code points map to 256 different card punch
combinations. It does seem like that could have been done
with ASCII-7, though.

Such an encoding would be suitable for a conforming C implementation,
assuming you work around the changes for '^' and '!'. Like EBCDIC
and ASCII, it keeps the decimal digits contiguous. Like ASCII,
but unlike EBCDIC, the lowercase letters are contiguous, as are
the uppercase letters (C doesn't require this). And like EBCDIC,
it would force a C compiler to make plain char unsigned.
UTF-8 is compatible with ASCII. A UTF-8-like encoding could be made
compatible with ASCII-8, but it would have to use a less elegant
encoding, and it would probably lose some of UTF-8's nice properties.
And it would be incompatible with ASCII-7.

It might have been that, had IBM gotten ASCII-8 standardized,
other byte-oriented machines would have followed it. At least IBM
might have believed that.

But okay, the properties of EBCDIC that S/360 was designed around:

From pretty early in the punched card days, the top rows were called
zones and the bottom rows digits. (The top two rows are commonly called
the 12 and 11 row, though they don't have markings on them like rows
zero through nine.) In BCDIC, the top row was the '+' character and
the next row '-' (but there was also another code for '-'). In the
pre-computer punched card days one could "overpunch" the sign by
punching it over one of the digit columns. Using the electromechanical
card sorter, it would be one additional pass to separate plus from
minus cards.

For EBCDIC characters in memory, the top (MSB) half of each byte is
called the zone, and the bottom (LSB) the digit. Note that in ASCII-7,
ASCII-8, and EBCDIC the low hex digit of characters '0' through '9'
corresponds to the digit value.
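
In C terms (a sketch), the digit value is just the low nibble in all
three codes:

    /* '0'..'9' is X'30'..X'39' in ASCII-7, X'50'..X'59' in ASCII-8,
       and X'F0'..X'F9' in EBCDIC; the low nibble is the value. */
    unsigned digit_value(unsigned char c)
    {
        return c & 0x0F;
    }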

The S/360 (and successor) PACK instruction will take from 1 to 16
bytes of zoned decimal (one digit per byte) and convert to packed
decimal (two BCD digits per byte, with the sign in the least
significant half of the rightmost (least significant) byte).
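
Here is a rough C model of the data movement PACK performs (a sketch
only; the real instruction has length codes and tolerates overlapping
operands):

    #include <stddef.h>

    /* Pack n zoned-decimal bytes into outlen packed bytes, filling
       right to left.  The zone of the rightmost source byte becomes
       the sign nibble of the rightmost result byte. */
    static void pack(unsigned char *out, size_t outlen,
                     const unsigned char *zoned, size_t n)
    {
        size_t oi = outlen, zi = n;

        /* Rightmost byte: low digit in the high nibble, sign below. */
        out[--oi] = (unsigned char)(((zoned[zi - 1] & 0x0F) << 4)
                                    | (zoned[zi - 1] >> 4));
        zi--;

        while (oi > 0) {
            unsigned char b = 0;
            if (zi > 0)
                b = zoned[--zi] & 0x0F;                          /* low digit  */
            if (zi > 0)
                b |= (unsigned char)((zoned[--zi] & 0x0F) << 4); /* high digit */
            out[--oi] = b;
        }
    }

Packing EBCDIC C'123' (X'F1F2F3') this way gives X'123F'.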

For a series of EBCDIC digits, the result is BCD digits with a sign
field of X'F'. Conveniently, X'F' counts as positive for the packed
decimal (BCD) instructions. When the ASCII mode bit is not set in
the PSW, decimal instructions generate X'C' for plus and X'D'
for minus. When unpacked with the UNPK instruction, positive values
with the rightmost digit 1 through 9 will convert to the EBCDIC
codes for 'A' through 'I' (that is, X'C1' through X'C9') and punch
as 12 punch plus digit 1 through 9. X'C0' is not a printable
EBCDIC character, but will punch as 12 and 0. Similarly, for
negative values, the low byte will be between X'D0' and X'D9',
and punch as 11 row plus digit 0 through 9, again with a non-printing
character for 0, and C'J' through C'R' for 1 through 9.
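
Reading back such an overpunched low byte is simple in C (a sketch,
using the zone conventions above):

    /* Decode the last byte of an unpacked EBCDIC field, e.g.
       X'C5' ('E') -> +5, X'D3' ('L') -> -3. */
    static int overpunch_digit(unsigned char last, int *negative)
    {
        unsigned zone = last >> 4;
        *negative = (zone == 0xD || zone == 0xB);  /* X'D'/X'B' = minus */
        return last & 0x0F;
    }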

When the ASCII bit is set in the PSW, decimal instructions generate
X'A' for plus and X'B' for minus as the sign. UNPK will then
convert the low digit of positive numbers to bytes from
X'A0' through X'A9', and for negative values X'B0' through X'B9'.
Positive values convert to C'@' and C'A' through C'I' in ASCII-8,
and negative values to C'P' through C'Y'. With the appropriate punch
code for C'@' that works for positive numbers, but not negative
numbers. But maybe they could have convinced people to punch negative
numbers using C'P' through C'Y'.

In any case, it is not hard to fix up the low byte using instructions
such as OI, NI, or XI (OR immediate, AND immediate, XOR immediate),
which OR, AND, or XOR one byte with an immediate value. Also,
one can use the TR (translate) instruction to convert between 1
and 256 characters using a 256-byte (or smaller) translate table.
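
The C equivalents are a one-byte OR and a table-lookup loop (a
sketch):

    #include <stddef.h>

    /* OI-equivalent: force the zone of the sign byte to X'F' so the
       field prints as a plain digit (the sign is discarded). */
    static void force_zone_f(unsigned char *sign_byte)
    {
        *sign_byte |= 0xF0;
    }

    /* TR-equivalent: replace each byte by its entry in a 256-byte
       table, e.g. an EBCDIC-to-ASCII translate table. */
    static void translate(unsigned char *buf, size_t n,
                          const unsigned char table[256])
    {
        for (size_t i = 0; i < n; i++)
            buf[i] = table[buf[i]];
    }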

Independent of the ASCII bit, decimal instructions accept X'B'
and X'D' as negative, X'A', X'C', X'E', and X'F' as positive.
Other sign values will generate an interrupt, as will digits
other than X'0' through X'9' in digit positions.
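
Those acceptance rules, as a sketch in C:

    /* X'B' and X'D' are negative; X'A', X'C', X'E', and X'F' are
       positive; anything below X'A' in the sign position is invalid. */
    static int sign_is_negative(unsigned nibble)
    {
        return nibble == 0xB || nibble == 0xD;
    }

    static int sign_is_valid(unsigned nibble)
    {
        return nibble >= 0xA;
    }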

I don't know if that helps much. Presumably IBM could have built
card readers and card punches with either code.

-- glen
 
Keith Thompson

glen herrmannsfeldt said:
It might have been that, had IBM gotten ASCII-8 standardized,
other byte-oriented machines would have followed it. At least IBM
might have believed that.

I boldly predict that ASCII-8 won't catch on.

[big snip]
 
glen herrmannsfeldt

I boldly predict that ASCII-8 won't catch on.

But would you have predicted that 50 years ago?

We are now, more or less, 50 years from S/360.

The announcement was, according to Wikipedia, April 1964, but they
had to have been working on it earlier, including decisions like which
code to use.

-- glen
 
glen herrmannsfeldt

(snip, I wrote)
The IBM description takes the bits of ASCII-7, numbered 7654321, and
places them in the eight-bit byte as 76754321; that is, bit 7 is
duplicated. In contrast, the bits of EBCDIC are described as 01234567.
(That is, big-endian order.)
I mostly don't see the point either. One thing, though: it is required
that 256 different code points map to 256 different card punch
combinations. It does seem like that could have been done
with ASCII-7, though.
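
That bit placement is equivalent to the range table quoted earlier;
as a C sketch, the high bit of the 7-bit code simply appears twice
in the byte:

    /* Bits 7654321 of an ASCII-7 code placed as 7 6 7 5 4 3 2 1: */
    static unsigned ascii8_bits(unsigned c)   /* requires c < 128 */
    {
        return ((c & 0x40) << 1)   /* bit 7 -> high bit of the byte */
             | ((c & 0x20) << 1)   /* bit 6 -> next bit             */
             | ((c & 0x40) >> 1)   /* bit 7 again, duplicated       */
             | (c & 0x1F);         /* bits 5..1 unchanged           */
    }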

There is a good description of what went into the design of EBCDIC
in Blaauw & Brooks, "Computer Architecture: Concepts and Evolution."
While many of the examples use IBM machines and decisions, they are
not afraid to note when a decision was a mistake.

For one, the designers of PL/I wanted to add &|~<>[] to the character
set. (As written in the book, it is ~, but is likely supposed to be
the logical NOT character, ¬.) With the restrictions on printers, typewriters
(maybe including the 2741) and keypunches, two characters had to be
removed. The decision was to remove [], which Blaauw and Brooks say
was a mistake. PL/I, like Fortran, uses () for function references
and array subscripting. ASCII left out the NOT character, so that
in C we have [] for subscripts, but != and ! for relational and
logical operators.

It might have been that, had IBM gotten ASCII-8 standardized,
other byte-oriented machines would have followed it. At least IBM
might have believed that.

As described, for ASCII: "(For reasons of commercial rivalry, they were
determined NOT to be compatible with BCD.)" (BCD was the name used to
describe what is now usually called BCDIC, the predecessor to EBCDIC.)

When Fortran was developed, the printers of the 704 could only
print 48 different characters, plus blank. The ='(+) characters
replaced five characters used for commercial computing: #@%& and
a fifth character that doesn't exist on any system that I know of.

Anyway, I recommend the Blaauw and Brooks book for anyone at all
interested in the development of computer architecture.

-- glen
 
Chris F.A. Johnson

There are enough exceptions that hard-coding those sequences can't be
explained as anything other than the programmer not knowing about termcap
or terminfo.

There are few enough exceptions that hard-coding is not a problem,
especially in shell scripts, where the interface to termcap or
terminfo is limited.
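
For the thread title's sake, here is what the two approaches look
like from C (a sketch; the hard-coded sequences assume an
ANSI/VT100-style terminal, which is exactly the assumption terminfo
exists to remove):

    #include <stdio.h>

    int main(void)
    {
        /* Hard-coded ANSI escape sequences: red, then bold green.
           Fine on ANSI-style terminals, wrong on anything else. */
        printf("\033[31mred\033[0m \033[1;32mbold green\033[0m\n");

        /* The portable route queries the terminal description
           instead: setupterm()/tigetstr() from <term.h>, or, in a
           shell script, something like: tput setaf 1 */
        return 0;
    }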
 
