Character Array vs String

David Thompson

| sometimes two carriage returns to give the teletype the time it needs
| to do this

Or other padding, as discussed nearby. A person punching tape would
find CR easiest to use, but automation mostly used NUL or DEL.

To answer another question nearby, real Teletypes did not have flow
control, although they had an (extra cost) option to stop and start
the *Teletype* reading paper tape, which could be used by a receiving
computer or other automated equipment to slow that input (only).
Some other terminals, especially some video terminals (next), did
extend these characters (DC1 aka XON, DC3 aka XOFF) for flow control
from the computer or similar sender, but because of roundtrip delay
this only works if the terminal has at least some buffering (about 2-4
characters) and Teletypes didn't. And some terminal modems of the day
were half-duplex (not allowing back traffic), and even nominally
full-duplex ones didn't always work reliably that way.

| yes but some terminals need \r\n some need \r\r\n and glass ttys don't
| need both.

Video terminals aka glass TTYs varied widely. Some used CR LF
separate, some had CR do LF, some had LF do CR, some had switch or
jumper or even PROM options. Some needed padding, some didn't -- some
early ones even needed *more* padding than mechanical TTYs!

| The Unix decision to use a single logical end-of-line
| character is the sane answer. The driver sorts out the display
| problems.

Exactly.

| I think I've seen \r used as a line terminator on DEC(?) machines.

I assume you mean storage, since most systems used lone CR input.

Apple MacOS (at least classic) stored CR (reverse from what Keith (no
relation!) said nearby). I don't know any DEC *software* that did;
most used CR LF and some used counts or reassigned meanings within a
6-bit code. Much DEC *hardware* ran Unix and stored \n=LF.

Kind of, except \n was and still canonically is ASCII LF not CR.
 
Keith Thompson

David Thompson said:
Apple MacOS (at least classic) stored CR (reverse from what Keith (no
relation!) said nearby).

Actually, I think that is what I said:

| But we're stuck with the current situation. MacOS managed to change
| its end-of-line marker from CR to LF, but I don't know if Windows
| could (or would) do the same thing.

[...]
Kind of, except \n was and still canonically is ASCII LF not CR.

The C standard doesn't actually require \n to be LF (it doesn't even
mention ASCII outside a couple of footnotes). \n probably couldn't be
CR, since that's \r (presumably '\n' == '\r' would be non-conforming).
But in EBCDIC \n could plausibly be NL, which is distinct from both CR
and LF.
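
A minimal sketch of that point, assuming nothing beyond standard C: the value of '\n' is whatever the implementation's execution character set says it is, so code that needs the ASCII LF byte specifically can write 0x0A instead of assuming. The names newline_is_ascii_lf and show_line_end_values are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: reports whether this implementation's '\n'
   happens to have the ASCII LF value (10). The C standard does not
   promise that it does. */
int newline_is_ascii_lf(void)
{
    return '\n' == 0x0A;
}

/* Prints the execution-character-set values of '\n' and '\r'
   so they can be inspected on a given implementation. */
void show_line_end_values(void)
{
    printf("'\\n' = %d, '\\r' = %d\n", '\n', '\r');
}
```

On an ASCII-based implementation the helper returns 1; on an EBCDIC one it plausibly would not.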
 
Lew Pitcher

In the case I am dimly remembering, data readiness was not the issue.
The interface could take more data and it would be printed wherever the
print head happened to be at the time.  I am sure this was regarded as
daft even then.

Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

It was customary to send a CR followed by a LF (often followed by one
or more padding characters like NUL) so that the print head had time
to return to the left margin /before/ it received and printed the
first printable character of the next line. From the right margin, the
carriage took more than 1/10th of a second to return to the left
margin, and sometimes took more than 2/10ths of a second. Thus, while
the carriage was traveling left, the LF character (and, if necessary,
the follow-on NUL padding characters) gave the carriage that time it
needed.
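
The sequence described above can be sketched as a small C function. It assumes ASCII codes (CR = 0x0D, LF = 0x0A, NUL = 0x00) and leaves the number of pad characters as a parameter, since how many a given machine needed varied. The name tty_line_end is invented for this example and is not taken from any real driver.

```c
#include <stddef.h>

/* Writes CR, LF and then `pad` NUL characters into `out`, which has
   room for `cap` bytes. Returns the number of bytes written, or 0 if
   the buffer is too small. */
size_t tty_line_end(char *out, size_t cap, size_t pad)
{
    size_t i;

    if (cap < 2 || pad > cap - 2)
        return 0;              /* not enough room for CR, LF and padding */

    out[0] = 0x0D;             /* CR: starts the carriage moving left */
    out[1] = 0x0A;             /* LF: advances the paper meanwhile */
    for (i = 0; i < pad; i++)
        out[2 + i] = 0x00;     /* NUL: prints nothing, only uses up time */
    return 2 + pad;
}
```

Each pad character simply occupies one more character time on the line before the first printable character of the next line arrives.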

The LF character on its own was quite immediate; it never took the
entire 1/10th of a second to complete.
 
BartC

Kenneth Brody said:
On 11/21/2011 2:39 AM, David Thompson wrote:
[...]
| | But to answer the exact question, DEC PDP-10 computers had a 36-bit
| | word with (few) special 'byte' instructions that could access *any*
| While there were a few PDP-n's on campus as I recall, the main computer
| lab at the college I went to used a KL-10. While I hadn't heard of C way
| back when, I did some assembly programming on it.
|
| As you say, it had 36-bit words, and 18-bit addresses. The O/S used six
| 6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers for
compilers (so you were limited to rather short names, but string matching
was very quick).

| which IIRC were in 6.3 format. (I guess they used the other 18 bits for
| file attributes?) Text was typically stored as five 7-bit characters,
| with 1 bit padding.
|
| Yet, although addresses were 18 bits wide, and each address pointed to a
| 36-bit word, a "pointer" [could], in fact, be wider than 18 bits. For

The address was 18 bits (for 256K words of memory per task; just over 1MB!).
But there was also an 'indirect' bit which could be used for repeated
pointer operations, all automatic on any instruction. I believe another
4 bits was an extra index register for each level of pointers.

| example, there were machine instructions for accessing sub-words,
| including things like "take the 7 bits at this address/bit-location, move
| it to the low-order 7 bits at this address, zero-padding it, and then
| increment the pointer by 7 bits".

There were a dozen or so bits left over, which were used by byte-pointers,
but these needed special instructions to make use of.

| (Of course, C requires at least 8 bits per char, so I guess it would use
| the same thing, but storing four 8-bit chars per word with 4 padding bits.
| You could store four 9-bit chars w/o any padding, but files on disk were
| 8-bit characters, so it probably wouldn't make sense to do so.)
|
| As I said, I never saw a C compiler for it, but I would suspect that
| sizeof() would be "interesting" on such a system. The simplest system
| would be to simply store one 8-bit char per 36-bit word, but that seems
| rather inefficient.

I'd imagine C with its various kinds of pointers would be a nightmare to
implement, if packed char arrays were to be used. I remember using Pascal
which offered a choice of unpacked (fast) or packed (slow) arrays, records
and (presumably) strings.
 
Kaz Kylheku

Kenneth Brody said:
On 11/21/2011 2:39 AM, David Thompson wrote:
[...]
But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*
While there were a few PDP-n's on campus as I recall, the main computer
lab at the college I went to used a KL-10. While I hadn't heard of C way
back when, I did some assembly programming on it.

As you say, it had 36-bit words, and 18-bit addresses. The O/S used six
6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers for
compilers (so you were limited to rather short names, but string matching
was very quick).

At that time, symbol interning had already been invented by the Lispers,
by which any length identifier is reduced to a one-word symbol atom thereafter
used in its place.
 
Ben Pfaff

BartC said:
'SIXBIT' format. Very handy, and probably also used for identifiers
for compilers (so you were limited to rather short names, but string
matching was very quick).

This idea was reincarnated, badly, in the "AML" language used by
ACPI, which has 32-bit identifiers that are limited to 4 8-bit
characters.
 
John Bode

Could anybody please mention difference between character array and
string in C?

A string is a sequence of character values terminated by a 0. This
sequence of character values may be stored in an array of type char.
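
A short sketch of the distinction, with names invented for this example (holds_string, not_a_string, a_string): both objects below are arrays of char, but only the one containing a terminating null character holds a string.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n chars of buf include a null character,
   i.e. if the array holds a string; returns 0 otherwise. */
int holds_string(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Five chars and no terminator: a character array, but not a string.
   Passing it to strlen() or printf("%s") would be undefined behavior. */
const char not_a_string[5] = { 'h', 'e', 'l', 'l', 'o' };

/* Six chars: the five letters plus the terminating '\0' supplied by
   the string literal. This array holds a string. */
const char a_string[] = "hello";
```

Here sizeof a_string is 6 while strlen(a_string) is 5; the difference is the terminator.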
 
Stefan Ram

John Bode said:
| A string is a sequence of character values terminated by a 0. This

"of char values". "char" is not "character", "int" is not "integer",
and so on, you got it.

| sequence of character values may be stored in an array of type char.

(In a sense, everything can be stored in an array of type char,
as long as it is large enough.)
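
A sketch of that parenthetical, using unsigned char (the type usually chosen for raw bytes) rather than plain char: any object's bytes can be copied into a large enough array and copied back out again. The function name round_trip_through_bytes is made up for this example.

```c
#include <string.h>

/* Copies the bytes of a double into an unsigned char array and back,
   returning the value recovered from the array. */
double round_trip_through_bytes(double value)
{
    unsigned char bytes[sizeof(double)];   /* large enough by construction */
    double copy;

    memcpy(bytes, &value, sizeof value);   /* object -> array of char */
    memcpy(&copy, bytes, sizeof copy);     /* array of char -> object */
    return copy;
}
```

The array holds the double's representation, yet it is not a string: nothing guarantees where, or whether, a null character appears among those bytes.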
 
James Kuyper

"of char values". "char" is not "character", "int" is not "integer",
and so on, you got it.

Section 7.1.1p1 says "character" not "char"; it also says "null
character" rather than "0". Strings are allowed to contain multibyte
characters.
 
Patrick Scheible

BartC said:
Kenneth Brody said:
On 11/21/2011 2:39 AM, David Thompson wrote:
[...]
But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*
While there were a few PDP-n's on campus as I recall, the main
computer lab at the college I went to used a KL-10. While I hadn't
heard of C way back when, I did some assembly programming on it.

As you say, it had 36-bit words, and 18-bit addresses. The O/S used
six 6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers
for compilers (so you were limited to rather short names, but string
matching was very quick).
which IIRC were in 6.3 format. (I guess they used the other 18 bits
for file attributes?) Text was typically stored as five 7-bit
characters, with 1 bit padding.

Yet, although addresses were 18 bits wide, and each address pointed
to a 36-bit word, a "pointer" [could], in fact, be wider than 18
bits. For

The address was 18-bits (for 256KWords of memory per task; just over
1MB!). But there was also an 'indirect' bit which could be used for
repeated pointer operations, all automatic on any instruction. I
believe another 4-bits was an extra index register for each level of
pointers.
example, there were machine instructions for accessing sub-words,
including things like "take the 7 bits at this address/bit-location,
move it to the low-order 7 bits at this address, zero-padding it,
and then increment the pointer by 7 bits".

There were a dozen or so bits left over, which were used by
byte-pointers, but these needed special instructions to make use of.
(Of course, C requires at least 8 bits per char, so I guess it would
use the same thing, but storing four 8-bit chars per word with 4
padding bits. You could store four 9-bit chars w/o any padding, but
files on disk were 8-bit characters, so it probably wouldn't make
sense to do so.)

As I said, I never saw a C compiler for it, but I would suspect that
sizeof() would be "interesting" on such a system. The simplest
system would be to simply store one 8-bit char per 36-bit word, but
that seems rather inefficient.

I'd imagine C with its various kinds of pointers would be a nightmare
to implement, if packed char arrays were to be used. I remember using
Pascal which offered a choice of unpacked (fast) or packed (slow)
arrays, records and (presumably) strings.

There has been at least one C compiler for the PDP-10. It uses four
9-bit chars per word.

The biggest model of PDP-10 supported extended addressing, with
23-bit-wide addresses. Those are word addresses; multiply by 4.5 to
get the capacity in 8-bit bytes.
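
Working that multiplication through, as a sketch: a 36-bit word is 36/8 = 4.5 octets, so 2^23 words come to 37,748,736 8-bit bytes, and the 18-bit (256K-word) address space mentioned earlier in the thread comes to 1,179,648. The function name capacity_in_octets is made up for this example.

```c
#include <stdint.h>

/* Capacity, in 8-bit bytes, of a memory of 2^address_bits 36-bit words.
   Intended for small address widths (well under 58 bits) so the
   multiplication cannot overflow a uint64_t. */
uint64_t capacity_in_octets(unsigned address_bits)
{
    uint64_t words = (uint64_t)1 << address_bits;
    return words * 36 / 8;      /* 36 / 8 = 4.5 octets per word */
}
```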

-- Patrick
 
Uno

On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]
Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

It was customary to send a CR followed by a LF (often followed by one
or more padding characters like NUL) so that the print head had time
to return to the left margin /before/ it received and printed the
first printable character of the next line. From the right margin, the
carriage took more than 1/10th of a second to return to the left
margin, and sometimes took more than 2/10ths of a second. Thus, while
the carriage was traveling left, the LF character (and, if necessary,
the follow-on NUL padding characters) gave the carriage that time it
needed.
[...]

Yup. I saw systems which didn't add some sort of delay-after-CR, and the
printouts would include what was supposed to be the first character of
the next line somewhere in the middle of the line. (Sometimes between
lines, as the LF hadn't yet fully advanced the paper.)

Interesting. Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?
 
Lew Pitcher

On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]
Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.
[snip]
Interesting.  Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?

Not in the least. TeleType was the brand name, which was commonly
abbreviated to "TTY".
 
Nick Keighley

On 11/23/2011 9:42 AM, Kenneth Brody wrote:
On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]
Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.
[snip]
Interesting.  Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?

Not in the least. TeleType was the brand name, which was commonly
abbreviated to "TTY".

Well, Teletype was like "hoover": it was used generically.
 
David Thompson

Kenneth Brody said:
On 11/21/2011 2:39 AM, David Thompson wrote:
[...]
| | But to answer the exact question, DEC PDP-10 computers had a 36-bit
| | word with (few) special 'byte' instructions that could access *any*
| While there were a few PDP-n's on campus as I recall, the main computer
| lab at the college I went to used a KL-10. While I hadn't heard of C way
| back when, I did some assembly programming on it.

Not really -n. There were four implementations of PDP-10 by DEC (KA,
KI, KL, KS) and at least one competitive clone, plus PDP-6, with
basically the same instruction set. Other machines in the series were
(quite) different; PDP-5/8/12 and PDP-11 were the more widely used,
and unlike each other or PDP-6/10 or the other less-known PDP's.

| 'SIXBIT' format. Very handy, and probably also used for identifiers for
| compilers (so you were limited to rather short names, but string matching
| was very quick).

TOPS-10 (DEC's initial OS) was 6.3 filenames, with one directory per
user (basically). TENEX, developed by BBN and adopted (more or less)
by DEC as TOPS-20, had variable-length filenames (plus version
numbers), in hierarchical directories with variable-length names.

| | Yet, although addresses were 18 bits wide, and each address pointed to a
| | 36-bit word, a "pointer" [could], in fact, be wider than 18 bits. For
|
| The address was 18-bits (for 256KWords of memory per task; just over 1MB!).
| But there was also an 'indirect' bit which could be used for repeated
| pointer operations, all automatic on any instruction. I believe another
| 4-bits was an extra index register for each level of pointers.

18-bit address plus 1-bit indirect and 4-bit _optional_ index, yes.
Directly in instruction, and indirect pointer in 'memory' if used.

'memory' in quotes because the 16 registers (usually called ACs) can
be accessed as the first 16 locations in memory also.

| There were a dozen or so bits left over, which were used by byte-pointers,
| but these needed special instructions to make use of.

Exactly. Because byte-pointer needs those extra bits, _and_ because it
can be modified (incremented), it is only in 'memory'.

The special instructions are: load byte (any contiguous bits up to
wordsize) from memory to register, zero-padded, without or with
incrementing the pointer (but not to arbitrary location as BartC seems
to say); store register to byte in memory without or with incrementing
the pointer; increment or 'adjust' (multiple-increment) the pointer.
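
The load-byte operation described above can be imitated in portable C. This is only a simulation under stated assumptions: a 36-bit word is kept in the low 36 bits of a uint64_t, `pos` counts the bits to the right of the byte, and `size` is the byte width in bits. The names load_byte and pack_5x7 are invented for this example; they are not PDP-10 mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts `size` contiguous bits from a simulated 36-bit word, with
   `pos` bits lying to the right of the field. The result is
   zero-padded on the left, as a load into a register would be. */
uint64_t load_byte(uint64_t word, unsigned pos, unsigned size)
{
    assert(size <= 36 && pos <= 36 - size);   /* field must fit in the word */
    return (word >> pos) & (((uint64_t)1 << size) - 1);
}

/* Packs five 7-bit characters into one simulated word, left-justified,
   leaving the lowest bit unused (the 5x7 layout). */
uint64_t pack_5x7(const char s[5])
{
    uint64_t word = 0;
    int i;

    for (i = 0; i < 5; i++)
        word = (word << 7) | ((uint64_t)s[i] & 0x7F);
    return word << 1;          /* one padding bit at the low end */
}
```

With this layout the first character of a word is at pos 29 and the fifth at pos 1; stepping to the next character means lowering pos by 7, which is roughly the bookkeeping the incrementing forms of the byte instructions did on the pointer.
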

Files on disk were 36-bit words, and could have characters in any
format supported by software. 6x6 and 5x7 were most common in my
experience, although I think 4x8 and 9x8 were used for file exchange
with other systems (especially in ARPAnet and early Internet).

Not really. C requires that any data type (any object) be accessible
as an array of unsigned char (e.g. for memcpy) so 4x9 seems to be
the most reasonable choice. Although nowadays with Unicode finally
becoming really widespread, you could make an argument for 2x18
(and use the halfword instructions to optimize some cases).

| I'd imagine C with its various kinds of pointers would be a nightmare to
| implement, if packed char arrays were to be used. I remember using Pascal
| which offered a choice of unpacked (fast) or packed (slow) arrays, records
| and (presumably) strings.

C's allowance of different pointer formats (size and representation)
to different target types works excellently here.
 
