print alphabet

  • Thread starter Bill Cunningham

Ben Bacarisse

BartC said:
I thought Unicode used ASCII (character codes 32 thru 127 at least). (And
the 'sixbit' I once used was just ASCII 32 to 95, offset to start at
0.)

That does not mean that either of them is ASCII -- it just simplifies a
part of the transform to and from ASCII. The key point is that if you
stick to the common characters you fail to get the benefit of the larger
sets, and if you assume a particular encoding you get code that is even
more tied down by that assumption.
EBCDIC should be put painlessly to sleep... It was a terrible encoding. The
A-Z range is ubiquitous and deserves a concise, consecutive encoding.

Maybe, but it can't be done painlessly.
The whole of Unicode is a nightmare to deal with (for example, the exact
same glyph having dozens of character codes), and it's nice to have one sane
corner of it where things are kept simple.

You would probably not be saying that if you had been a Swedish C
programmer in the 80s or if Japan had been the dominant culture at the
time programming flourished!
[2] I know it's more than US and UK but I needed a shorthand. You can't
print the alphabet that Spanish children are taught by iterating from
ASCII 'a' to ASCII 'z'.

And the Italian alphabet had just 21 letters at one time. Now with foreign
words and www, everyone will recognise the same 26 letters we use in
English. (Although I understand in Spanish things like ñ and ll might be
letters in their own right.)

You seem to be supporting my case. There are both technical and
linguistic reasons why for (c = 'a'; c <= 'z'; c++) is not the right way
to iterate over "the alphabet". It iterates over *an* alphabet on
*some* systems.
 

BruceS

I don't see how the character set thing is all that arcane.  In particular,
I should point out:  It's fairly obvious that, for a couple billion real-world
use cases, the a..z range is NOT the whole set of lower case letters, and the
others aren't adjacent to them...

Do tell. IME, the a..z range is the whole set of lower case letters,
or is a superset of same. What character set has lower case letters
not contained in that range?
 

Seebs

Do tell. IME, the a..z range is the whole set of lower case letters,
or is a superset of same. What character set has lower case letters
not contained in that range?

Pretty much everything with accented letters. In the fairly common case
where a..z are adjacent (your bottom 127 characters are ASCII, for instance),
then obviously the other letters have to be outside that range.

So, say, I mostly end up in ISO 8859-1. 0xE0..0xFF are nearly all lowercase
letters which aren't between a and z (0x61..0x7A). In particular, 0xEF is
an i with a dieresis over it, which is part of the fairly common English
word naïve. (I have no idea whether that will come out.)

-s
 

Daniel Giaimo

Do tell. IME, the a..z range is the whole set of lower case letters,
or is a superset of same. What character set has lower case letters
not contained in that range?

So where in that range do you find slashed o, pi, aleph, or shcha?
 

BruceS

Pretty much everything with accented letters.  In the fairly common case
where a..z are adjacent (your bottom 127 characters are ASCII, for instance),
then obviously the other letters have to be outside that range.

So, say, I mostly end up in ISO 8859-1.  0xE0..0xFF are nearly all lowercase
letters which aren't between a and z (0x61..0x7A).  In particular, 0xEF is
an i with a dieresis over it, which is part of the fairly common English
word naïve.  (I have no idea whether that will come out.)

That makes sense, thanks. I haven't had to use a character set like
that, but can see where others do. As long as we can safely use
isalpha() & friends, I guess it really doesn't matter if the letters
are predictably in a particular range, except for programs like the
one that started this thread.
 

Ben Bacarisse

BartC said:
'A' to 'Z' is 65 to 90 in ASCII, and 65 to 90 in Unicode. That's all that
counts.

Then I have nothing to add.
Perhaps not. But I wouldn't have liked to have designed raster and vector
fonts (and I've done a bit of that) for some 60,000-odd characters..

Eh? You design fonts that have the glyphs that matter to you. The
Japanese have their own font designers. The fonts get made by people
who want them but they can't be used by programs if they are written
with the view that only codes 65 to 90 count.

A to Z is not just 'an' alphabet. It's one of the most significant ones, and
the iteration would likely work on *most* systems.

I won't disagree with that. It's a matter of emphasis.
It's also, of course, the alphabet that C uses for source code...

It would be daft for C to exclude this common alphabet. However:

6.4.2.1 p2: "An identifier is a sequence of nondigit characters
(including the underscore _, the lowercase and uppercase Latin
letters, and other characters)"

The C basic character set has neither @ nor $ but I'll lay odds you write
programs that manipulate these. The restricted nature of C's basic
character set is not a good argument against using a richer repertoire of
characters in the data the C programs manipulate.
 

Shao Miller

Colonel said:
There is an older version, 3.3, that is free and more than adequate.
http://www.forteinc.com/agent/download-all.php

A million times better than any Web interface certainly.
After an initial challenge getting authentication to work, I've been
using Mozilla Thunderbird since John's suggestion. If experiences with
it accumulate to demonstrate an insufficient means to post about C, I
shall keep your note about this Forte Agent version in mind. :) So thanks.
 

John Kelly

After an initial challenge getting authentication to work, I've been
using Mozilla Thunderbird since John's suggestion. If experiences with
it accumulate to demonstrate an insufficient means to post about C, I
shall keep your note about this Forte Agent version in mind. :) So thanks.

I like to work smart, not hard. So I use Forte Agent. It's a Usenet
power tool.

$29 for the paid version is a drop in the bucket. My time is valuable
too.
 

Richard Bos

Tom St Denis said:
Things I've never personally witnessed:

1. Easter Bunny
2. Peace in my time
3. A machine that uses EBCDIC.

Things I've never personally witnessed:

4. Tom St Denis's genitals.

And yet, I do not doubt that those exist.

And yet, quite like 3., I don't doubt that they continue to exist _and_
be useful to other people, much though I would not want to witness them
even for ready money (let alone for cucumber sandwiches).

Richard
 

Richard Bos

Malcolm McLean said:
ASCII has been a great success.

Of course non-English speakers often want special characters. The
problem is there are too many of them (characters, not non-English
speakers).

For a lot of applications, you need to be able to knock up a font
quickly. 256 character set fonts are doable by a single person in
a reasonable amount of time, and take only a reasonable amount of
memory for the table.

Not only are you wrong about non-English alphabets (so what Anglophone
bloody isn't, insular gits one and all), you are even wrong about your
precious 256-character ASCII.
ASCII is _not_ 256 characters. ASCII is 95 characters, plus 32-plus-1
control codes. Those 256 characters, in most character tables, already
do include things which are lower case letters but do not lie between
'a' and 'z'. Time to wake up: a-z was already wrong under MS-DOS. You
don't even need EBCDIC. Well, unless you want your salary paid on time,
that is - never underestimate the importance of the quiet but reliable
mainframe in the background.

Richard
 
