Typography of programs

Angel

WHY must the huge majority of users who develop on workstations be
constrained to develop as if we all used an old teletype?

Do tell, how much does it hurt to type && instead of whatever it is you
do to get the Unicode character? (I don't even know, and my keyboard is
not telling me.)
 
Ben Pfaff

jacob navia said:
Unicode now offers all possible signs for displaying in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc. [...]
We could have in some isoXXX.h
#define ≠ !=
#define ⋀ and
#define ⋁ or
#define ≤ <=
#define ≥ >=

Why not just set up a syntax-coloring-like feature in your C
programmer's editor to show C syntax whichever way you prefer?
Then you'd get to look at the symbols you like, everyone else
could look at the symbols that they like, and there would be no
compatibility issues at all.
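A rough sketch of the display-side mapping Ben suggests, assuming the file on disk stays plain ASCII and only the editor's rendering changes. display_ops is a hypothetical name, not part of any editor's API; it draws != <= >= as ≠ ≤ ≥ (UTF-8) while leaving comments, string and character literals, and the <<= and >>= operators alone:

```c
#include <string.h>

/* Hypothetical display filter: copies C source from in to out, drawing
   != <= >= as the single glyphs U+2260, U+2264, U+2265 (UTF-8).
   Comments, string literals and character constants are copied
   verbatim, and <<= and >>= are left alone.  The file on disk is never
   modified.  out needs room for 3 * strlen(in) / 2 + 1 bytes. */
char *display_ops(const char *in, char *out)
{
    const char *start = in;
    char *o = out;
    char quote = 0;                  /* '"' or '\'' while inside a literal */

    while (*in) {
        if (quote) {
            if (*in == '\\' && in[1] != '\0')
                *o++ = *in++;        /* keep the backslash of an escape */
            else if (*in == quote)
                quote = 0;           /* closing quote */
            *o++ = *in++;
            continue;
        }
        if (*in == '"' || *in == '\'') {
            quote = *in;
            *o++ = *in++;
            continue;
        }
        if (in[0] == '/' && (in[1] == '*' || in[1] == '/')) {
            const char *end = in[1] == '*' ? strstr(in + 2, "*/") : NULL;
            size_t len = in[1] == '/' ? strcspn(in, "\n")
                       : end != NULL ? (size_t)(end + 2 - in) : strlen(in);
            memcpy(o, in, len);      /* copy the whole comment unchanged */
            o += len;
            in += len;
            continue;
        }
        if (in[1] == '=' && (*in == '!' || *in == '<' || *in == '>')
            && !(in > start && in[-1] == *in)) {
            const char *glyph = *in == '!' ? "\xE2\x89\xA0"   /* U+2260 */
                              : *in == '<' ? "\xE2\x89\xA4"   /* U+2264 */
                              :              "\xE2\x89\xA5";  /* U+2265 */
            memcpy(o, glyph, 3);
            o += 3;
            in += 2;
            continue;
        }
        *o++ = *in++;
    }
    *o = '\0';
    return out;
}
```

Because the stored file never changes, compilers, grep and diff keep seeing ordinary ASCII operators, which is the compatibility argument being made.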
 
Nobody

We could use a small subset of Unicode

C already uses a subset of Unicode. A very small subset; it doesn't even
require the full (US) ASCII repertoire, in order to support the various
national ISO 646 encodings. And the 1995 amendment (C95, later folded into
C99) went even further and added digraphs, which presumably means that the
inconvenience of trigraphs actually mattered to some people.
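For readers who have not met them, the existing ASCII-only alternatives look like this: the <iso646.h> macros (and, or, not_eq, ...) and the digraphs <: :> <% %> (plus %: for #), both introduced by the 1995 amendment. The function in_range_except is a made-up example:

```c
#include <iso646.h>   /* and, or, not, not_eq, ... as macros (C95 and later) */

/* Hypothetical example: returns 1 if lo <= x <= hi and x differs from
   skip, otherwise 0.  The code uses <iso646.h> spellings instead of the
   usual logical operators, and the digraphs <: :> (square brackets) and
   <% %> (braces) instead of the usual punctuators. */
int in_range_except(int x, int lo, int hi, int skip)
<%
    int bounds<:2:> = <% lo, hi %>;

    return x >= bounds<:0:> and x <= bounds<:1:> and x not_eq skip;
%>
```

GCC and Clang accept digraphs in their default modes; trigraphs, by contrast, are disabled by default in GCC's GNU modes and were removed from the language in C23.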
 
jacob navia

On 30/06/11 18:11, Ben Pfaff wrote:
jacob navia said:
Unicode now offers all possible signs for displaying in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc. [...]
We could have in some isoXXX.h
#define ≠ !=
#define ⋀ and
#define ⋁ or
#define ≤ <=
#define ≥ >=

Why not just set up a syntax-coloring-like feature in your C
programmer's editor to show C syntax whichever way you prefer?
Then you'd get to look at the symbols you like, everyone else
could look at the symbols that they like, and there would be no
compatibility issues at all.

Of course, but then, why can't we standardize it so that
programs remain compatible, IDEs interoperate, and the
stuff generalizes?
 
Ralf Damaschke

jacob navia said:
Then do you use ??/ instead of { ??????

Because "??/" is more portable than "{" since some 3270
terminals don't support "{" in some models.

Well, I would rather use "Hello world??/n" than "Hello world{n".

YMMV -- Ralf
 
Ian Collins

On 30/06/11 18:11, Ben Pfaff wrote:
jacob navia said:
Unicode now offers all possible signs for displaying in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc. [...]
We could have in some isoXXX.h
#define ≠ !=

I don't know about others, but I find that symbol hard to differentiate
from '='.
Of course, but then, why can't we standardize it so that
programs remain compatible, IDEs interoperate, and the
stuff generalizes?

If you put 10 programmers in a room with the same IDE, you'll end up
with 10 different editor set-ups.
 
Ben Pfaff

jacob navia said:
On 30/06/11 18:11, Ben Pfaff wrote:
jacob navia said:
Unicode now offers all possible signs for displaying in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc. [...]
We could have in some isoXXX.h
#define ≠ !=
#define ⋀ and
#define ⋁ or
#define ≤ <=
#define ≥ >=

Why not just set up a syntax-coloring-like feature in your C
programmer's editor to show C syntax whichever way you prefer?
Then you'd get to look at the symbols you like, everyone else
could look at the symbols that they like, and there would be no
compatibility issues at all.

Of course, but then, why can't we standardize it so that
programs remain compatible, IDEs interoperate, and the
stuff generalizes?

Even if you get your proposed feature in standard C in 2011, it
will be 20 years before people can rely on it. (It's been 12
years since C99 arrived and the features that it added are still
not ubiquitous.)
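That lag is why portable code usually tests which revision of C it is being compiled under before relying on newer features. A minimal sketch using the standard predefined macro __STDC_VERSION__ (the function name c_standard_year is hypothetical):

```c
/* Maps the predefined macro __STDC_VERSION__ to the year of the ISO C
   revision in effect.  C89/C90 compilers do not define the macro at all,
   and pre-release "C2x" modes report intermediate values, which this
   sketch rounds down to the previous revision. */
int c_standard_year(void)
{
#if !defined(__STDC_VERSION__)
    return 1990;
#elif __STDC_VERSION__ >= 202311L
    return 2023;
#elif __STDC_VERSION__ >= 201710L
    return 2017;
#elif __STDC_VERSION__ >= 201112L
    return 2011;
#elif __STDC_VERSION__ >= 199901L
    return 1999;
#elif __STDC_VERSION__ >= 199409L
    return 1995;
#else
    return 1990;
#endif
}
```

The same #if tests can guard individual features, for example including C99's <stdint.h> only when __STDC_VERSION__ >= 199901L.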
 
Seebs

Any program that treats char as 8 bits instead of 7 should handle UTF-8.

Really? Because I often use a terminal which uses 8-bit characters,
specifically ISO 8859-1, and I am pretty sure it does not handle UTF-8.

-s
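To make Seebs's point concrete: UTF-8 encodes every character above U+007F as two to four bytes, each with the high bit set, so an 8-bit ISO 8859-1 terminal treats each byte as a separate character (or control code). ≠ (U+2260), for example, is the three bytes E2 89 A0. A minimal encoder sketch (utf8_encode is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical helper: encodes Unicode code point cp as UTF-8 into buf,
   which must hold at least 4 bytes.  Returns the number of bytes
   written, or 0 if cp is a UTF-16 surrogate or above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                         /* 1 byte: plain ASCII */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)        /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                      /* 3 bytes, e.g. U+2260 */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```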
 
Michael Press

Chris H said:
In message <[email protected]


No, not for that reason.

Simply for the reason that Usenet is an ASCII Text system.
HTML, XML, Unicode, attachments etc should not be used.

Have a look at RFC 3977. It documents how NNTP is meant to work.
Here is a snippet from the section on internationalization.


Feather Standards Track [Page 100]
RFC 3977 Network News Transfer Protocol (NNTP) October 2006

[...]

10.2. This Specification

Part of the role of this present specification is to eliminate this
confusion and promote interoperability as far as possible. At the
same time, it is necessary to accept the existence of the present
situation and not break existing implementations and arrangements
gratuitously, even if they are less than optimal. Therefore, the
current practice described above has been taken into consideration in
producing this specification.

This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
[RFC3629]. Except in the two areas discussed below, UTF-8 (which is
a superset of US-ASCII) is mandatory, and implementations MUST NOT
use any other encoding.

Firstly, the use of MIME for article headers and bodies is strongly
recommended. However, given widely divergent existing practices, an
attempt to require a particular encoding and tagging standard would
be premature at this time. Accordingly, this specification allows
the use of arbitrary 8-bit data in articles subject to the following
requirements and recommendations.

o The names of headers (e.g., "From" or "Subject") MUST be in
US-ASCII.

o Header values SHOULD use US-ASCII or an encoding based on it, such
as RFC 2047 [RFC2047], until such time as another approach has
been standardised. At present, 8-bit encodings (including UTF-8)
SHOULD NOT be used because they are likely to cause
interoperability problems.

o The character set of article bodies SHOULD be indicated in the
article headers, and this SHOULD be done in accordance with MIME.

o Where an article is obtained from an external source, an
implementation MAY pass it on and derive data from it (such as the
response to the HDR command), even though the article or the data
does not meet the above requirements. Implementations MUST
transfer such articles and data correctly and unchanged; they MUST
NOT attempt to convert or re-encode the article or derived data.
(Nevertheless, a client or server MAY elect not to post or forward
the article if, after further examination of the article, it deems
it inappropriate to do so.)
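The RFC 2047 encoding mentioned in the quoted text is how a header value such as a Subject line can carry ≠ while staying in US-ASCII. A minimal sketch of the "B" (base64) form, assuming UTF-8 input; encode_word_b is a hypothetical name, and the sketch ignores RFC 2047's 75-character limit per encoded-word, so long values would have to be split by the caller:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: wraps n bytes of UTF-8 text s as an RFC 2047
   "B" encoded-word, =?UTF-8?B?<base64>?=, and returns out.
   out needs room for 12 + 4 * ((n + 2) / 3) + 1 bytes.
   Does not enforce the 75-character encoded-word limit. */
char *encode_word_b(const unsigned char *s, size_t n, char *out)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    char *o = out;
    size_t i;

    memcpy(o, "=?UTF-8?B?", 10);
    o += 10;

    /* Full 3-byte groups become 4 base64 characters. */
    for (i = 0; i + 2 < n; i += 3) {
        unsigned long v = ((unsigned long)s[i] << 16)
                        | ((unsigned long)s[i + 1] << 8)
                        | (unsigned long)s[i + 2];
        *o++ = b64[(v >> 18) & 0x3F];
        *o++ = b64[(v >> 12) & 0x3F];
        *o++ = b64[(v >> 6) & 0x3F];
        *o++ = b64[v & 0x3F];
    }

    /* A trailing 1 or 2 bytes are padded with '='. */
    if (n - i == 1) {
        unsigned long v = (unsigned long)s[i] << 16;
        *o++ = b64[(v >> 18) & 0x3F];
        *o++ = b64[(v >> 12) & 0x3F];
        *o++ = '=';
        *o++ = '=';
    } else if (n - i == 2) {
        unsigned long v = ((unsigned long)s[i] << 16)
                        | ((unsigned long)s[i + 1] << 8);
        *o++ = b64[(v >> 18) & 0x3F];
        *o++ = b64[(v >> 12) & 0x3F];
        *o++ = b64[(v >> 6) & 0x3F];
        *o++ = '=';
    }

    memcpy(o, "?=", 3);   /* suffix plus the terminating NUL */
    return out;
}
```

For the three bytes of ≠ (E2 89 A0) this produces =?UTF-8?B?4omg?=.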
 
Herbert Rosenau

On 30/06/11 06:59, Ian Collins wrote:

If isoXXX.h exists then it is 100% portable. You would just need to
#include that file or pass the code through a small preprocessor
and that's all.



It is not very difficult to adapt those tools to accept Unicode
anyway... We could use a small subset of Unicode.

There are more than just native US-ASCII keyboards around the world. Most
have more or less national key mappings, so it is a pain
to type US-ASCII when your keyboard has ,.-öä#+ü0!"§$%&/(()=?Ü'ÄÖ_:;<>

It is much easier to type <= instead of some abstract numpad key codes
to produce something that is not a plain US-ASCII character assigned to
a key.


--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de, the home of German eComStation
eComStation 2.0 is here!
 
jacob navia

On 01/07/11 10:52, Herbert Rosenau wrote:
There are more than just native US-ASCII keyboards around the world. Most
have more or less national key mappings, so it is a pain
to type US-ASCII when your keyboard has ,.-öä#+ü0!"§$%&/(()=?Ü'ÄÖ_:;<>

It is much easier to type <= instead of some abstract numpad key codes
to produce something that is not a plain US-ASCII character assigned to
a key.

Exactly. That is why my proposal was to TYPE <= but to STORE a
Unicode character. Digraphs are like that: you type 2 characters to
get a single one.
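If source files stored ≠ ≤ ≥ as proposed, today's compilers would need the "small preprocessor" mentioned earlier in the thread to turn them back into ASCII operators. A minimal sketch, assuming UTF-8 storage; ascii_ops is a hypothetical name, and only these three glyphs are handled:

```c
#include <string.h>

/* Hypothetical pre-compilation filter: rewrites the UTF-8 glyphs
   U+2260, U+2264 and U+2265 as != <= >= so that an unmodified C
   compiler accepts the result.  Comments, string literals and
   character constants are copied verbatim.  Each 3-byte glyph becomes
   2 bytes, so out needs at most strlen(in) + 1 bytes. */
char *ascii_ops(const char *in, char *out)
{
    static const char *const glyph[3] = {
        "\xE2\x89\xA0", "\xE2\x89\xA4", "\xE2\x89\xA5"
    };
    static const char *const ascii[3] = { "!=", "<=", ">=" };
    char *o = out;
    char quote = 0;                  /* '"' or '\'' while inside a literal */

    while (*in) {
        int i, matched = 0;

        if (quote) {
            if (*in == '\\' && in[1] != '\0')
                *o++ = *in++;        /* keep the backslash of an escape */
            else if (*in == quote)
                quote = 0;           /* closing quote */
            *o++ = *in++;
            continue;
        }
        if (*in == '"' || *in == '\'') {
            quote = *in;
            *o++ = *in++;
            continue;
        }
        if (in[0] == '/' && (in[1] == '*' || in[1] == '/')) {
            const char *end = in[1] == '*' ? strstr(in + 2, "*/") : NULL;
            size_t len = in[1] == '/' ? strcspn(in, "\n")
                       : end != NULL ? (size_t)(end + 2 - in) : strlen(in);
            memcpy(o, in, len);      /* copy the whole comment unchanged */
            o += len;
            in += len;
            continue;
        }
        for (i = 0; i < 3 && !matched; i++) {
            if (strncmp(in, glyph[i], 3) == 0) {
                memcpy(o, ascii[i], 2);
                o += 2;
                in += 3;
                matched = 1;
            }
        }
        if (!matched)
            *o++ = *in++;
    }
    *o = '\0';
    return out;
}
```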
 
Angel

On 01/07/11 10:52, Herbert Rosenau wrote:

Exactly. That is why my proposal was to TYPE <= but to STORE a
Unicode character. Digraphs are like that: you type 2 characters to
get a single one.

That would require not only a change in the C language, but also in
every editor in the world used to write C source. Better get
cracking... :)
 
jacob navia

On 01/07/11 11:27, Angel wrote:
That would require not only a change in the C language, but also in
every editor in the world used to write C source. Better get
cracking... :)

Sure, it would require an editor that supports UTF-8, which most
editors have done for years.

Under vi, for instance, the characters display correctly on
my Mac. After copying the file to a Linux distribution (Ubuntu), vi
still displays it correctly.

Of course YOUR editor doesn't do it... or does it?

Have you even TESTED that assertion?

What editor are you using?
 
Angel

On 01/07/11 11:27, Angel wrote:

Sure, it would require an editor that supports UTF-8, which most
editors have done for years.
Under vi, for instance, the characters display correctly on
my Mac. After copying the file to a Linux distribution (Ubuntu), vi
still displays it correctly.

"vi" == "every editor in the world"?
Of course YOUR editor doesn't do it... or does it?

Have you even TESTED that assertion?

I'm not making any assertions. You are.
 
Wolfgang.Draxinger

Another way to generate non-portable code! Not much fun for those of
us who use command line text processing tools in the C locale!

Just specify that the source code must be in one specific encoding and
be done with it. Just take a look at the Go programming language, which
permits Unicode characters in identifiers. Go specifies that all source
text is UTF-8. Problem solved.


Wolfgang
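Wolfgang's rule is easy to enforce mechanically: a compiler that requires UTF-8 source only has to reject byte sequences that are not well-formed UTF-8. A minimal validator sketch (utf8_valid is a hypothetical name) that rejects truncated sequences, overlong forms, surrogates and values above U+10FFFF:

```c
#include <stddef.h>

/* Hypothetical validator: returns 1 if the n bytes at s are well-formed
   UTF-8, 0 otherwise.  Rejects stray continuation bytes, truncated
   sequences, overlong encodings, UTF-16 surrogates (U+D800..U+DFFF)
   and code points above U+10FFFF. */
int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t len, k;

        if (c < 0x80) {            /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2;
            cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3;
            cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4;
            cp = c & 0x07;
        } else {
            return 0;              /* continuation byte or 0xF8..0xFF */
        }
        if (n - i < len)
            return 0;              /* truncated sequence */
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;          /* missing continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800)
            || (len == 4 && cp < 0x10000))
            return 0;              /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;              /* outside the Unicode scalar values */
        i += len;
    }
    return 1;
}
```

A Latin-1 file containing é as the single byte E9, for instance, fails this check, which is exactly the kind of file the "one encoding" rule is meant to exclude.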
 
Ben Pfaff

Wolfgang.Draxinger said:
Just specify that the source code must be in one specific encoding and
be done with it. Just take a look at the Go programming language, which
permits Unicode characters in identifiers. Go specifies that all source
text is UTF-8. Problem solved.

The Go language doesn't have legacy source code running on
diverse systems dating back to the 1970s.
 
Ben Pfaff

1. Who said whatever that character is after #define (that looks
like a question mark) is usable in an identifier? I don't understand
why you would want to use it as (part of) an identifier *and* as a
comparison operator?

They are not usable in identifiers. The normative Annex D to C99
lists all of the Unicode characters that are valid in
identifiers. None of these characters are in the list.

Presumably Jacob is proposing an extension.
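For reference, C99 spells such characters in source as universal character names: \u followed by four hex digits, or \U followed by eight. A sketch, assuming a compiler that accepts UCNs in identifiers (recent GCC and Clang releases do by default) and a UTF-8 execution character set: é (U+00E9) is in the Annex D list, so it may appear in an identifier, while ≠ (U+2260) may only appear in literals. The names summer_year, not_equal_sign and not_equal_sign_bytes are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* \u00E9 is LATIN SMALL LETTER E WITH ACUTE, which Annex D allows in
   identifiers, so this declares a variable whose name reads "ete" with
   both e's accented. */
static int \u00E9t\u00E9 = 2011;

int summer_year(void)
{
    return \u00E9t\u00E9;
}

/* U+2260 (NOT EQUAL TO) is not in Annex D, so it cannot be part of an
   identifier, but a UCN may appear in a string literal.  With a UTF-8
   execution character set it occupies three bytes. */
static const char not_equal_sign[] = "\u2260";

size_t not_equal_sign_bytes(void)
{
    return strlen(not_equal_sign);
}
```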
 
jacob navia

On 02/07/11 06:16, Gordon Burditt wrote:
How many people can visually distinguish those possible signs from
each other?


Why? What problem are you attempting to solve?

At the start of the discussion, I was trying to get a modern
and aesthetically pleasing typography for programs. Why write
!= instead of the inequality sign from mathematics? There is
no reason.

But during the discussion, I have learned a bit, and now the
proposal should be extended:

o We should be able to format comment text as in a simple
word processor (RTF), which would allow us to include diagrams that
explain the code instead of ridiculous ASCII drawings that
are extremely difficult to draw and do not survive
reformatters.

o We should end the digraphs of C and use the inequality sign
(in output) instead of !=, the assignment arrow instead of =,
the Boolean \/ and /\ for OR and AND instead of || and &&,
etc.

The problem with documentation now is that it must be written
outside the program, using a text editor that is in general
different from the programmer's editor. Obviously, as Ben said, I
could do this in wedit and be done with it, but actually that would be
the wrong solution, since any user of this feature would be tied
to a single editor: it would be impossible to use another, since there
would be no standard way of editing programs.
Incidentally,
Unicode is well-known for the use of nearly-identical graphics
characters to try to get people and web browsers to interpret URLs
differently, and to exploit security holes.

Did you know that pointers are used in all exploits done so far?
Obviously we should ban pointers (and pointer+integer operations).
It seems that the *only* standardized way to refer to a character
is some weird construct using a U and a bunch of numbers.


1. Who said whatever that character is after #define (that looks
like a question mark) is usable in an identifier?

Nobody said that. The problem with your news reader (tin) is
that it is from February 2007 and rather outdated, as you can
see now. Please get a newer version or one that can
read and write UTF-8.


I don't understand
why you would want to use it as (part of) an identifier *and* as a
comparison operator?

I never said that. Please read again. I am proposing that != is replaced
with a single Unicode character (the inequality sign of mathematics)

2. How do you manage to #define the same character to five different
strings?

As I said before, your news reader displays only the fallback
sign (the question mark), since it is unable to display Unicode.
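One plausible mechanism behind the question marks Gordon saw: a reader that converts UTF-8 to an 8-bit character set such as ISO 8859-1 has no code for ≠, ⋀ or ≤, and a common fallback is to substitute '?'. Every one of the five #define lines then begins with the same "?". A minimal sketch of such a lossy conversion (utf8_to_latin1 is a hypothetical name; it does not reject overlong forms):

```c
/* Hypothetical lossy converter: copies NUL-terminated UTF-8 from in to
   out as ISO 8859-1.  Characters up to U+00FF are kept; anything
   larger, and any malformed byte, becomes a single '?'.  The output is
   never longer than the input. */
char *utf8_to_latin1(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    char *o = out;

    while (*s) {
        if (*s < 0x80) {
            *o++ = (char)*s++;                       /* plain ASCII */
        } else if ((*s & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            unsigned cp = ((unsigned)(*s & 0x1F) << 6) | (s[1] & 0x3Fu);
            *o++ = cp <= 0xFF ? (char)cp : '?';      /* 2-byte sequence */
            s += 2;
        } else {
            *o++ = '?';                              /* no Latin-1 equivalent */
            s++;
            while ((*s & 0xC0) == 0x80)              /* skip continuation bytes */
                s++;
        }
    }
    *o = '\0';
    return out;
}
```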

3. How the heck do I type that character?


By pressing the "!" key and then (after you have taken your finger off
the "!") the "=" key. The sophisticated software behind your
programmer's editor translates those two characters into one, in the
same way that, to type an accented letter, you first type
the accent, THEN the letter. OK?
Using something that looks like a *question mark* for assignment
would be extremely confusing.

Yes, I agree. But there was no question mark in my post. See above.

Using something that turns into a
*question mark* when posted to comp.lang.c would be even more
confusing.

Yes.


Why should we need a C-specific keyboard?

Because many C programmers do not need the paragraph sign (§), for instance.
It could be replaced with the... inequality sign!
The C language does not define a text editor.

It implicitly admits that the program text must somehow be
entered into a computer system...

Besides, this proposal doesn't define an editor anyway. It would
propose an optional "enriched" form of programs that would replace
the digraphs of C with their Unicode equivalents and that would
reserve the sequences:

/*{\rtf
that should be followed by a matching
}*/

/*<HTML>
and
</HTML>*/
as the start of a rich text comment that should be displayed
as rich text if the text processor supports it.
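A compiler would see no difference under this scheme, since a /*<HTML> ... </HTML>*/ block is already an ordinary comment; only tools would need to recognize the reserved openings. A minimal sketch, assuming the exact, case-sensitive sequences given above; max_int, classify_comment and the comment_kind enum are hypothetical names:

```c
#include <string.h>

/*<HTML>
<p>Returns the <b>larger</b> of two ints.  To the compiler this whole
block is just a comment.</p>
</HTML>*/
int max_int(int a, int b)
{
    return a > b ? a : b;
}

enum comment_kind { COMMENT_PLAIN, COMMENT_HTML, COMMENT_RTF };

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Classifies the complete text of one comment, delimiters included,
   by checking only its opening and closing sequences. */
enum comment_kind classify_comment(const char *text)
{
    if (starts_with(text, "/*<HTML>") && ends_with(text, "</HTML>*/"))
        return COMMENT_HTML;
    if (starts_with(text, "/*{\\rtf") && ends_with(text, "}*/"))
        return COMMENT_RTF;
    return COMMENT_PLAIN;
}
```

Real RTF groups nest braces, so a production tool would have to check that the final } actually closes the opening {\rtf group; this sketch only looks at the delimiters.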
You might be right about many editors supporting UTF-8. How many users
know how to use those features?

I would bet that there isn't a single programmer that has never used a
word processor. I am sorry but even you have used one.


And how many of those support UTF-8 on
a text console?

BSD is not very advanced in graphics and Unicode support maybe...
Get a Mac or a PC and use your favorite editor there, using BSD only to
run your code.
 
Dr Nick

jacob navia said:
On 02/07/11 06:16, Gordon Burditt wrote:

At the start of the discussion, I was trying to get a modern
and aesthetically pleasing typography for programs. Why write
!= instead of the inequality sign from mathematics? There is
no reason.

But during the discussion, I have learned a bit, and now the
proposal should be extended:

o We should be able to format comment text as in a simple
word processor (RTF), which would allow us to include diagrams that
explain the code instead of ridiculous ASCII drawings that
are extremely difficult to draw and do not survive
reformatters.

That sounds almost orthogonal to language specification. This is going
to break every existing toolset, so unless you want to create the whole
editing environment from the ground up and require people to use it,
it's not going to work.
o We should end the digraphs of C and use the inequality sign
(in output) instead of !=, the assignment arrow instead of =,
the Boolean \/ and /\ for OR and AND instead of || and &&,
etc.

The problem with documentation now is that it must be written
outside the program, using a text editor that is in general
different from the programmer's editor. Obviously, as Ben said, I
could do this in wedit and be done with it, but actually that would be
the wrong solution, since any user of this feature would be tied
to a single editor: it would be impossible to use another, since there
would be no standard way of editing programs.

But how on earth am I, editing your C in Emacs on my Linux box going to
see the same diagrams that you, editing your C in whatever on your Mac,
have created?

[big snip]
BSD is not very advanced in graphics and Unicode support maybe...
Get a Mac or a PC and use your favorite editor there, using BSD only
to run your code.

My favourite editor is Emacs! And unlike Windows users (presumably by PC
you mean "Windows machine"; PC meant personal computer many years before
Windows appeared), I have no difficulty writing ≥ and its friends
when I need to.

Your proposals to use better symbols in your programming language are
perfectly good. But C is the wrong language for them. Like it or not, C
has to be, in many respects, a "lowest common denominator" language just
because of its history.

Now Java is something that could easily have adopted these symbols. I
wonder why it didn't - finding out might help think about whether C
should.
 
jacob navia

On 02/07/11 11:22, Dr Nick wrote:
That sounds almost orthogonal to language specification. This is going
to break every existing toolset, so unless you want to create the whole
editing environment from the ground up and require people to use it,
it's not going to work.

I did not know that your editor can't display plain text.

If I write NOW:

/*<html> Some comment in HTML format </html>*/

your editor goes nuts?
But how on earth am I, editing your C in Emacs on my Linux box going to
see the same diagrams that you, editing your C in whatever on your Mac,
have created?

It doesn't surprise you that when you write a web page
it looks the same in MANY web browsers? This is the same!

And if you do not want to use this feature, you just don't use it;
since it is in comments, it can be safely ignored by older software
and you can stick to your old version of Emacs.

There are versions of Emacs for the X Window System, and they will display
HTML if they are updated to do so.
[big snip]
BSD is not very advanced in graphics and Unicode support maybe...
Get a Mac or a PC and use your favorite editor there, using BSD only
to run your code.

My favourite editor is Emacs! And unlike Windows users (presumably by PC
you mean "Windows machine"; PC meant personal computer many years before
Windows appeared), I have no difficulty writing ≥ and its friends
when I need to.

Then what's the problem?
Your proposals to use better symbols in your programming language are
perfectly good. But C is the wrong language for them. Like it or not, C
has to be, in many respects, a "lowest common denominator" language just
because of its history.

My proposals do not change C at all. They add an optional standard way
of writing programs using modern typesetting techniques like:

o bold face
o italics
o embedded graphics
o extended character sets.
 
