unicode as valid naming symbols

Antoon Pardon · Mar 27, 2014

If it's going to become an operator, then it has to be a keyword.
Changing a token that is currently allowed to be an identifier into a
keyword is generally avoided as much as possible, because it breaks
backward compatibility. "in" and "is" have both been keywords for a
very long time, perhaps since the initial release of Python.

I know. For that reason I would have loved it if keywords had been
written like this: 𝗱𝗲𝗳 (using mathematical bold) instead of just like
this: def (using plain Latin letters). Among other things, it would mean
we could just write operator.not instead of having to write operator.not_.
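
A minimal sketch of how today's Python sees such a styled word, assuming Python 3 and its standard `unicodedata`, `keyword` and `operator` modules. The bold letters are written as escapes so the snippet does not depend on any font. It only illustrates the current behaviour and is not part of the proposal above.

```python
import keyword
import operator
import unicodedata

# MATHEMATICAL SANS-SERIF BOLD SMALL D, E, F, written as escapes.
bold_def = "\U0001D5F1\U0001D5F2\U0001D5F3"

# NFKC normalization folds the styled letters back to plain ASCII.
folded = unicodedata.normalize("NFKC", bold_def)
print(folded)
print(keyword.iskeyword(folded))    # the folded form is the keyword
print(keyword.iskeyword(bold_def))  # the raw styled string is not in the keyword list

# "not" is reserved, which is why the function is spelled operator.not_.
print(keyword.iskeyword("not"))
print(operator.not_(True))
```

Because the styled letters fold to the same ASCII text, they cannot act as a separate namespace for keywords without a change to how source text is normalized.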

Rustom Mody · Mar 27, 2014

I know, for such a reason I would love it if keywords would have been
written like this: 𝗱𝗲𝗳 (using mathematical bold) instead of just like
this: def (using plain latin letters). It would mean among other things
we could just write operator.not instead of having to write operator.not_

Just out of curiosity how do/did you type that?
When I see an exotic denizen from the unicode-universe I paste it into
emacs and ask "Who are you?"

But with your 'def' my emacs is going a bit crazy!

Mark H Harris · Mar 27, 2014

To quote a great Spaniard:

“You keep using that word, I do not think it means what you
think it means.”

In~con~theveable ! My name is Inigo Montoya, you killed my
father, prepare to die...

Do you think that the ability to write this would be an improvement?

import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
❹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)

Steven, you're killing me here; argument by analogy does not work!

√ = lambda n: sqrt(n) <===== but this should work...

In point of fact, it should be built-in ! OK, IMHO.

Of course, it's not even necessary to be that exotic. "Any unicode symbol
that is not a number"... that means things like these:

No, any unicode character (except numerals) should be able to begin a
name identifier. alt-l λ and alt-v √ should be valid first
character name identifier symbols.

There are languages that can allow arbitrary symbols as identifiers, like
Lisp and Forth. You will note that they have a certain reputation for
being, um, different, and although both went through periods of
considerable popularity, both have faded in popularity since.

Actually, there is a recent resurgence of popularity in both Common
Lisp and Scheme these days. But, again, that has nothing to do with my
argument. No modern language should limit the use of certain symbols to,
say, only math √. The radical symbol is more often than not going
to be useful only with math (which, by the way, is why it should be
built-in as √ = square-root), but why limit its use elsewhere?

Whether this can work in python is also beside the point, because
I'm not demanding anything here either, at this point.

have a good day!

marcus

Tim Chase · Mar 27, 2014

Just out of curiosity how do/did you type that?
When I see an exotic denizen from the unicode-universe I paste it
into emacs and ask "Who are you?"

But with your 'def' my emacs is going a bit crazy!

I have the following in a file, which I can then open with Vim. I
just type the text I want above the corresponding row of capital
letters and execute the statement on the first line. Vim then
populates the system clipboard with the corresponding letters in the
Unicode font. I'm sure there are font-translator pages that would
make it a little easier, but I'd already done this. Just adjust for
Emacs ;-)

-tkc

let @+=join(map(split(getline('.'), '\zs'),
'matchstr(getline(line(".")+((v:val >= "A" && v:val <= "Z")?1:2)),
".\\{".(char2nr(tolower(v:val))-char2nr("a"))."}\\zs.")'),'')
ABCDEFGHIJKLMNOPQRSTUVWXYZ ASCII upper
abcdefghijklmnopqrstuvwxyz ASCII lower
𝐀𝐁𝐂𝐃𝐄𝐅𝐆𝐇𝐈𝐉𝐊𝐋𝐌𝐍𝐎𝐏𝐐𝐑𝐒𝐓𝐔𝐕𝐖𝐗𝐘𝐙 bold serif upper
𝐚𝐛𝐜𝐝𝐞𝐟𝐠𝐡𝐢𝐣𝐤𝐥𝐦𝐧𝐨𝐩𝐪𝐫𝐬𝐭𝐮𝐯𝐰𝐱𝐲𝐳 bold serif lower
𝐴𝐵𝐶𝐷𝐸𝐹𝐺𝐻𝐼𝐽𝐾𝐿𝑀𝑁𝑂𝑃𝑄𝑅𝑆𝑇𝑈𝑉𝑊𝑋𝑌𝑍 italic serif upper
𝑎𝑏𝑐𝑑𝑒𝑓𝑔ℎ𝑖𝑗𝑘𝑙𝑚𝑛𝑜𝑝𝑞𝑟𝑠𝑡𝑢𝑣𝑤𝑥𝑦𝑧 italic serif lower
𝑨𝑩𝑪𝑫𝑬𝑭𝑮𝑯𝑰𝑱𝑲𝑳𝑴𝑵𝑶𝑷𝑸𝑹𝑺𝑻𝑼𝑽𝑾𝑿𝒀𝒁 bold italic serif upper
𝒂𝒃𝒄𝒅𝒆𝒇𝒈𝒉𝒊𝒋𝒌𝒍𝒎𝒏𝒐𝒑𝒒𝒓𝒔𝒕𝒖𝒗𝒘𝒙𝒚𝒛 bold italic serif lower
𝓐𝓑𝓒𝓓𝓔𝓕𝓖𝓗𝓘𝓙𝓚𝓛𝓜𝓝𝓞𝓟𝓠𝓡𝓢𝓣𝓤𝓥𝓦𝓧𝓨𝓩 script upper
𝓪𝓫𝓬𝓭𝓮𝓯𝓰𝓱𝓲𝓳𝓴𝓵𝓶𝓷𝓸𝓹𝓺𝓻𝓼𝓽𝓾𝓿𝔀𝔁𝔂𝔃 script lower
𝔄𝔅ℭ𝔇𝔈𝔉𝔊ℌℑ𝔍𝔎𝔏𝔐𝔑𝔒𝔓𝔔ℜ𝔖𝔗𝔘𝔙𝔚𝔛𝔜ℨ fraktur upper
𝔞𝔟𝔠𝔡𝔢𝔣𝔤𝔥𝔦𝔧𝔨𝔩𝔪𝔫𝔬𝔭𝔮𝔯𝔰𝔱𝔲𝔳𝔴𝔵𝔶𝔷 fraktur lower
𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅 fraktur bold upper
𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟 fraktur bold lower
𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ hollow upper
𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫 hollow lower
𝖠𝖡𝖢𝖣𝖤𝖥𝖦𝖧𝖨𝖩𝖪𝖫𝖬𝖭𝖮𝖯𝖰𝖱𝖲𝖳𝖴𝖵𝖶𝖷𝖸𝖹 sans upper
𝖺𝖻𝖼𝖽𝖾𝖿𝗀𝗁𝗂𝗃𝗄𝗅𝗆𝗇𝗈𝗉𝗊𝗋𝗌𝗍𝗎𝗏𝗐𝗑𝗒𝗓 sans lower
𝗔𝗕𝗖𝗗𝗘𝗙𝗚𝗛𝗜𝗝𝗞𝗟𝗠𝗡𝗢𝗣𝗤𝗥𝗦𝗧𝗨𝗩𝗪𝗫𝗬𝗭 bold sans upper
𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇 bold sans lower
𝘈𝘉𝘊𝘋𝘌𝘍𝘎𝘏𝘐𝘑𝘒𝘓𝘔𝘕𝘖𝘗𝘘𝘙𝘚𝘛𝘜𝘝𝘞𝘟𝘠𝘡 italic sans upper
𝘢𝘣𝘤𝘥𝘦𝘧𝘨𝘩𝘪𝘫𝘬𝘭𝘮𝘯𝘰𝘱𝘲𝘳𝘴𝘵𝘶𝘷𝘸𝘹𝘺𝘻 italic sans lower
𝘼𝘽𝘾𝘿𝙀𝙁𝙂𝙃𝙄𝙅𝙆𝙇𝙈𝙉𝙊𝙋𝙌𝙍𝙎𝙏𝙐𝙑𝙒𝙓𝙔𝙕 bold italic sans upper
𝙖𝙗𝙘𝙙𝙚𝙛𝙜𝙝𝙞𝙟𝙠𝙡𝙢𝙣𝙤𝙥𝙦𝙧𝙨𝙩𝙪𝙫𝙬𝙭𝙮𝙯 bold italic sans lower
𝙰𝙱𝙲𝙳𝙴𝙵𝙶𝙷𝙸𝙹𝙺𝙻𝙼𝙽𝙾𝙿𝚀𝚁𝚂𝚃𝚄𝚅𝚆𝚇𝚈𝚉 mono upper
𝚊𝚋𝚌𝚍𝚎𝚏𝚐𝚑𝚒𝚓𝚔𝚕𝚖𝚗𝚘𝚙𝚚𝚛𝚜𝚝𝚞𝚟𝚠𝚡𝚢𝚣 mono lower
𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗 bold serif
𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 hollow
𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫 sans
𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 sans bold
𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 mono
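
For readers without Vim, the same idea can be sketched in Python. This swaps in a different technique from the script above: it uses code-point arithmetic instead of looking letters up in a table row. It assumes only the bold sans rows are wanted, because that range has no gaps in Unicode. The helper name `to_bold_sans` is hypothetical.

```python
# Hypothetical helper; a sketch of the idea, not a port of the Vim one-liner.
BOLD_SANS_UPPER = 0x1D5D4  # MATHEMATICAL SANS-SERIF BOLD CAPITAL A
BOLD_SANS_LOWER = 0x1D5EE  # MATHEMATICAL SANS-SERIF BOLD SMALL A


def to_bold_sans(text: str) -> str:
    """Map ASCII letters to their bold sans counterparts; leave the rest alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_SANS_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SANS_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(to_bold_sans("def"))
```

The offset trick would not work unchanged for the italic, script, fraktur or hollow rows, where some letters live in the older Letterlike Symbols block.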

Rustom Mody · Mar 27, 2014

On 3/25/14 6:58 PM, Steven D'Aprano wrote:

In~con~theveable ! My name is Inigo Montoya, you killed my
father, prepare to die...

Do you think that the ability to write this would be an improvement?
import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
❹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)

Steven, you're killing me here; argument by analogy does not work!

√ = lambda n: sqrt(n) <===== but this should work...

In point of fact, it should be built-in ! OK, IMHO.

No, any unicode character (except numerals) should be able to begin a
name identifier. alt-l λ and alt-v √ should be valid first
character name identifier symbols.

Actually, there is a recent resurgence of popularity in both Common
Lisp and Scheme these days. But, again, that has nothing to do with my
argument. No modern language should limit the use of certain symbols to,
say, only math √. The radical symbol is more often than not going
to be useful only with math (which, by the way, is why it should be
built-in as √ = square-root), but why limit its use elsewhere?

Whether this can work in python is also beside the point, because
I'm not demanding anything here either, at this point.

have a good day!

The problem is that mathematicians invent notations in a completely
laissez-faire manner.

Language implementers having to unrestrainedly keep up would go mad.
And then us vanilla users (aka programmers) would have to deal with maddened
implementers.

Observe:
Good ol' infix -- x+y
prefix (with paren) -- foo(x)
prefix without -- ¬ x
In case you thought alphanumerics had parens -- sin x
Then there's postfix -- n!
Inside fix -- nCr (or if you prefer ⁿCᵣ ??)
And outside fix -- mod -- |x|

And I've probably forgotten 2 dozen other common ones.

Mark H Harris · Mar 27, 2014

Observe:
Good ol' infix -- x+y
prefix (with paren) -- foo(x)
prefix without -- ¬ x
In case you thought alphanumerics had parens -- sin x
Then there's postfix -- n!
Inside fix -- nCr (or if you prefer ⁿCᵣ ??)
And outside fix -- mod -- |x|

And I've probably forgotten 2 dozen other common ones.

Oh, I know... that's why I'm not demanding anything (what a headache).

Ian Kelly · Mar 27, 2014

Do you think that the ability to write this would be an improvement?

import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
❹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)

Steven, you're killing me here; argument by analogy does not work!

That's not an analogy. That's an example of valid Python code if
arbitrary Unicode characters could be used to name identifiers.

No, any unicode character (except numerals) should be able to begin a name
identifier. alt-l λ and alt-v √ should be valid first character
name identifier symbols.

What's a numeral? The circled numbers in the example above are
categorized as No ("Number, Other"). Currently Python only allows the
ASCII digits in numeric literals, but who's to say that ٤٢ --
categorized as Nd ("Number, Decimal Digit") -- shouldn't be a valid way
to write 42? ㊷ seems a bit excessive for a literal, though, so should
that be permitted to start an identifier?
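
The categories mentioned here can be checked directly. The following is a small sketch assuming Python 3 and the standard `unicodedata` module. The sample characters are written as escapes, and the labels in the dictionary are only descriptive names chosen for this example.

```python
import unicodedata

# Labels are informal; the characters are the ones discussed in the thread.
samples = {
    "circled six": "\u2465",
    "arabic-indic four": "\u0664",
    "circled forty-two": "\u32b7",
    "lambda": "\u03bb",
    "square root": "\u221a",
}

for label, ch in samples.items():
    # General category, and whether the character alone is a valid identifier.
    print(f"{label:18} {unicodedata.category(ch)} {ch.isidentifier()}")

# int() accepts any Nd digits, even though source literals must be ASCII.
print(int("\u0664\u0662"))

# Characters in category No still carry a numeric value in the database.
print(unicodedata.numeric("\u32b7"))
```

So λ already works as a name in Python 3, while √ (category Sm) and the circled numbers (category No) do not.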

Actually, there is a recent resurgence of popularity in both Common Lisp
and Scheme these days. But, again, that has nothing to do with my argument.
No modern language should limit the use of certain symbols to, say, only math
√. The radical symbol is more often than not going to be useful only
with math (which, by the way, is why it should be built-in as √ =
square-root), but why limit its use elsewhere?

Whether this can work in python is also beside the point, because I'm not
demanding anything here either, at this point.

One of the things that Python is widely known for is its readability.
Allowing symbols such as √ to denote identifiers may be quite
expressive and appreciable to the person writing the code. However, it
damages readability considerably, as seen in Steven's example above.
Personally I'm not interested in having to maintain another
programmer's code that arbitrarily uses ⌚ as a timer function, ╩ as
intersection or ░ as a matrix constructor.

Chris Angelico · Mar 27, 2014

No, any unicode character (except numerals) should be able to begin a name
identifier. alt-l λ and alt-v √ should be valid first character
name identifier symbols.

What, even whitespace??

ChrisA

MRAB · Mar 27, 2014

Observe:
Good ol' infix -- x+y
prefix (with paren) -- foo(x)
prefix without -- ¬ x
In case you thought alphanumerics had parens -- sin x
Then there's postfix -- n!
Inside fix -- nCr (or if you prefer ⁿCᵣ ??)
And outside fix -- mod -- |x|

And I've probably forgotten 2 dozen other common ones.

You haven't mentioned implicit multiplication: xy

Then there's raising to a power, sin²(x), except that what looks like
raising to -1 actually means the inverse function (arcsin).

Rustom Mody · Mar 27, 2014

That's not an analogy. That's an example of valid Python code if
arbitrary Unicode characters could be used to name identifiers.

Python has other lexical categories besides identifier characters, e.g. operators.
Enriching that set is a somewhat different direction from
enriching the identifier charset.

Note that both these directions are valid but different.
This table http://www.unicode.org/charts/PDF/U2200.pdf
looks unpleasantly overfilled. However, a good deal of it is stylistic variation
(≥ vs ≧), and some pairs are nearly indistinguishable (∈ vs ∊).

If we accept that Python is more readable than COBOL, then having a good
selection from the above makes for a programming language that is more readable in an
analogous manner.
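
When two operators look alike on screen, their code points and Unicode names still tell them apart. Here is a minimal sketch, assuming Python 3 and the standard `unicodedata` module, applied to the two pairs mentioned above.

```python
import unicodedata

# The two near-lookalike pairs from the Mathematical Operators block.
pairs = [("\u2265", "\u2267"), ("\u2208", "\u220a")]

for a, b in pairs:
    for ch in (a, b):
        # Print the code point and the official character name.
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    print()
```

This is roughly the "Who are you?" query done from Python instead of from an editor.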

Rustom Mody · Mar 27, 2014

You haven't mentioned implicit multiplication: xy

Yeah -- that's a bad one!

It can mean
- ordinary multiplication (if you are in school)
- overloaded (scalar-field or scalar-vector) multiplication in linear algebra
- function application (tensors??)
- concatenation (awk, SNOBOL (with space))
- a 2-char variable (for a 'normal' (whatever that means) programmer)

Then there's raising to a power, sin²(x), except that what looks like
raising to -1 actually means the inverse function (arcsin).

Non-linear notations are another can (barrel?) of worms.
Matrices/determinants, anyone?

Yeah... Copying the *notations* of mathematicians is not such a great idea.

And yet doing away with them too summarily leads to COBOL, SQL etc.
The math remains willy-nilly... just under a steaming pile of alphanumeriage.

Gregory Ewing · Mar 27, 2014

And mismatched delimiters:

[5, 7)

|x>

random832 · Mar 28, 2014

Just out of curiosity how do/did you type that?
When I see an exotic denizen from the unicode-universe I paste it into
emacs and ask "Who are you?"

But with your 'def' my emacs is going a bit crazy!

Your emacs probably is using UCS-2 or UTF-16. The former can't handle
characters above 65535 at all, the latter stores them as if they were
two characters [so code that's not expecting them will handle them
incorrectly]
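
The "two characters" point can be made concrete. This is a sketch assuming Python 3, using the bold sans d (U+1D5F1) from earlier in the thread as the sample character. It says nothing about what Emacs does internally; it only shows how UTF-16 represents a code point above 65535.

```python
import struct

ch = "\U0001D5F1"  # MATHEMATICAL SANS-SERIF BOLD SMALL D, above 0xFFFF

# Encode to UTF-16 (little endian, no BOM) and read back the 16-bit units.
utf16 = ch.encode("utf-16-le")
units = struct.unpack("<2H", utf16)
print(len(ch), "code point,", len(utf16) // 2, "UTF-16 code units")
print([hex(u) for u in units])

# The same surrogate pair computed by hand.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
print(hex(high), hex(low))
```

Code that indexes or counts 16-bit units would see two items here, which is the failure mode described above.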

Rustom Mody · Mar 29, 2014

Your emacs probably is using UCS-2 or UTF-16. The former can't handle
characters above 65535 at all, the latter stores them as if they were
two characters [so code that's not expecting them will handle them
incorrectly]

My current diagnosis (with the help of more knowledgeable folks than myself)
is that it's a font problem.

There simply doesn't exist a font (or more likely, none that I know of) that
- is readable
- is scalable
- spans the whole 17*65536 Unicode space

At least out here:
- gnu-unifont does not cover things outside the BMP
- dejavu seems to have some bugs

Chris Angelico · Mar 29, 2014

My current diagnosis (with the help of more knowledgeable folks than myself)
is that it's a font problem.

There simply doesn't exist a font (or more likely, none that I know of) that
- is readable
- is scalable
- spans the whole 17*65536 Unicode space

For my MUDding, I use a font imaginatively named "Monospace", which
does most of what I want. There are a handful of characters that come
out as the "square with digits inside", but not huge blocks (certainly
not "everything non-BMP" or anything like that). It's fairly readable,
although I don't know about scaling - I run it at 14pt and nowhere
else. Comes with Debian.

ChrisA

Dennis Lee Bieber · Mar 29, 2014

There simply doesn't exist a font (or more likely, none that I know of) that
- is readable
- is scalable
- spans the whole 17*65536 Unicode space

Considering that a 5x8 bitmap font (which is unlikely to even have
enough pixels to produce even 65536 unique glyphs) would take 5.6MB for
your (17*65536), I wouldn't want to see what an algorithmic description
would require.

Looking at my collection of fonts, TTF and some PS, they seem to be
running around 100kB per font, and those fonts likely have around 128-192
glyphs.

For 1114112 glyphs (17*65536) at, say, 164 glyphs per 100kB, that gives 680MB
per FONT. Assume the standard styles: normal, bold, italic, bold-italic -- one is
now up to 2.7GB per typeface, or 5.4GB to support just one serif and one sans
serif typeface.
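
The figures above can be reproduced with a few lines of arithmetic. This sketch uses only the numbers given in the post (5x8 pixels, 164 glyphs per 100kB, four styles, two typefaces) and assumes decimal units (1MB = 1000kB).

```python
CODE_POINTS = 17 * 65536            # 1114112 slots in the Unicode code space

# 5x8 bitmap: 40 bits, i.e. 5 bytes per glyph.
bitmap_bytes = CODE_POINTS * (5 * 8) // 8
print(f"bitmap font: {bitmap_bytes / 1e6:.1f} MB")

# Outline fonts, extrapolating from about 164 glyphs per 100 kB.
per_font_mb = CODE_POINTS / 164 * 100 / 1000
per_typeface_gb = per_font_mb * 4 / 1000   # normal, bold, italic, bold-italic
two_typefaces_gb = per_typeface_gb * 2     # one serif plus one sans serif
print(f"per font:      {per_font_mb:.0f} MB")
print(f"per typeface:  {per_typeface_gb:.1f} GB")
print(f"two typefaces: {two_typefaces_gb:.1f} GB")
```

The results agree with the post's 5.6MB, 680MB, 2.7GB and 5.4GB to within rounding.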

Chris Angelico · Mar 29, 2014

Considering that a 5x8 bitmap font (which is unlikely to even have
enough pixels to produce even 65536 unique glyphs) would take 5.6MB for
your (17*65536), I wouldn't want to see what an algorithmic description
would require.

Looking at my collection of fonts, TTF and some PS, they seem to be
running around 100kB per font, and those fonts likely have around 128-192
glyphs.

For 1114112 glyphs (17*65536) at, say, 164 glyphs per 100kB, that gives 680MB
per FONT. Assume the standard styles: normal, bold, italic, bold-italic -- one is
now up to 2.7GB per typeface, or 5.4GB to support just one serif and one sans
serif typeface.

Most fonts these days are vector, not bitmap, but a 5x8 bitmap has
forty pixels, any of which can be either on or off - that gives
roughly twice as much data space as the 21-bit Unicode spec. Plenty of
room for 17*65536 unique glyphs. But you're right that it'd then take
~5-6MB to store that, minimum.

ChrisA
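
A quick check of the pixel-counting argument, as a sketch in Python. It compares the number of distinct 5x8 on/off patterns with the size of the code space and shows where the "21-bit" figure comes from.

```python
patterns = 2 ** (5 * 8)          # every distinct on/off pattern of 40 pixels
code_points = 17 * 65536         # size of the Unicode code space

print(patterns)
print(patterns >= code_points)   # far more patterns than code points

# The largest code point, 0x10FFFF, needs 21 bits.
print((code_points - 1).bit_length())
```

Forty bits against twenty-one is the "roughly twice as much data space" in the post.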

Gregory Ewing · Mar 30, 2014

Chris said:
a 5x8 bitmap has
forty pixels, any of which can be either on or off - that gives
roughly twice as much data space as the 21-bit Unicode spec.

We don't need a font, then -- just map the pixels
straight onto bits in the character code!

Might require some user re-education, but that's
a small price to pay for saving so much memory
space.

Antoon Pardon · Mar 31, 2014

Do you think that the ability to write this would be an improvement?

import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
❹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)

Steven, you're killing me here; argument by analogy does not work!

[ ------ 8< ---------- ]
One of the things that Python is widely known for is its readability.
Allowing symbols such as √ to denote identifiers may be quite
expressive and appreciable to the person writing the code. However, it
damages readability considerably, as seen in Steven's example above.
Personally I'm not interested in having to maintain another
programmer's code that arbitrarily uses ⌚ as a timer function, ╩ as
intersection or ░ as a matrix constructor.

I don't find Steven's example convincing. Sure, it can be used in a way
that damages readability considerably; however, lots of things in Python
can be abused in a way that damages readability considerably.

That you are not interested in having to maintain the code of someone who
would use such symbols is irrelevant. IIRC people have used the exact
same kind of argument against decorators and the if-else operator.

It seems we are all consenting adults until someone doesn't like the
idea of how it might influence his job. In that case it shouldn't be
allowed.

Ian Kelly · Mar 31, 2014

I don't find Steven's example convincing. Sure, it can be used in a way
that damages readability considerably; however, lots of things in Python
can be abused in a way that damages readability considerably.

That you are not interested in having to maintain the code of someone who
would use such symbols is irrelevant. IIRC people have used the exact
same kind of argument against decorators and the if-else operator.

It seems we are all consenting adults until someone doesn't like the
idea of how it might influence his job. In that case it shouldn't be
allowed.

That was an exaggeration on my part. It wouldn't affect my job, as I
wouldn't expect to ever actually have to maintain anything like the
above. My greater point though is that it damages Python's
readability for no actual gain in my view. There is nothing useful
you can do with a name that is the U+1F4A9 character that you can't do
just as easily with alphanumeric identifiers like pile_of_poo (or
куча_фекалий if one prefers; that's auto-translated, so don't blame me
if it's a poor translation). The kinds of symbols that we're talking
about here aren't part of any writing systems, and so to incorporate
them in *names* as if they were is an abuse of Unicode.

I don't think the comparisons to decorators and the if-else operator
are apt. First, because while those may degrade readability, they do
so in a constrained way. A decorator application is just the @ symbol
and an identifier. The if-else is just three expressions separated by
keywords. In the case of arbitrary Unicode identifiers, we're talking
about approximately doubling the number of different characters (out
of a continuously growing set) that could be used, many of which are
easily confused with other characters. Of course the potential for
confusion already exists, but that's no justification for aggravating
it.

Second, at least in the case of decorators, while I don't dispute that
they can harm readability, I think that in the majority of cases they
actually help it. That's because the @ syntax placed before a
function or class clearly denotes that the construct is being
decorated by something. The alternative to the syntax is to place an
assignment like "f = decorate(f)" *after* the definition, where it is
much less prominent. That the reader then potentially has to go
figure out what the decorator does is true regardless of whether the @
syntax is used or not. I'm unable to imagine any case where an
arbitrary Unicode identifier would actually improve readability.
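
To make the comparison concrete, here is a small sketch of the two spellings. The decorator `shout` and the functions `greet` and `greet_plain` are hypothetical names invented for this illustration.

```python
import functools


def shout(func):
    """Hypothetical decorator: upper-case whatever the function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


# With the @ syntax the decoration is visible before the definition.
@shout
def greet(name):
    return f"hello, {name}"


# Without it, the equivalent assignment trails after the body.
def greet_plain(name):
    return f"hello, {name}"

greet_plain = shout(greet_plain)

print(greet("world"))
print(greet_plain("world"))
```

Both forms behave the same; the difference is only where the reader learns that the function is wrapped.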

Finally, in my experience the "consenting adults" line is usually used
in the context of program or library design. I don't believe it's
appropriate when discussing the design of the language itself, which
should be kept as clean as possible. The logical conclusion of that
would be Lisp-like macros where every user ends up with their own
unique and incompatible version of the language, because we're all
consenting adults here, right?
