Unicode 7

Peter Otten

Rustom said:
Just noticed a small thing in which python does a bit better than haskell:
$ ghci
let (ﬁne, fine) = (1,2)
Prelude> (ﬁne, fine)
(1,2)
Prelude>

In case it's not apparent, the fi in the first fine is a ligature.

Python just barfs:

Not Python 3:

Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (ﬁne, fine) = (1,2)
>>> (ﬁne, fine)
(2, 2)

No copy-and-paste errors involved:

>>> eval("\ufb01ne")
2
>>> eval(b"fine".decode("ascii"))
2
 
Chris Angelico

You think this

(ﬁne, fine) = (1,2) # and no issue about it

is fine?

Not sure which part you're objecting to. Are you saying that this
should be an error:

or that Python should take the exact sequence of codepoints, rather
than normalizing?

Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1,
'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__builtins__': <module 'builtins' (built-in)>, '__name__':
'__main__'}

As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to <insert Unicode
standard normalization here>", as long as it's consistent. It's like
what happens with SQL identifiers: according to the standard, an
unquoted name should be uppercased, but some databases instead
lowercase them. It doesn't break code (modulo quoted names, not
applicable here), as long as it's consistent.

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)

ChrisA
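The point about the namespace can be checked without pasting the ligature into a terminal. This is a minimal sketch, assuming CPython 3.x: it compiles a one-line source string whose identifier starts with U+FB01 and then looks at which key ends up in the namespace. The variable names (`source`, `namespace`, `names`) are illustrative only.

```python
# Assumption: CPython 3.x, where the parser normalizes identifiers.
source = "\ufb01ne = 1\n"  # identifier starts with U+FB01 LATIN SMALL LIGATURE FI

namespace = {}
exec(source, namespace)

# exec() adds __builtins__ to the dict, so keep only the user-defined names.
names = sorted(key for key in namespace if not key.startswith("__"))
print(names)  # ['fine']

# The ligature spelling itself is not stored as a key.
print("\ufb01ne" in namespace)  # False
```

The stored key is the four-character ASCII spelling, which matches the 'fine': 1 entry in the dict above.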
 
Ned Batchelder

Rustom Mody wrote:
Not Python 3:
Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
(ﬁne, fine) = (1,2)
(ﬁne, fine)
(2, 2)
No copy-and-paste errors involved:
2

Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.

I am confused about the tone however:
You think this
(ﬁne, fine) = (1,2) # and no issue about it
is fine?

Can you be more explicit? It seems like you think it isn't fine. Why
not? What bothers you about it? Should there be an issue?
 
Rustom Mody

Rustom Mody wrote:
Just noticed a small thing in which python does a bit better than haskell:
$ ghci
let (ﬁne, fine) = (1,2)
Prelude> (ﬁne, fine)
(1,2)
In case it's not apparent, the fi in the first fine is a ligature.
Python just barfs:
Not Python 3:
Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
(ﬁne, fine) = (1,2)
(ﬁne, fine)
(2, 2)
No copy-and-paste errors involved:
eval("\ufb01ne")
2
eval(b"fine".decode("ascii"))
2
Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
I am confused about the tone however:
You think this
(ﬁne, fine) = (1,2) # and no issue about it
is fine?
Can you be more explicit? It seems like you think it isn't fine. Why
not? What bothers you about it? Should there be an issue?

Two identifiers that to some programmers
- can look the same
- and not to others
- and that the language treats as different

is not fine (or ﬁne) to me.

Putting them together as I did is summarizing the problem.

Think of them as textually widely separated,
with the code (un)serendipitously 'working' (i.e. not raising a NameError).
 
Chris Angelico

Two identifiers that to some programmers
- can look the same
- and not to others
- and that the language treats as different

is not fine (or ﬁne) to me.

The language treats them as the same, though.

ChrisA
 
Rustom Mody

The language treats them as the same, though.

Whoops! I seem to be goofing a lot today

Saw Peter's

Didn't notice his next line:

(2, 2)

So then I am back to my original point:

Python is giving better behavior than Haskell in this regard!

[Earlier reached this conclusion via a wrong path]
 
Steven D'Aprano

I am confused about the tone however: You think this

(ﬁne, fine) = (1,2) # and no issue about it

is fine?


It's no worse than any other obfuscated variable name:

MOOSE, MO0SE, M0OSE = 1, 2, 3
xl, x1 = 1, 2

If you know your victim is reading source code in Ariel font, "rn" and
"m" are virtually indistinguishable except at very large sizes.
 
Steven D'Aprano

It's no worse than any other obfuscated variable name:

MOOSE, MO0SE, M0OSE = 1, 2, 3
xl, x1 = 1, 2

If you know your victim is reading source code in Ariel font, "rn" and
"m" are virtually indistinguishable except at very large sizes.


Ooops! I too missed that Python normalises the name ﬁne to fine, so in
fact this is not a case of obfuscation.
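The difference between the two cases can be sketched with the standard unicodedata module. The snippet below groups the names from this thread by their NFKC form; anything that lands in the same group would be the same identifier to Python 3, assuming NFKC folding is the only transformation applied. The variable names (`folded`, `collisions`) are made up for the illustration.

```python
import unicodedata

# Names taken from the thread: lookalikes plus the ligature spelling.
names = ["MOOSE", "MO0SE", "M0OSE", "xl", "x1", "\ufb01ne", "fine"]

# Group each spelling under its NFKC-normalized form.
folded = {}
for name in names:
    key = unicodedata.normalize("NFKC", name)
    folded.setdefault(key, []).append(name)

# Keep only the normalized forms shared by more than one spelling.
collisions = {key: group for key, group in folded.items() if len(group) > 1}
print(collisions)  # {'fine': ['ﬁne', 'fine']}
```

Only the ligature pair collapses. The O/0 and l/1 lookalikes stay distinct names, so they remain usable for obfuscation.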
 
Chris Angelico

If you know your victim is reading source code in Ariel font, "rn" and
"m" are virtually indistinguishable except at very large sizes.

I kinda like the idea of naming it after a bratty teenager who rebels
against her father and runs away from home, but normally the font's
called Arial. :)

ChrisA
 
Terry Reedy

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)

The 3.4 docs say "The syntax of identifiers in Python is based on the
Unicode standard annex UAX-31, with elaboration and changes as defined
below; see also PEP 3131 for further details."
....
"All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC."

Without reading UAX-31, I don't know how much was changed, but I suspect
not much. In any case, the current rules are intended and very unlikely
to change as that would break code going either forward or back for
little purpose.
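As a rough illustration of the quoted rule, the comparison Python makes between identifiers can be approximated with unicodedata.normalize. The helper name `same_identifier` is hypothetical, and this is a sketch of the documented behaviour rather than the parser's actual code.

```python
import unicodedata


def same_identifier(a, b):
    """Hypothetical helper: True if both spellings have the same NFKC form."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


ligature = "\ufb01ne"  # U+FB01 followed by "ne"

print(len(ligature), len("fine"))         # 3 4
print(ligature == "fine")                 # False: different code points
print(same_identifier(ligature, "fine"))  # True: same NFKC form
print(ligature.isidentifier())            # True: accepted as an identifier
```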
 
Dennis Lee Bieber

And you've never been bitten by an invisible control character in ASCII
text? You've lived a sheltered life!
Xerox Sigma CP/V would even permit them in file names (though the
system was EBCDIC, not ASCII -- just feeding lots of ASCII terminals).

Think of the pain someone would have trying to figure out where in a 32
character file name the <BEL> was positioned. Even on a 1200bps serial
line, one couldn't really determine between which printable characters the
terminal beeped while listing the directory.
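On a modern system the position of such a character is easy to find. This is a small sketch using unicodedata.category, where "Cc" is the Unicode general category for control characters; the function name `control_char_positions` and the sample file name are invented for the example.

```python
import unicodedata


def control_char_positions(name):
    """Hypothetical helper: list (index, code point) for each control character."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(name)
        if unicodedata.category(ch) == "Cc"
    ]


filename = "report\x07_final.txt"  # made-up name with an embedded BEL

print(control_char_positions(filename))  # [(6, 'U+0007')]
print(repr(filename))                    # repr() also makes the escape visible
```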
 
